d3-dsv
This module provides a parser and formatter for delimiter-separated values, most commonly comma-separated values (CSV) or tab-separated values (TSV). These tabular formats are popular with spreadsheet programs such as Microsoft Excel, and are often more space-efficient than JSON. This implementation is based on RFC 4180.
For example, to parse:
d3.csvParse("foo,bar\n1,2") // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]
d3.tsvParse("foo\tbar\n1\t2") // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]
To format:
d3.csvFormat([{foo: "1", bar: "2"}]) // "foo,bar\n1,2"
d3.tsvFormat([{foo: "1", bar: "2"}]) // "foo\tbar\n1\t2"
To use a different delimiter, such as “|” for pipe-separated values, use d3.dsvFormat:
d3.dsvFormat("|").parse("foo|bar\n1|2") // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]
For easy loading of DSV files in a browser, see d3-fetch’s d3.csv, d3.tsv and d3.dsv methods.
dsvFormat(delimiter)
const csv = d3.dsvFormat(",");
Constructs a new DSV parser and formatter for the specified delimiter. The delimiter must be a single character (i.e., a single 16-bit code unit); so, ASCII delimiters are fine, but emoji delimiters are not.
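A JavaScript string’s length counts UTF-16 code units, so checking that length shows what “a single 16-bit code unit” means in practice. The helper isSingleCodeUnit below is hypothetical and not part of d3-dsv. This sketch makes no claim about whether d3.dsvFormat performs such a check itself.

```javascript
// Hypothetical helper (not part of d3-dsv): string length counts UTF-16 code
// units, so a usable delimiter has length exactly 1.
function isSingleCodeUnit(delimiter) {
  return typeof delimiter === "string" && delimiter.length === 1;
}

console.log(isSingleCodeUnit(","));  // true
console.log(isSingleCodeUnit("\t")); // true
console.log(isSingleCodeUnit("|"));  // true
// Most emoji lie outside the Basic Multilingual Plane and are stored as a
// surrogate pair, i.e. two code units.
console.log("🙂".length);            // 2
console.log(isSingleCodeUnit("🙂")); // false
```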
dsv.parse(string, row)
d3.csvParse("foo,bar\n1,2") // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]
Parses the specified string, which must be in the delimiter-separated values format with the appropriate delimiter, returning an array of objects representing the parsed rows.
Unlike dsv.parseRows, this method requires that the first line of the DSV content contains a delimiter-separated list of column names; these column names become the attributes on the returned objects. For example, consider the following CSV file:
Year,Make,Model,Length
1997,Ford,E350,2.34
2000,Mercury,Cougar,2.38
The resulting JavaScript array is:
[
{"Year": "1997", "Make": "Ford", "Model": "E350", "Length": "2.34"},
{"Year": "2000", "Make": "Mercury", "Model": "Cougar", "Length": "2.38"}
]
The returned array also exposes a columns property containing the column names in input order (in contrast to Object.keys, whose order need not match the input; for example, integer-like keys such as "2020" are always listed first). For example:
data.columns // ["Year", "Make", "Model", "Length"]
If the column names are not unique, only the last value is returned for each name; to access all values, use dsv.parseRows instead (see example).
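Values are lost because fields are assigned by name, and a later assignment to the same key overwrites an earlier one. The sketch below starts from a header row and a data row shaped like dsv.parseRows output. It then keeps every value by grouping on column name. The sample data and the groupByColumnName helper are hypothetical.

```javascript
// A header with a repeated name and one data row, shaped like dsv.parseRows output.
const header = ["name", "score", "score"];
const values = ["Ada", "90", "95"];

// Assigning by name: the second "score" overwrites the first.
const byName = {};
header.forEach((key, j) => {
  byName[key] = values[j];
});
console.log(byName); // { name: 'Ada', score: '95' }

// Hypothetical helper: collect every value under its column name instead.
function groupByColumnName(columns, row) {
  const grouped = {};
  columns.forEach((key, j) => {
    if (!Object.prototype.hasOwnProperty.call(grouped, key)) grouped[key] = [];
    grouped[key].push(row[j]);
  });
  return grouped;
}
console.log(groupByColumnName(header, values)); // { name: [ 'Ada' ], score: [ '90', '95' ] }
```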
If a row conversion function is not specified, field values are strings. For safety, there is no automatic conversion to numbers, dates, or other types. In some cases, JavaScript may coerce strings to numbers for you automatically (for example, using the + operator), but it is better to specify a row conversion function. See d3.autoType for a convenient row conversion function that infers and coerces common types such as numbers, booleans, and dates.
If a row conversion function is specified, the specified function is invoked for each row, being passed an object representing the current row (d), the index (i) starting at zero for the first non-header row, and the array of column names. If the returned value is null or undefined, the row is skipped and will be omitted from the array returned by dsv.parse; otherwise, the returned value defines the corresponding row object. For example:
const data = d3.csvParse(string, (d) => {
return {
year: new Date(+d.Year, 0, 1), // lowercase and convert "Year" to Date
make: d.Make, // lowercase
model: d.Model, // lowercase
length: +d.Length // lowercase and convert "Length" to number
};
});
Note: using + or Number rather than parseInt or parseFloat is typically faster, though more restrictive. For example, "30px" when coerced using + returns NaN, while parseInt and parseFloat return 30.
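The difference is easy to check in any JavaScript console; the inputs below are illustrative. Note the last pair: with +, an empty field becomes 0 rather than NaN, so a missing value can be mistaken for a real zero.

```javascript
// Unary plus and Number() accept only fully numeric strings (surrounding whitespace is ignored).
console.log(+"30px");              // NaN
console.log(Number(" 2.34 "));     // 2.34
// parseInt and parseFloat read the longest numeric prefix and ignore the rest.
console.log(parseInt("30px", 10)); // 30
console.log(parseFloat("30px"));   // 30
console.log(parseInt("2.34", 10)); // 2 (the fractional part is dropped)
// An empty field is a common CSV gotcha: "+" turns it into 0, not NaN.
console.log(+"");                  // 0
console.log(parseFloat(""));       // NaN
```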
dsv.parseRows(string, row)
d3.csvParseRows("foo,bar\n1,2") // [["foo", "bar"], ["1", "2"]]
Parses the specified string, which must be in the delimiter-separated values format with the appropriate delimiter, returning an array of arrays representing the parsed rows.
Unlike dsv.parse, this method treats the header line as a standard row, and should be used whenever DSV content does not contain a header. Each row is represented as an array rather than an object. Rows may have variable length. For example, consider the following CSV file, which notably lacks a header line:
1997,Ford,E350,2.34
2000,Mercury,Cougar,2.38
The resulting JavaScript array is:
[
["1997", "Ford", "E350", "2.34"],
["2000", "Mercury", "Cougar", "2.38"]
]
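A minimal character-by-character parser in the spirit of RFC 4180 makes the quoting rules concrete. In it, quoted fields may contain the delimiter and line breaks, and a doubled quote inside a quoted field stands for one literal quote. parseRowsSketch is a hypothetical, simplified stand-in for illustration only. It is not d3-dsv’s implementation and does not treat every malformed input the way d3 does.

```javascript
// Hypothetical, simplified RFC 4180-style parser; not d3-dsv's implementation.
// Returns an array of rows, each an array of string fields (rows may differ in length).
function parseRowsSketch(text, delimiter = ",") {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  let endedWithBreak = false;

  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    endedWithBreak = false;
    if (inQuotes) {
      if (c === '"' && text[i + 1] === '"') {
        field += '"'; // a doubled quote is one literal quote
        i++;
      } else if (c === '"') {
        inQuotes = false; // closing quote
      } else {
        field += c; // delimiters and line breaks are literal inside quotes
      }
    } else if (c === '"') {
      inQuotes = true; // opening quote
    } else if (c === delimiter) {
      row.push(field);
      field = "";
    } else if (c === "\n" || c === "\r") {
      if (c === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one line break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      endedWithBreak = true;
    } else {
      field += c;
    }
  }
  // Flush the final row unless the input was empty or ended with a line break.
  if (text.length > 0 && !endedWithBreak) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Two rows of four string fields each, matching the example above.
console.log(parseRowsSketch("1997,Ford,E350,2.34\n2000,Mercury,Cougar,2.38"));
// One row whose second field contains a comma and whose third contains quotes.
console.log(parseRowsSketch('a,"b,c","say ""hi"""\n'));
```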
If a row conversion function is not specified, field values are strings. For safety, there is no automatic conversion to numbers, dates, or other types. In some cases, JavaScript may coerce strings to numbers for you automatically (for example, using the + operator), but it is better to specify a row conversion function. See d3.autoType for a convenient row conversion function that infers and coerces common types such as numbers, booleans, and dates.
If a row conversion function is specified, the specified function is invoked for each row, being passed an array representing the current row (d) and the index (i) starting at zero for the first row. If the returned value is null or undefined, the row is skipped and will be omitted from the array returned by dsv.parseRows; otherwise, the returned value defines the corresponding row. For example:
const data = d3.csvParseRows(string, (d, i) => {
return {
year: new Date(+d[0], 0, 1), // convert first column to Date
make: d[1],
model: d[2],
length: +d[3] // convert fourth column to number
};
});
In effect, row is similar to applying a map and filter operator to the returned rows.
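The map-and-filter analogy can be written out directly. This sketch assumes the row function receives the arguments dsv.parse passes: the row, its index, and the column names. applyRow and the sample rows are hypothetical.

```javascript
// Hypothetical helper: the effect of a row conversion function, expressed as map + filter.
function applyRow(rows, columns, row) {
  return rows
    .map((d, i) => row(d, i, columns)) // convert each row
    .filter((d) => d != null);         // drop rows converted to null or undefined
}

// Sample rows as dsv.parse would produce them without a row function (all strings).
const columns = ["Year", "Length"];
const rows = [
  {Year: "1997", Length: "2.34"},
  {Year: "", Length: "2.38"},
  {Year: "2000", Length: "2.38"}
];

// Skip rows without a year; otherwise convert both fields to numbers.
const converted = applyRow(rows, columns, (d) =>
  d.Year ? {year: +d.Year, length: +d.Length} : null
);
console.log(converted); // [ { year: 1997, length: 2.34 }, { year: 2000, length: 2.38 } ]
```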
dsv.format(rows, columns)
d3.csvFormat([{foo: "1", bar: "2"}]) // "foo,bar\n1,2"
d3.csvFormat([{foo: "1", bar: "2"}], ["foo"]) // "foo\n1"
Formats the specified array of object rows as delimiter-separated values, returning a string. This operation is the inverse of dsv.parse. Each row will be separated by a newline (\n), and each column within each row will be separated by the delimiter (such as a comma, ,). Values that contain either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.
If columns is not specified, the list of column names that forms the header row is determined by the union of all properties on all objects in rows; the order of columns is nondeterministic. If columns is specified, it is an array of strings representing the column names. For example:
const string = d3.csvFormat(data, ["year", "make", "model", "length"]);
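When columns is omitted, the header is the union of all properties across the rows. The sketch below computes such a union in order of first appearance. The helper name inferColumns and the sample data are hypothetical. Because the documented column order is nondeterministic, passing columns explicitly, as in the call above, is the reliable way to fix the order.

```javascript
// Hypothetical helper (not part of d3-dsv): the union of property names across
// all row objects, in the order each name is first seen.
function inferColumns(rows) {
  const seen = new Set();
  for (const row of rows) {
    for (const key in row) seen.add(key);
  }
  return Array.from(seen);
}

const data = [
  {year: 1997, make: "Ford"},
  {year: 2000, model: "Cougar"}
];
console.log(inferColumns(data)); // [ 'year', 'make', 'model' ]
```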
All fields on each row object will be coerced to strings. If the field value is null or undefined, the empty string is used. If the field value is a Date, the ECMAScript date-time string format (a subset of ISO 8601) is used: for example, dates at UTC midnight are formatted as YYYY-MM-DD. For more control over which fields are formatted and how, first map rows to an array of arrays of strings, and then use dsv.formatRows.
dsv.formatBody(rows, columns)
d3.csvFormatBody([{foo: "1", bar: "2"}]) // "1,2"
d3.csvFormatBody([{foo: "1", bar: "2"}], ["foo"]) // "1"
Equivalent to dsv.format, but omits the header row. This is useful, for example, when appending rows to an existing file.
dsv.formatRows(rows)
d3.csvFormatRows([["foo", "bar"], ["1", "2"]]) // "foo,bar\n1,2"
Formats the specified array of rows, each an array of strings, as delimiter-separated values, returning a string. This operation is the inverse of dsv.parseRows. Each row will be separated by a newline (\n), and each column within each row will be separated by the delimiter (such as a comma, ,). Values that contain either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.
To convert an array of objects to an array of arrays while explicitly specifying the columns, use array.map. For example:
const string = d3.csvFormatRows(data.map((d, i) => {
return [
d.year.getUTCFullYear(), // Assuming d.year is a Date object.
d.make,
d.model,
d.length
];
}));
If you like, you can also use array.concat to prepend an array of column names as the first row:
const string = d3.csvFormatRows([[
"year",
"make",
"model",
"length"
]].concat(data.map((d, i) => {
return [
d.year.getUTCFullYear(), // Assuming d.year is a Date object.
d.make,
d.model,
d.length
];
})));
dsv.formatRow(row)
d3.csvFormatRow(["foo", "bar"]) // "foo,bar"
Formats a single array row of strings as delimiter-separated values, returning a string. Each column within the row will be separated by the delimiter (such as a comma, ,). Values that contain either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.
dsv.formatValue(value)
d3.csvFormatValue("foo") // "foo"
Formats a single value or string as a delimiter-separated value, returning a string. A value that contains either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.
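The escaping rule fits in a few lines. A field that contains the delimiter, a double quote, or a line break is wrapped in double quotes, and each embedded double quote is doubled. formatValueSketch is hypothetical. It maps null and undefined to the empty string, matching what the dsv.format description states for fields. As a conservative assumption, it also quotes carriage returns. It does not reproduce d3’s special formatting of Date values.

```javascript
// Hypothetical sketch of the quoting rule; not d3-dsv's source.
function formatValueSketch(value, delimiter = ",") {
  if (value == null) return ""; // null and undefined become empty fields
  const text = String(value);
  // Quote if the text contains the delimiter, a double quote, or a line break.
  const needsQuotes = text.includes(delimiter) || /["\n\r]/.test(text);
  // Inside quotes, each literal double quote is written twice.
  return needsQuotes ? '"' + text.replace(/"/g, '""') + '"' : text;
}

console.log(formatValueSketch("foo"));       // foo
console.log(formatValueSketch("a,b"));       // "a,b"
console.log(formatValueSketch('say "hi"'));  // "say ""hi"""
console.log(formatValueSketch("a|b", "|"));  // "a|b"
console.log(formatValueSketch("a|b"));       // a|b
console.log(formatValueSketch(null) === ""); // true
```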
csvParse(string, row)
Equivalent to d3.dsvFormat(",").parse.
csvParseRows(string, row)
Equivalent to d3.dsvFormat(",").parseRows.
csvFormat(rows, columns)
Equivalent to d3.dsvFormat(",").format.
csvFormatBody(rows, columns)
Equivalent to d3.dsvFormat(",").formatBody.
csvFormatRows(rows)
Equivalent to d3.dsvFormat(",").formatRows.
csvFormatRow(row)
Equivalent to d3.dsvFormat(",").formatRow.
csvFormatValue(value)
Equivalent to d3.dsvFormat(",").formatValue.
tsvParse(string, row)
Equivalent to d3.dsvFormat("\t").parse.
tsvParseRows(string, row)
Equivalent to d3.dsvFormat("\t").parseRows.
tsvFormat(rows, columns)
Equivalent to d3.dsvFormat("\t").format.
tsvFormatBody(rows, columns)
Equivalent to d3.dsvFormat("\t").formatBody.
tsvFormatRows(rows)
Equivalent to d3.dsvFormat("\t").formatRows.
tsvFormatRow(row)
Equivalent to d3.dsvFormat("\t").formatRow.
tsvFormatValue(value)
Equivalent to d3.dsvFormat("\t").formatValue.
autoType(object)
Given an object (or array) representing a parsed row, infers the types of values on the object and coerces them accordingly, returning the mutated object. This function is intended to be used as a row accessor function in conjunction with dsv.parse and dsv.parseRows. For example, consider the following CSV file:
Year,Make,Model,Length
1997,Ford,E350,2.34
2000,Mercury,Cougar,2.38
When used with d3.csvParse,
d3.csvParse(string, d3.autoType)
the resulting JavaScript array is:
[
{"Year": 1997, "Make": "Ford", "Model": "E350", "Length": 2.34},
{"Year": 2000, "Make": "Mercury", "Model": "Cougar", "Length": 2.38}
]
Type inference works as follows. For each value in the given object, the trimmed value is computed; the value is then re-assigned as follows:
- If empty, then null.
- If exactly "true", then true.
- If exactly "false", then false.
- If exactly "NaN", then NaN.
- Otherwise, if coercible to a number, then a number.
- Otherwise, if a date-only or date-time string, then a Date.
- Otherwise, a string (the original untrimmed value).
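The rules above translate almost line for line into code. autoTypeSketch is a hypothetical illustration of those rules, not d3’s source. It assumes every value is a string, and its date pattern is a simplified form of the ECMAScript ISO 8601 subset that omits extended (±YYYYYY) years.

```javascript
// Hypothetical sketch of the inference rules above; not d3's source.
// Simplified ISO 8601 pattern: YYYY, YYYY-MM, YYYY-MM-DD, optional time and offset.
const DATE = /^\d{4}(-\d{2}(-\d{2})?)?(T\d{2}:\d{2}(:\d{2}(\.\d{3})?)?(Z|[-+]\d{2}:\d{2})?)?$/;

// Mutates and returns the given object (or array) of string values.
function autoTypeSketch(object) {
  for (const key in object) {
    const value = object[key].trim();
    const number = +value;
    if (!value) object[key] = null;
    else if (value === "true") object[key] = true;
    else if (value === "false") object[key] = false;
    else if (value === "NaN") object[key] = NaN;
    else if (!Number.isNaN(number)) object[key] = number;
    else if (DATE.test(value)) object[key] = new Date(value);
    // Otherwise the original, untrimmed string is left in place.
  }
  return object;
}

const row = autoTypeSketch({
  Year: "1997",
  Make: "Ford",
  Length: " 2.34 ",
  Sold: "",
  Used: "false",
  Zip: "08904",
  Price: "$1.00",
  Date: "2000-01-01"
});
console.log(row.Year, row.Length, row.Sold, row.Used, row.Zip, row.Price); // 1997 2.34 null false 8904 $1.00
console.log(row.Date instanceof Date); // true
```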
Values with leading zeroes may be coerced to numbers; for example "08904" coerces to 8904. However, extra characters such as commas or units (e.g., "$1.00", "(123)", "1,234" or "32px") will prevent number coercion, resulting in a string.
Date strings must be in ECMAScript’s subset of the ISO 8601 format. When a date-only string such as YYYY-MM-DD is specified, the inferred time is midnight UTC; however, if a date-time string such as YYYY-MM-DDTHH:MM is specified without a time zone, it is assumed to be local time.
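JavaScript’s own Date parser treats these two forms differently, which you can check directly. The dates below are illustrative. Each comparison holds in any time zone because both sides are interpreted the same way.

```javascript
// Date-only ISO strings are parsed as UTC midnight...
const dateOnly = new Date("2000-01-01");
console.log(dateOnly.getTime() === Date.UTC(2000, 0, 1)); // true

// ...but a date-time string without an offset is parsed as local time.
const localDateTime = new Date("2000-01-01T00:00");
console.log(localDateTime.getTime() === new Date(2000, 0, 1).getTime()); // true

// Adding an explicit offset ("Z" for UTC) removes the ambiguity.
const utcDateTime = new Date("2000-01-01T00:00Z");
console.log(utcDateTime.getTime() === Date.UTC(2000, 0, 1)); // true
```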
Automatic type inference is primarily intended to provide safe, predictable behavior in conjunction with dsv.format and dsv.formatRows for common JavaScript types. If you need different behavior, you should implement your own row accessor function.
For more, see the d3.autoType notebook.
Content security policy
If a content security policy is in place, note that dsv.parse requires unsafe-eval in the script-src directive, due to the (safe) use of dynamic code generation for fast parsing. (See source.) Alternatively, use dsv.parseRows.
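If dsv.parse cannot run under your policy, you can rebuild the objects it would return from dsv.parseRows output using ordinary property assignment. objectsFromRows is a hypothetical helper, not part of d3-dsv. It treats the first row as the header and fills missing trailing fields with empty strings (an assumption of this sketch). It also attaches a columns property like the one dsv.parse exposes.

```javascript
// Hypothetical helper (not part of d3-dsv): build row objects from parseRows-style
// output using plain property assignment, so no dynamic code generation is needed.
function objectsFromRows(rows) {
  const [columns = [], ...body] = rows;
  const objects = body.map((values) => {
    const object = {};
    columns.forEach((name, j) => {
      object[name] = values[j] ?? ""; // missing trailing fields become ""
    });
    return object;
  });
  objects.columns = columns; // mirrors the columns property exposed by dsv.parse
  return objects;
}

// Input shaped like the result of d3.csvParseRows on a file with a header line.
const objects = objectsFromRows([
  ["Year", "Make", "Model"],
  ["1997", "Ford", "E350"],
  ["2000", "Mercury"]
]);
console.log(objects[1]);      // { Year: '2000', Make: 'Mercury', Model: '' }
console.log(objects.columns); // [ 'Year', 'Make', 'Model' ]
```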
Byte-order marks
DSV files sometimes begin with a byte order mark (BOM); saving a spreadsheet in CSV UTF-8 format from Microsoft Excel, for example, will include a BOM. On the web this is not usually a problem because the UTF-8 decode algorithm specified in the Encoding standard removes the BOM. Node.js, on the other hand, does not remove the BOM when decoding UTF-8.
If the BOM is not removed, the first character of the text is a zero-width non-breaking space (U+FEFF). So if a CSV file with a BOM is parsed by d3.csvParse, the first column’s name will begin with a zero-width non-breaking space. This can be hard to spot since this character is usually invisible when printed.
To remove the BOM before parsing, consider using strip-bom.
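If you would rather not add a dependency, removing a leading BOM from an already-decoded string takes only a check for U+FEFF. stripBomSketch is a hypothetical helper offered as an alternative to the strip-bom package, not a copy of it. The sample CSV text is illustrative.

```javascript
// Hypothetical helper: drop a leading U+FEFF (the BOM as it appears after UTF-8 decoding).
function stripBomSketch(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

const withBom = "\uFEFFYear,Make\n1997,Ford";
// Without stripping, the first column name silently carries the invisible character.
console.log(withBom.split("\n")[0].split(",")[0] === "Year");                 // false
console.log(stripBomSketch(withBom).split("\n")[0].split(",")[0] === "Year"); // true
```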