Skip to content

数据分箱

¥Binning data

将定量值分箱到连续、不重叠的区间中,就像在直方图中一样。(另请参阅可观察图的 bin 变换。)

¥Bin quantitative values into consecutive, non-overlapping intervals, as in histograms. (See also Observable Plot’s bin transform.)

bin()

js
const bin = d3.bin().value((d) => d.culmen_length_mm);

示例 · 源代码 · 使用默认设置构造一个新的 bin 生成器。返回的 bin 生成器支持方法链,因此此构造函数通常与 bin.value 链在一起以分配值访问器。返回的生成器也是一个函数;传递它的数据 到 bin。

¥Examples · Source · Constructs a new bin generator with the default settings. The returned bin generator supports method chaining, so this constructor is typically chained with bin.value to assign a value accessor. The returned generator is also a function; pass it data to bin.

bin(data) {#_bin}

js
const bins = d3.bin().value((d) => d.culmen_length_mm)(penguins);

将给定的数据样本可迭代对象分箱。返回一个容器数组,其中每个容器都是一个包含来自输​​入数据的相关元素的数组。因此,bin 的 length 是该 bin 中的元素数量。每个容器有两个附加属性:

¥Bins the given iterable of data samples. Returns an array of bins, where each bin is an array containing the associated elements from the input data. Thus, the length of the bin is the number of elements in that bin. Each bin has two additional attributes:

  • x0 - bin 的下限(含)。

    ¥x0 - the lower bound of the bin (inclusive).

  • x1 - bin 的上界(最后一个 bin 除外)。

    ¥x1 - the upper bound of the bin (exclusive, except for the last bin).

给定数据中任何为空或不可比较的值,或 domain 之外的值都将被忽略。

¥Any null or non-comparable values in the given data, or those outside the domain, are ignored.

bin.value(value) {#bin_value}

js
const bin = d3.bin().value((d) => d.culmen_length_mm);

如果指定了值,则将值访问器设置为指定的函数或常量,并返回此 bin 生成器。

¥If value is specified, sets the value accessor to the specified function or constant and returns this bin generator.

js
bin.value() // (d) => d.culmen_length_mm

如果未指定值,则返回当前值访问器,默认为恒等函数。

¥If value is not specified, returns the current value accessor, which defaults to the identity function.

当 bin 为 generated 时,将为输入数据数组中的每个元素调用值访问器,并将元素 d、索引 i 和数组 data 作为三个参数传递。默认值访问器假定输入数据是可排序的(可比较的),例如数字或日期。如果你的数据不是,那么你应该指定一个访问器,该访问器返回给定数据对应的可排序值。

¥When bins are generated, the value accessor will be invoked for each element in the input data array, being passed the element d, the index i, and the array data as three arguments. The default value accessor assumes that the input data are orderable (comparable), such as numbers or dates. If your data are not, then you should specify an accessor that returns the corresponding orderable value for a given datum.

这类似于在调用 bin 生成器之前将数据映射到值,但其优点是输入数据仍然与返回的 bin 相关联,从而更容易访问数据的其他字段。

¥This is similar to mapping your data to values before invoking the bin generator, but has the benefit that the input data remains associated with the returned bins, thereby making it easier to access other fields of the data.

bin.domain(domain) {#bin_domain}

js
const bin = d3.bin().domain([0, 1]);

如果指定了域,则将域访问器设置为指定的函数或数组,并返回此 bin 生成器。

¥If domain is specified, sets the domain accessor to the specified function or array and returns this bin generator.

js
bin.domain() // [0, 1]

如果未指定域,则返回当前域访问器,默认为 extent。bin 域定义为一个数组 [min, max],其中 min 是最小可观测值,max 是最大可观测值;两个值均包含在内。当 bin 为 generated 时,任何超出此范围的值都将被忽略。

¥If domain is not specified, returns the current domain accessor, which defaults to extent. The bin domain is defined as an array [min, max], where min is the minimum observable value and max is the maximum observable value; both values are inclusive. Any value outside of this domain will be ignored when the bins are generated.

例如,要使用带有 线性比例尺 x 的 bin 生成器,你可以这样写:

¥For example, to use a bin generator with a linear scale x, you might say:

js
const bin = d3.bin().domain(x.domain()).thresholds(x.ticks(20));

然后,你可以像这样从数字数组中计算出 bin:

¥You can then compute the bins from an array of numbers like so:

js
const bins = bin(numbers);

如果使用默认的 extent 域,并将 thresholds 指定为计数(而不是显式值),则计算出的域将是 niced,使得所有 bin 的宽度均匀。

¥If the default extent domain is used and the thresholds are specified as a count (rather than explicit values), then the computed domain will be niced such that all bins are uniform width.

请注意,域访问器是在 values 的具体化数组上调用的,而不是在输入数据数组上调用的。

¥Note that the domain accessor is invoked on the materialized array of values, not on the input data array.

bin.thresholds(thresholds) {#bin_thresholds}

js
const bin = d3.bin().thresholds(20);

如果将阈值指定为数字,则 domain 将被均匀划分为大约相同数量的箱体;参见 ticks

¥If thresholds is specified as a number, then the domain will be uniformly divided into approximately that many bins; see ticks.

js
const bin = d3.bin().thresholds([0.25, 0.5, 0.75]);

如果将 thresholds 指定为数组,则将阈值设置为指定值并返回此 bin 生成器。阈值定义为一个值数组 [x0, x1, …]。任何小于 x0 的值都将被放置在第一个 bin 中;任何大于或等于 x0 但小于 x1 的值都将放置在第二个容器中;等等。因此,生成的箱体 将包含 thresholds.length + 1 个 bin。任何超出 domain 范围的阈值都将被忽略。第一个 bin.x0 始终等于最小域值,最后一个 bin.x1 始终等于最大域值。

¥If thresholds is specified as an array, sets the thresholds to the specified values and returns this bin generator. Thresholds are defined as an array of values [x0, x1, …]. Any value less than x0 will be placed in the first bin; any value greater than or equal to x0 but less than x1 will be placed in the second bin; and so on. Thus, the generated bins will have thresholds.length + 1 bins. Any threshold values outside the domain are ignored. The first bin.x0 is always equal to the minimum domain value, and the last bin.x1 is always equal to the maximum domain value.

js
const bin = d3.bin().thresholds((values) => [d3.median(values)]);

如果将阈值指定为函数,则该函数将传递三个参数:从数据中派生出的输入数组 values,以及以最小值和最大值表示的 domain。该函数随后可能会返回数值阈值数组或箱体数量;在后一种情况下,域会被均匀地划分为大约计数个区间;参见 ticks。例如,你可能希望在对时间序列数据进行分箱时使用时间刻度;参见 example

¥If thresholds is specified as a function, the function will be passed three arguments: the array of input values derived from the data, and the domain represented as min and max. The function may then return either the array of numeric thresholds or the count of bins; in the latter case the domain is divided uniformly into approximately count bins; see ticks. For instance, you might want to use time ticks when binning time-series data; see example.

js
bin.thresholds() // () => [0, 0.5, 1]

如果未指定阈值,则返回当前阈值生成器,默认情况下该生成器实现 Sturges 公式。(因此,默认情况下,要分箱的值必须是数字!)

¥If thresholds is not specified, returns the current threshold generator, which by default implements Sturges’ formula. (Thus by default, the values to be binned must be numbers!)

thresholdFreedmanDiaconis(values, min, max)

js
const bin = d3.bin().thresholds(d3.thresholdFreedmanDiaconis);

源代码 · 根据 Freedman-Diaconis 规则 返回 bin 数量;输入值必须是数字。

¥Source · Returns the number of bins according to the Freedman–Diaconis rule; the input values must be numbers.

thresholdScott(values, min, max)

js
const bin = d3.bin().thresholds(d3.thresholdScott);

源代码 · 根据 Scott 正态参考规则 返回 bin 数量;输入值必须是数字。

¥Source · Returns the number of bins according to Scott’s normal reference rule; the input values must be numbers.

thresholdSturges(values, min, max)

js
const bin = d3.bin().thresholds(d3.thresholdSturges);

源代码 · 根据 Sturges 公式 返回 bin 数量;输入值必须是数字。

¥Source · Returns the number of bins according to Sturges’ formula; the input values must be numbers.