D3: Scales

We previously discussed scales in depth, and for good reason: D3 relies heavily on scales to map a data attribute to a visual variable.

Recall: scales map data from a domain to a range. The domain refers to an attribute of our data. The range refers to a visual channel: space, shape, size, orientation, color, and so on. We choose the type of scale based on the data type of the domain and the data type of the visual range (e.g. quantitative, ordinal, nominal). We will now cover the prominent scales provided by D3.

Continuous Scales

Continuous scales assume quantitative data in the domain and range.

d3.scaleLinear()

A linear scale is constructed by specifying the minimum and maximum values for the domain and the range:

var scale = d3.scaleLinear()
	.domain([min_d,max_d])
	.range([min_x,max_x])

In the above, d3.scaleLinear() creates a linear scale. We then specify the domain, followed by the range. Each of these setters returns the scale itself, which is what permits method chaining. To use the scale, for a given data attribute value d, we simply pass it in to the scale to obtain the mapped value in the range: scale(d).
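To make the mapping concrete, here is a minimal sketch in plain JavaScript of the arithmetic a linear scale performs. The helper linear_scale_sketch is a hypothetical name of our own, not part of D3, and it leaves out everything else a real scale offers (clamping, nice, ticks, and so on).

```javascript
// Hypothetical helper (not part of D3): mimics what a linear scale computes.
function linear_scale_sketch(domain, range) {
	var d0 = domain[0], d1 = domain[1];
	var r0 = range[0], r1 = range[1];
	return function(d) {
		// normalize d to [0,1] within the domain, then stretch it into the range
		var t = (d - d0) / (d1 - d0);
		return r0 + t * (r1 - r0);
	};
}

var sketch_scale = linear_scale_sketch([0, 10], [100, 500]);
console.log(sketch_scale(0), sketch_scale(5), sketch_scale(10)); // 100 300 500
```

Note that nothing requires the range to be increasing: passing [300, 0] as the range flips the direction, which is how we handle SVG’s downward-pointing y-axis.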

Let’s revisit our scatterplot, and see how much more flexible things get when we use scales:

var svg0 = d3.select('#svg0');
var circle_data = [];
for(var i = 0; i < 20; i++)
	circle_data.push([10+3000*Math.random(),10+3000*Math.random()]);
var radius = 8;

var min_circle_x = d3.min(circle_data, d => d[0]), max_circle_x = d3.max(circle_data, d => d[0])
var min_circle_y = d3.min(circle_data, d => d[1]), max_circle_y = d3.max(circle_data, d => d[1])
var min_x = 0, max_x = svg0.attr('width'), min_y = svg0.attr('height'), max_y = 0;
var pad_x = (max_circle_x-min_circle_x)*0.05, pad_y = (max_circle_y-min_circle_y)*0.05;
var range_pad = 40;

var scale_x = d3.scaleLinear().domain([min_circle_x-pad_x,max_circle_x+pad_x]).range([min_x+range_pad,max_x-range_pad]).nice()
var scale_y = d3.scaleLinear().domain([min_circle_y-pad_y,max_circle_y+pad_y]).range([min_y-range_pad,max_y+range_pad]).nice()

svg0.selectAll('circle').data(circle_data).enter().append('circle')
	.attr('cx', d => scale_x(d[0]))
	.attr('cy', d => scale_y(d[1]))
	.attr('r', radius)
	.attr('fill', '#777')

Other Continuous Scales

D3 has a host of other continuous scales. We will not cover each one; please see the docs for more information.

Color

The range need not only be numbers! D3 provides support for other types of values that can be readily interpolated. Among them: color!

D3 supports color in many different types of formats:

prespecified names ('red', 'green', 'blue')
hexadecimal strings ('#ff0000', '#00ff00', '#0000ff')
explicit RGB values ('rgb(255,0,0)', 'rgb(0,255,0)', 'rgb(0,0,255)')

You can mix and match these color types when specifying your range in any of the continuous scales. Let’s see an example:

var svg3 = d3.select('#svg3');
var width = svg3.attr('width'), height = svg3.attr('height')

var circle_data = [];
for(var i = 0; i < 20; i++)
	circle_data.push([10+30*Math.random(),10+30*Math.random()]);
var radius = 8;

var min_circle_x = d3.min(circle_data, d => d[0]), max_circle_x = d3.max(circle_data, d => d[0])
var min_circle_y = d3.min(circle_data, d => d[1]), max_circle_y = d3.max(circle_data, d => d[1])
var min_x = 0, max_x = width, min_y = height, max_y = 0;

var circle_scale_x = d3.scaleLinear().domain([min_circle_x,max_circle_x]).range([min_x+radius,max_x-radius])
var circle_scale_y = d3.scaleLinear().domain([min_circle_y,max_circle_y]).range([min_y-radius,max_y+radius])
var fill_scale = d3.scaleLinear().domain([min_circle_x,max_circle_x]).range(['red','green'])

svg3.selectAll('circle').data(circle_data).enter().append('circle')
	.attr('cx', d => circle_scale_x(d[0]))
	.attr('cy', d => circle_scale_y(d[1]))
	.attr('r', radius)
	.attr('fill', d => fill_scale(d[0]))

Above I’ve used RGB as the color space. D3 also supports other color spaces, but we will hold off on that for now, and revisit this in some detail in the next couple of weeks.
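As a rough picture of what interpolating in RGB means, the sketch below blends two colors channel by channel. The helper rgb_blend_sketch is a hypothetical name of our own, and it only approximates what D3 does internally; the parameter t plays the role of the normalized position of a data value within the domain.

```javascript
// Hypothetical helper (not part of D3): blends two [r,g,b] triples channel by channel.
function rgb_blend_sketch(rgb_a, rgb_b, t) {
	var channels = rgb_a.map((a, i) => Math.round(a + t * (rgb_b[i] - a)));
	return 'rgb(' + channels.join(', ') + ')';
}

// the CSS name 'red' is rgb(255,0,0) and the CSS name 'green' is rgb(0,128,0)
var red = [255, 0, 0], green = [0, 128, 0];
console.log(rgb_blend_sketch(red, green, 0));   // rgb(255, 0, 0)
console.log(rgb_blend_sketch(red, green, 0.5)); // rgb(128, 64, 0)
console.log(rgb_blend_sketch(red, green, 1));   // rgb(0, 128, 0)
```

The midpoint is a muddy brown rather than a bright yellow, which is one reason other color spaces are worth considering.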

Time

Time can also be used, both as a domain and a range. d3.scaleTime assumes a time domain, and some range - it could be scalar values, colors, or even dates.

Quantized Scales

D3 also has support for quantized scales, where the domain is quantitative and the range is ordinal, as covered earlier. Let’s see how this works through an example:

var svg4 = d3.select('#svg4');
var width = svg4.attr('width'), height = svg4.attr('height')

var circle_data = [];
for(var i = 0; i < 70; i++)
	circle_data.push([i,0.5*i + 0.1*i*i - Math.sqrt(i)]);
var radius = 4;

var min_circle_x = d3.min(circle_data, d => d[0]), max_circle_x = d3.max(circle_data, d => d[0])
var min_circle_y = d3.min(circle_data, d => d[1]), max_circle_y = d3.max(circle_data, d => d[1])
var min_x = 0, max_x = width, min_y = height, max_y = 0;

var circle_scale_x = d3.scaleLinear().domain([min_circle_x,max_circle_x]).range([min_x+radius,max_x-radius])
var circle_scale_y = d3.scaleLinear().domain([min_circle_y,max_circle_y]).range([min_y-radius,max_y+radius])

var x_range = circle_scale_x.range();
var num_discrete = 8;
var discrete_range = d3.range(x_range[0],x_range[1],(x_range[1]-x_range[0])/num_discrete);
var circle_scale_x_quantized = d3.scaleQuantize().domain([min_circle_x,max_circle_x]).range(discrete_range)

svg4.selectAll('circle').data(circle_data).enter().append('circle')
	.attr('cx', d => circle_scale_x_quantized(d[0]))
	.attr('cy', d => circle_scale_y(d[1]))
	.attr('r', radius)
	.attr('fill', '#777777')
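The binning behind the quantized x-positions above can be sketched in a few lines of plain JavaScript. This is an illustration under our own assumptions, not D3’s implementation: quantize_sketch is a hypothetical helper that splits the domain into as many equal-width segments as there are range values.

```javascript
// Hypothetical helper (not part of D3): mimics the binning a quantize scale performs.
function quantize_sketch(domain, range_values) {
	var d0 = domain[0], d1 = domain[1], n = range_values.length;
	return function(d) {
		// which of the n equal-width domain segments does d fall into?
		var bin = Math.floor((d - d0) / (d1 - d0) * n);
		// clamp so values at or beyond the domain ends still get a bin
		bin = Math.max(0, Math.min(n - 1, bin));
		return range_values[bin];
	};
}

var q = quantize_sketch([0, 100], [10, 20, 30, 40]);
console.log(q(0), q(30), q(99), q(100)); // 10 20 40 40
```

Every input in the same segment lands on the same output, which is why the circles in the example line up in vertical columns.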

Band Scales

D3 has support for band scales, where the domain is ordinal and the range is quantitative, as previously discussed. Band scales are a very useful way of associating discrete data with continuous visual ranges. As an example: we might want to plot bars, where each bar is identified with an ordinal value (positioning its base) and a quantitative value (positioning its height). We can also nest band scales, allowing us to use one band scale to position a group of marks, and then within each group we use a band scale to position the individual bar marks.

D3 also supports point scales, which can be used for other types of marks and visual channels.
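To see how a band scale carves up a continuous range, consider the following plain JavaScript sketch. It is a simplification under our own assumptions: band_sketch is a hypothetical helper that supports inner padding only (no outer padding, rounding, or alignment), so it should not be read as D3’s actual implementation.

```javascript
// Hypothetical helper (not part of D3): simplified band layout with inner padding only.
function band_sketch(domain, range, padding_inner) {
	var n = domain.length;
	// each step holds one band plus the gap that follows it; the final gap is dropped
	var step = (range[1] - range[0]) / (n - padding_inner);
	var bandwidth = step * (1 - padding_inner);
	var scale = function(key) {
		var i = domain.indexOf(key);
		return i < 0 ? undefined : range[0] + i * step;
	};
	scale.bandwidth = function() { return bandwidth; };
	return scale;
}

var groups = band_sketch(['a', 'b', 'c', 'd'], [0, 350], 0.5);
console.log(groups('a'), groups('c'), groups.bandwidth()); // 0 200 50

// nesting: a second band scale whose range is the width of one group
var bars = band_sketch([0, 1], [0, groups.bandwidth()], 0);
console.log(groups('c') + bars(1)); // 225
```

The nested scale only knows about positions relative to its group, so the group’s offset is added separately. In SVG, a transform on the group element does that addition for us.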

Let’s revise our previous grouped bar marks example:

var svg5 = d3.select('#svg5');
var width = svg5.attr('width'), height = svg5.attr('height')

var bar_data = [];
var n_bars = 20;
var num_groups = 4;
for(var i = 0; i < num_groups; i++)  {
	var bar_group = [];
	for(var j = 0; j < n_bars; j++)
		bar_group.push(Math.random());
	bar_data.push({'the_bars':bar_group, 'color_data':i});
}

var bar_group_array = d3.range(num_groups);
var bar_scale_group = d3.scaleBand().domain(bar_group_array).range([0,width]).paddingInner(0.3);
var color_scale = d3.scalePoint().domain(bar_group_array).range([0,360]).padding(0.4);

var bar_x_array = d3.range(n_bars);
var bar_scale_x = d3.scaleBand().domain(bar_x_array).range([0,bar_scale_group.bandwidth()]).paddingInner(0.4);
var bar_scale_y = d3.scaleLinear().domain([0,1]).range([height,0])

var bar_selection = svg5.selectAll('g').data(bar_data).enter().append('g')

svg5.selectAll('g').attr('transform', (d,i) => 'translate('+bar_scale_group(i)+',0)')
svg5.selectAll('g').append('rect')
	.attr('x', 0).attr('width', bar_scale_group.bandwidth()).attr('y', 0).attr('height', height)
	.attr('fill', d => d3.hcl(color_scale(d.color_data), 40, 70))

bar_selection.selectAll('newrects').data(d => d.the_bars).enter().append('rect')
	.attr('class', 'barrect')

svg5.selectAll('g').selectAll('.barrect')
	.attr('x', (d,i) => bar_scale_x(i)).attr('width', bar_scale_x.bandwidth())
	.attr('y', d => bar_scale_y(d)).attr('height', d => bar_scale_y(0)-bar_scale_y(d))
	.attr('fill', '#555')

Axes

So we’ve now mastered scales, but we can’t actually see them! This is unfortunate. But fortunately, displaying them is quite straightforward via D3’s axis generators: d3.axisTop, d3.axisBottom, d3.axisLeft, and d3.axisRight.

To use an axis, we first need to tell D3 where to position it. We use a group element <g> for this purpose, and give it an appropriate transformation. An axis generator such as d3.axisBottom(scale) actually returns a function, one that is meant to be invoked on a selection through the call function (to be discussed). When called on the group element, the axis creates its visual elements (line, ticks, labels) as children of that group.

Let’s walk through an example, adding axes to our first plot:

// top axis
svg0.append('g')
	.attr('id', 'topaxis')
	.attr('transform', 'translate('+'0'+','+(range_pad)+')')
	.call(d3.axisTop(scale_x))
// bottom axis
svg0.append('g')
	.attr('id', 'bottomaxis')
	.attr('transform', 'translate('+'0'+','+(svg0.attr('height')-range_pad)+')')
	.call(d3.axisBottom(scale_x))

// left axis
svg0.append('g')
	.attr('id', 'leftaxis')
	.attr('transform', 'translate('+(range_pad)+','+'0'+')')
	.call(d3.axisLeft(scale_y))
// right axis
svg0.append('g')
	.attr('id', 'rightaxis')
	.attr('transform', 'translate('+(svg0.attr('width')-range_pad)+','+'0'+')')
	.call(d3.axisRight(scale_y))
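The idea that an axis is a function which fills in a container can be sketched without D3 or the DOM. Everything below is a stand-in of our own: axis_sketch, fake_group, and fake_scale are hypothetical names, and a real axis creates SVG elements rather than plain objects.

```javascript
// Hypothetical stand-in for an axis generator: returns a function that fills a container.
function axis_sketch(scale, tick_values) {
	return function(group) {
		tick_values.forEach(function(v) {
			// one child per tick: its label and its offset along the axis
			group.children.push({label: String(v), offset: scale(v)});
		});
	};
}

// a plain object standing in for a <g> element, and a simple scale for illustration
var fake_group = {children: []};
var fake_scale = d => 40 + 2 * d;

var bottom_axis = axis_sketch(fake_scale, [0, 50, 100]);
bottom_axis(fake_group); // roughly what passing the axis to call triggers
console.log(fake_group.children.map(c => c.offset)); // offsets 40, 140, 240
```

The axis only computes offsets along its own direction. Where the whole axis sits within the SVG is left to the transform on the group, as in the example above.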

D3: Shapes

We have thus far seen several shapes that are straightforward to draw with SVG: circles, rectangles, lines. But for common visualizations, these shapes are less than ideal for plotting.

Let’s take the line mark as an example. We can realize a line mark in SVG using path elements, wherein we can draw polylines or smooth curves. However, path elements are difficult to construct, and quite tedious to work with (we’ll see why shortly).

This is where D3 Shapes come in. Let’s take a look at a few important ones.

d3.line()

d3.line() is a generator for line marks. Given an array of data, it generates the path description string that we assign to the d attribute of a path element. To use a line generator we need to tell it how to transform each datum into an x-coordinate and a y-coordinate, with respect to the coordinate system of the SVG element to which we are adding the line mark. As we saw in the previous lecture, scales are perfect for this! Let’s see this in action:

var svg7 = d3.select('#svg7');
var width = svg7.attr('width'), height = svg7.attr('height'), pad_range = 40;

var line_data = [];
for(var i = 0; i < 70; i++)
	line_data.push([i,0.5*i + 0.1*i*i - Math.sqrt(i)]);

var min_line_x = d3.min(line_data, d => d[0]), max_line_x = d3.max(line_data, d => d[0])
var min_line_y = d3.min(line_data, d => d[1]), max_line_y = d3.max(line_data, d => d[1])
var min_x = pad_range, max_x = width-pad_range, min_y = height-pad_range, max_y = pad_range;
var pad_x = (max_line_x-min_line_x)*0.02, pad_y = (max_line_y-min_line_y)*0.02;

var line_scale_x = d3.scaleLinear().domain([min_line_x-pad_x,max_line_x+pad_x]).range([min_x,max_x])
var line_scale_y = d3.scaleLinear().domain([min_line_y-pad_y,max_line_y+pad_y]).range([min_y,max_y])

var line = d3.line()
	.x(d => line_scale_x(d[0]))
	.y(d => line_scale_y(d[1]))

svg7.append('path').datum(line_data)
//svg7.selectAll('path').data([line_data]).enter().append('path')
	.attr('d', d => line(d))
	.attr('fill', 'none')
	.attr('stroke', '#777777')
	.attr('stroke-width', '3')

svg7.append('g').attr('transform', 'translate('+pad_range+',0)').call(d3.axisLeft(line_scale_y))
svg7.append('g').attr('transform', 'translate(0,'+(min_y)+')').call(d3.axisBottom(line_scale_x))

Takeaways:

In the case of a line mark, although our data is an array of (x,y) coordinates, we generate a single path element, rather than one element per datum.
Also, datum binds whatever is passed in to each element of the selection as-is, without computing a data join (so there is no enter or exit). It behaves like a trivial data join, and is useful when a single element should represent an entire array.
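To see why path elements are tedious to write by hand, and what a line generator spares us, here is a sketch that assembles the path string for a polyline. The helper line_path_sketch is a hypothetical name of our own and covers only straight segments, whereas d3.line also supports curves and missing data.

```javascript
// Hypothetical helper (not part of D3): builds the 'd' attribute string for a polyline.
function line_path_sketch(data, x, y) {
	return data.map(function(d, i) {
		// 'M' moves the pen to the first point, 'L' draws a segment to each later point
		return (i === 0 ? 'M' : 'L') + x(d) + ',' + y(d);
	}).join('');
}

var pts = [[0, 0], [1, 2], [2, 1]];
var path_d = line_path_sketch(pts, d => 10 * d[0], d => 100 - 10 * d[1]);
console.log(path_d); // M0,100L10,80L20,90
```

The x and y accessors here play the same role as the scale-based accessors given to d3.line in the example.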

d3.area()

We can also generate area marks in a similar manner, using d3.area. Here we prescribe the coordinates of the polygon that will bound the area: rather than x and y coordinates, we specify the lower and upper bounds for x and y. Typically x is fixed, and we vary y. Let’s see an example:

var svg8 = d3.select('#svg8');
var width = svg8.attr('width'), height = svg8.attr('height'), pad_range = 40;

var line_data = [];
for(var i = 0; i < 70; i++)
	line_data.push([i,0.3*i + 0.05*i*i - Math.sqrt(i),0.5*i + 0.1*i*i - Math.sqrt(i)]);

var min_line_x = d3.min(line_data, d => d[0]), max_line_x = d3.max(line_data, d => d[0])
var min_line_y = d3.min(line_data, d => d[2]), max_line_y = d3.max(line_data, d => d[2])
var min_x = pad_range, max_x = width-pad_range, min_y = height-pad_range, max_y = pad_range;
var pad_x = (max_line_x-min_line_x)*0.02, pad_y = (max_line_y-min_line_y)*0.02;

var line_scale_x = d3.scaleLinear().domain([min_line_x-pad_x,max_line_x+pad_x]).range([min_x,max_x])
var line_scale_y = d3.scaleLinear().domain([min_line_y-pad_y,max_line_y+pad_y]).range([min_y,max_y])

var area = d3.area()
	.x(d => line_scale_x(d[0]))
	.y0(d => line_scale_y(d[1]))
	.y1(d => line_scale_y(d[2]))

svg8.append('path').datum(line_data)
	.attr('d', d => area(d))
	.attr('fill', '#555')
	.attr('stroke', '#999')
	.attr('stroke-width', '3')

svg8.append('g').attr('transform', 'translate('+pad_range+',0)').call(d3.axisLeft(line_scale_y))
svg8.append('g').attr('transform', 'translate(0,'+(min_y)+')').call(d3.axisBottom(line_scale_x))
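The polygon bounding an area can be sketched as two passes over the data: along the upper boundary in one direction, then back along the lower boundary. The helper area_path_sketch below is a hypothetical name of our own, restricted to straight segments and a shared x accessor.

```javascript
// Hypothetical helper (not part of D3): builds a closed polygon path for an area mark.
function area_path_sketch(data, x, y0, y1) {
	// upper boundary, left to right
	var top = data.map(d => x(d) + ',' + y1(d));
	// lower boundary, right to left, so the outline forms a closed loop
	var bottom = data.slice().reverse().map(d => x(d) + ',' + y0(d));
	return 'M' + top.concat(bottom).join('L') + 'Z';
}

// each datum: [x, lower bound, upper bound]
var band_data = [[0, 1, 3], [10, 2, 5]];
var area_d = area_path_sketch(band_data, d => d[0], d => d[1], d => d[2]);
console.log(area_d); // M0,3L10,5L10,2L0,1Z
```

The trailing Z closes the path, which is what allows the fill to cover the region between the two boundaries.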

d3.link()

D3 has a wide array of support for defining curves. We will not get into the details of how to create curves (this is one or more courses of material in and of itself). Instead, we will consider one very useful way of generating curves, intended for network visualization: links.

Let’s suppose we wanted to lay out a network so that it progresses naturally from left to right. A tree is a good example of this, where the root node starts at the left, and the leaf nodes are on the right. To show the network, we could draw straight lines between nodes, but that’s a little boring, and can lead to the perception of excessive clutter. Instead, we can use d3.linkHorizontal, which generates a cubic Bezier curve that connects two nodes, and whose tangent vector at each node is horizontal. Similar reasoning holds for d3.linkVertical.

To set up a link, you perform a data join on an array in which each object contains a source field and a target field, each holding some type of reference to the source and target nodes (forming an edge!). Furthermore, we give the link generator x and y functions that tell it how to obtain actual positions from those references. Let’s see this for drawing random trees:

var svg9 = d3.select('#svg9');
var width = svg9.attr('width'), height = svg9.attr('height'), pad_range = 40;

var edges = [];
var nodes = [];
var max_depth = 4;
var depth_inds = [];
for(var i = 0; i < max_depth; i++)
	depth_inds.push(i);
var depth_band = d3.scaleBand().domain(depth_inds).range([pad_range,width-pad_range])

function generate_nodes(depth, parent_ind, parent_lower, parent_upper)  {
	if(depth == max_depth)
		return;
	if(depth < 2)
		var n_nodes = 2 + Math.floor(Math.random() * 2);
	else
		var n_nodes = 2 + Math.floor(Math.random() * 3);
	var node_inds = [];
	for(var i = 0; i < n_nodes; i++)
		node_inds.push(i);
	var scale_band = d3.scaleBand().domain(node_inds).range([parent_lower,parent_upper])
	for(var i = 0; i < n_nodes; i++)  {
		var n_x = depth_band(depth)+depth_band.bandwidth()/2;
		var n_y = scale_band(i)+scale_band.bandwidth()/2;

		var node_pos = [n_x,n_y];
		nodes.push(node_pos);
		var node_ind = nodes.length-1;
		edges.push({
			source: parent_ind,
			target: node_ind
		});

		var n_y_lower = scale_band(i);
		var n_y_upper = scale_band(i)+scale_band.bandwidth();
		if((depth+1) < max_depth)
			generate_nodes(depth+1, node_ind, n_y_lower, n_y_upper)
	}
}

nodes.push([pad_range,height/2.0]);
generate_nodes(0, 0, height-pad_range, pad_range);
var horizontal_link = d3.linkHorizontal()
	.x(d => nodes[d][0])
	.y(d => nodes[d][1])

svg9.selectAll('edges').data(edges).enter().append('path')
	.attr('d', d => horizontal_link(d))
	.attr('fill', 'none').attr('stroke', '#666666')

svg9.selectAll('nodes').data(nodes).enter().append('circle')
	.attr('cx', d => d[0]).attr('cy', d => d[1]).attr('r', 3).attr('stroke-width', '1')
	.attr('fill', '#999999').attr('stroke', '#444444')
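The curve for a single edge can be written out by hand to see where the horizontal tangents come from. The helper horizontal_link_sketch is a hypothetical name of our own; it takes positions directly rather than going through source, target, x, and y accessors as the link generator does.

```javascript
// Hypothetical helper (not part of D3): cubic Bezier path between two [x,y] positions.
function horizontal_link_sketch(source, target) {
	var mid_x = (source[0] + target[0]) / 2;
	// both control points sit at the horizontal midpoint: one at the source's height,
	// one at the target's height, so the curve leaves and arrives horizontally
	return 'M' + source[0] + ',' + source[1] +
		'C' + mid_x + ',' + source[1] +
		',' + mid_x + ',' + target[1] +
		',' + target[0] + ',' + target[1];
}

var link_d = horizontal_link_sketch([0, 0], [100, 40]);
console.log(link_d); // M0,0C50,0,50,40,100,40
```

Swapping the roles of x and y in the control points gives the vertical variant.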

Other shapes

D3 supports a wide variety of other shapes that we will not get into. The intent behind these shapes, as shown in the above, is not to provide graphical representations of the shapes, but rather to organize data into a form that is easier to draw. This is super important: as you develop your own visualizations given some arbitrary data, you will have to be thinking in these terms.

D3: Odds and Ends

selection.call()

Suppose we would like to apply the same sequence of transformations to several selections. Repeating the same method chain each time is redundant. A useful function to avoid this is call, which operates on a single selection and allows you to pass in an arbitrary function, as well as arguments to your liking. The function receives the selection as its first argument, followed by those arguments.

For instance, let’s suppose we wanted a way to modify a circle’s visual channels in terms of radius, fill color, stroke color, and stroke width. First let’s create the function that will achieve this, given the selection of circles, and the above arguments:

function circle_styler(selection, radius, fill_color, stroke_color, stroke_width)  {
	selection.attr('r', radius)
		.attr('fill', fill_color).attr('stroke', stroke_color)
		.attr('stroke-width', stroke_width);
}

Then, for any arbitrary selection of circles sel, and our prescribed arguments, we can invoke call:

sel.call(circle_styler, r, fill, stroke, width);

Call also returns the selection itself, thus permitting chaining as well.
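The mechanics of call are simple enough to sketch with a stand-in object. FakeSelection and radius_styler below are hypothetical names of our own, with just enough behavior to show how the arguments are forwarded and why chaining still works afterwards.

```javascript
// Hypothetical stand-in for a D3 selection, just enough to show how call behaves.
function FakeSelection() { this.attrs = {}; }
FakeSelection.prototype.attr = function(name, value) {
	this.attrs[name] = value;
	return this; // setters return the selection, enabling chaining
};
FakeSelection.prototype.call = function(fn) {
	// forward the selection itself, followed by any extra arguments
	var extra_args = Array.prototype.slice.call(arguments, 1);
	fn.apply(null, [this].concat(extra_args));
	return this; // whatever fn returns is ignored
};

function radius_styler(selection, radius, fill_color) {
	selection.attr('r', radius).attr('fill', fill_color);
}

var fake_sel = new FakeSelection();
fake_sel.call(radius_styler, 8, '#777').attr('stroke', '#000');
console.log(fake_sel.attrs.r, fake_sel.attrs.fill, fake_sel.attrs.stroke); // 8 #777 #000
```

Because call hands back the selection rather than the function’s result, styling functions like this one do not need to return anything.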

selection.each()

Call operates on a single selection. To run an arbitrary function once for each element of a selection, use … each. The each function takes in a function, whose arguments are populated by the element’s datum and its index (within the selection). Within the function, you may access the node itself, specifically the DOM element, with this.
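Here is a small sketch of that calling convention, using plain objects in place of DOM elements. The helper each_sketch and the fake_nodes array are hypothetical; the __data__ property mirrors where D3 stores an element’s bound datum.

```javascript
// Hypothetical stand-in for each, applied to a plain array of "elements".
function each_sketch(nodes, fn) {
	nodes.forEach(function(node, i) {
		// 'this' is the element; the arguments are its bound datum and its index
		fn.call(node, node.__data__, i, nodes);
	});
}

var fake_nodes = [{__data__: 4, tag: 'circle'}, {__data__: 9, tag: 'circle'}];
each_sketch(fake_nodes, function(d, i) {
	// a regular function rather than an arrow function, so that 'this' is the element
	this.r = Math.sqrt(d) + i;
});
console.log(fake_nodes.map(n => n.r)); // radii 2 and 4
```

Arrow functions do not receive their own this, so reach for a regular function whenever the callback needs the element.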

Data Structures

D3 has a lot of useful data structures. We will discuss some in more detail later in the semester. But for now, here are a few useful ones:

d3-array

D3 provides a number of useful functions for processing arrays (or more broadly “iterables”, like maps, sets, strings). Please see the full documentation for more details. Here is a summary:

Statistics: For computing a variety of statistics (min, max, mean, median, sum, etc.), D3 follows a consistent pattern. Consider mean for concreteness. If your array arr is composed of numbers, then simply call d3.mean(arr). However, if your array is composed of objects, then you can pass in an anonymous function to access a particular property, like so: d3.mean(arr, d => d.value).
d3.range(): analogous to Python’s built-in range function. You pass in an optional start, a required end (exclusive), and an optional step, and it produces an array of evenly spaced numbers.
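Both patterns are easy to sketch in plain JavaScript, which helps clarify what the accessor argument and the range arguments do. The helpers mean_sketch and range_sketch are hypothetical names of our own and are deliberately simplified (for instance, range_sketch handles positive steps only).

```javascript
// Hypothetical helper: mean with an optional accessor, in the style of d3.mean.
function mean_sketch(arr, accessor) {
	var values = accessor ? arr.map(accessor) : arr;
	var sum = values.reduce((a, b) => a + b, 0);
	return values.length ? sum / values.length : undefined;
}

// Hypothetical helper: range(stop) or range(start, stop, step), positive steps only.
function range_sketch(start, stop, step) {
	// with a single argument, treat it as the end and start from 0
	if (stop === undefined) { stop = start; start = 0; }
	if (step === undefined) step = 1;
	if (step <= 0) return [];
	var out = [];
	for (var i = 0; start + i * step < stop; i++)
		out.push(start + i * step);
	return out;
}

console.log(mean_sketch([1, 2, 3, 6]));                            // 3
console.log(mean_sketch([{value: 2}, {value: 4}], d => d.value));  // 3
console.log(range_sketch(4));        // 0, 1, 2, 3
console.log(range_sketch(2, 10, 4)); // 2, 6
```

As with Python’s range, the end value itself is never included.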

d3-collection

D3 offers several ways of organizing and deriving data that are super useful. Please see the full documentation for more details. A couple of important functions:

d3.set(): produces a set object containing the unique items of a given array; call values() on it to get those items back as an array. You can also pass in an anonymous function to specify what attribute you want to access (for more general objects).

d3.nest()

If you recall the “group-by” operation we went over in class, d3.nest() realizes this operation. It allows us to hierarchically group our data based on discrete attributes. There are two functions associated with a nest object that you will need to call:

key: specify the attribute along which you will group. This can be called multiple times, over different attributes, to give us a hierarchical structure (its depth equals the number of calls to key).
entries: pass in your data array.

Another useful, but optional, function is rollup. You typically call rollup after you have set up your key functions (completely specifying the hierarchy). It accepts an anonymous function that is passed a single argument: an array consisting of all items in your data at a leaf in the hierarchy (i.e. all items sharing one combination of the attributes specified by key). You must then return something as a result. You can do whatever you want with this array: summarize the data with a single value, a set of values, or just return the array as-is. It depends on your visualization design.

The data structure returned by d3.nest is very handy for working with discrete data. Let’s look at one example in detail:

Assume we are provided an array of car data. Each car has attributes origin (country/continent) and a weight (in lbs).
We would like to compare the average car weight across origins.
But we are not provided the average! We need to derive it.
We do so with nest. We first “group-by” Origin, and then for all cars that have the same origin, we compute the mean of their weight.

var cars;
d3.json('cars.json')
	.then(function(data)  {
		// keep the loaded records in a declared variable, then draw
		cars = data;
		plot_it();
	})

function plot_it()  {
	var svg10 = d3.select('#svg10');
	var width = 240, height = svg10.attr('height'), pad_range = 60;

	var all_weights = [];
	var nester = d3.nest()
		.key(car_d => car_d.Origin)
		.rollup(car_d => {
			var mean_weight = d3.mean(car_d, d => d.Weight_in_lbs);
			all_weights.push(mean_weight);
			return mean_weight;
		});

	var nested_data = nester.entries(cars);
	console.log(nested_data);

	var all_origins = d3.set(cars, d => d.Origin).values();
	var max_weight = d3.max(all_weights);

	var origin_band = d3.scaleBand().domain(all_origins).range([pad_range,width-pad_range]).paddingInner(0.1).paddingOuter(0.1);
	var weight_linear = d3.scaleLinear().domain([0,max_weight]).range([height-pad_range,pad_range]);

	svg10.selectAll('rect').data(nested_data).enter().append('rect')
		.attr('x', d => origin_band(d.key)).attr('width', origin_band.bandwidth())
		.attr('y', d => weight_linear(d.value)).attr('height', d => weight_linear(0)-weight_linear(d.value))
		.attr('fill', '#777')

	svg10.append('g')
		.attr('transform', 'translate('+'0'+','+(height-pad_range)+')')
		.call(d3.axisBottom(origin_band))

	svg10.append('g')
		.attr('transform', 'translate('+(pad_range)+','+'0'+')')
		.call(d3.axisLeft(weight_linear))

	svg10.append('text').text('Origin')
		.attr('transform', 'translate('+(width/2)+','+(weight_linear(0)+40)+')').attr('text-anchor', 'middle')
	svg10.append('text').text('Weight')
		.attr('transform', 'translate('+(15)+','+(height/2)+') rotate(270)').attr('text-anchor', 'middle')
}
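The group-by and rollup performed by nest above can be sketched in plain JavaScript, which makes the shape of the result easier to see. The records in sample_cars are made up for illustration (only the property names mirror the cars example), and group_mean_sketch is a hypothetical helper of our own, limited to a single key.

```javascript
// Made-up records for illustration; the property names mirror the cars example.
var sample_cars = [
	{Origin: 'USA', Weight_in_lbs: 3000},
	{Origin: 'Japan', Weight_in_lbs: 2000},
	{Origin: 'USA', Weight_in_lbs: 4000}
];

// Hypothetical helper: group by one key, then roll each group up into its mean.
function group_mean_sketch(data, key_fn, value_fn) {
	var groups = new Map();
	data.forEach(function(d) {
		var k = key_fn(d);
		if (!groups.has(k)) groups.set(k, []);
		groups.get(k).push(d);
	});
	// rollup step: each group's array of items becomes a single number
	return Array.from(groups, function(entry) {
		var items = entry[1];
		var total = items.reduce((s, d) => s + value_fn(d), 0);
		return {key: entry[0], value: total / items.length};
	});
}

var grouped = group_mean_sketch(sample_cars, d => d.Origin, d => d.Weight_in_lbs);
console.log(grouped); // USA -> 3500, Japan -> 2000
```

The result is an array of key/value objects, one per group, which is exactly the kind of array that is convenient to hand to a data join for drawing one bar per group.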

Advice

D3 is not a visualization panacea. In implementing a visualization technique, you will need to think carefully about the data you are given, what properties you may need to derive from the data, and the technique itself, whose implementation may have very little to do with its visual representation!
But also, keep D3 in mind when working with data. It’s sometimes useful to work backwards.
- “I want to display my data in a particular way.”
- “To do so, I need to have a certain spatial organization, a partitioning/grouping of my data into certain spatial regions, as well as a certain set of visual elements.”
- “To support this, I need to use some set of D3 functions, which require me to represent my data in a particular way.”
- “Given this, my data structures should be …”
Debugging
- Use the console! You may have a syntax error. If you do not, your elements may not be appearing in the DOM. If your elements aren’t appearing in the DOM, your selection may not be what you think it is.
- Inspect your selections! See what the groups and parents of a selection are. Is it what you intended?
- Sketch out your visualization! On paper. It’s easier to code your visualization to a concrete sketch than to what you have in your head.