Choosing the right tool for data visualization is important, and scatterplots aren't good for everything, but they have their place and can be really useful. For instance this scatterplot from carsabi.com shows a lot of very useful information: http://cl.ly/GvnM
Scatterplots can be a lot more useful when you do things like display the points in different colors or highlight a specific point of interest as in the Carsabi scatterplots.
Do you really think scatterplots should never be used? I think a headline like that takes away from your credibility. The problem isn't with scatterplots, it's choosing the right tool for the job.
Postscript: Some people seem to be interpreting me as making a stronger claim than I intend. There are obviously a few cases when a scatterplot truly is the right tool.
Inflammatory, overstated headlines are a rhetorical device that we're apparently stuck with for a long time to come. They're like television advertisements used to be: only curmudgeons[1] complain about them anymore because people assume the content wouldn't exist without them.
> For instance this scatterplot from carsabi.com shows a lot of very useful information: http://cl.ly/GvnM
This graph by itself is arbitrarily lossy in that we don't know how many overlapping samples there are at each point. And we don't know if blue samples are overlapping orange ones or vice-versa. I'm not sure I would be as categorical as the author about never using scatter plots, but he makes a really good point.
"Never" is a strong word, but I wanted a succinct title.
I think the default should be a density plot. It's only in special cases that a scatterplot would be appropriate. For example, that Carsabi plot actually works well, due to the fact that the reader is interested in finding a specific data point rather than understanding the global behavior.
I think the default should be the method that displays the most information. Why hide information if you don't have to? In the case of one-dimensional data, a dotplot shows the reader everything. Using a boxplot reduces the information content, and mean-plus-errorbars reduces it further. Mean-plus-errorbars imposes a probability distribution, which may be wrong; it doesn't reveal a hidden truth.
The same holds in two dimensions. Show me all the data, and include a regression line or a spline to highlight a trend. Only start hiding information when the scatterplot becomes misleading. That is, when overplotting prevents me from accurately assessing the actual distribution of the points.
Jumping immediately to a density plot also restricts me to your interpretation. The original data is lost. With a scatterplot, the raw data can be recovered from the plot, so I can do my own analysis should I be interested. This is common in meta-analyses that extract data from multiple published papers. If those original papers had used density plots instead of scatterplots, reanalysis would require direct access to the underlying data. Once the original author dies, or loses the data, all further use of the data is lost.
The original data would be well represented in a 100x100 matrix, since the data (grades 0-100) is already discrete. Basically the first picture in the article, with an alpha setting that gives 1 (1 = opaque) when multiplied by the maximum number of entries per field, e.g. max entries = 5 => alpha = 1/5 = 0.2.
Alternatively, aggregating to 10x10, 20x20, 25x25 or 50x50 would work too if the data is too sparse. There is no need for hex binning in this case!
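A minimal sketch of that kind of binning, in case it helps. The function name `bin_counts`, the sample points, and the choice to clamp a grade of 100 into the last cell (so the grid stays 100x100 rather than 101x101) are all assumptions for illustration, not something taken from the article.

```python
from collections import Counter

def bin_counts(points, bin_width=1, max_grade=100):
    """Count (x, y) grade pairs per grid cell.

    bin_width=1 gives a 100x100 grid, bin_width=10 a 10x10 grid, etc.
    Assumption: the top grade (100) is clamped into the last cell.
    """
    n_bins = max_grade // bin_width
    counts = Counter()
    for x, y in points:
        i = min(x // bin_width, n_bins - 1)
        j = min(y // bin_width, n_bins - 1)
        counts[(i, j)] += 1
    return counts

# Hypothetical sample data: pairs of integer grades in 0..100.
points = [(50, 60), (50, 60), (55, 65), (100, 100), (0, 0)]
fine = bin_counts(points, bin_width=1)     # one cell per grade pair
coarse = bin_counts(points, bin_width=10)  # 10x10 aggregation
```

The per-cell counts can then drive color or opacity directly, which avoids depending on how overlapping markers happen to blend.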
When overplotting, the usual compositing operator gives a final alpha of
1 - (1-alpha)^N
So your alpha = 1/5 overdrawn 5 times would give a final opacity of ~0.672. By its very nature, there is no alpha < 1 which, when composited a finite number of times, gives alpha = 1.
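For anyone who wants to check the numbers, here is a small sketch of that formula. The function names are made up for illustration, and it assumes every layer has the same alpha and is blended with the usual "over" operator.

```python
import math

def composite_alpha(alpha: float, n: int) -> float:
    """Final opacity after drawing n layers of the same alpha on top of each other."""
    return 1.0 - (1.0 - alpha) ** n

def layers_needed(alpha: float, target: float) -> int:
    """Smallest n whose composited opacity reaches target (requires 0 < alpha < 1, 0 < target < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - alpha))

five_layers = composite_alpha(0.2, 5)   # 1 - 0.8**5 = 0.67232
to_99_percent = layers_needed(0.2, 0.99)
```

With alpha = 0.2, reaching 99% opacity takes 21 overlapping points, and full opacity is only approached, never reached.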
I was aware that this approach towards alpha was oversimplistic to begin with; I should have pointed that out. Thanks for posting the correct formula.