Personally, I find treemaps unmatched for disk space analysis. Specifically, I like to use the squarify layout algorithm, to NOT use the "cushion gradient" shading method, to use inset frames to convey depth visually, and to include filenames. This maximizes glanceable information density for the use case of identifying large objects to delete to recover space.
This is how the old SpaceMonger app worked, and I liked it so much I had to recreate it for Linux/Mac: https://github.com/alanbernstein/treemonger. My version still needs some work, but it's minimally usable.
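For anyone curious what "squarify" does, here is a minimal from-scratch sketch of the squarified treemap layout (Bruls, Huizing and van Wijk). It greedily adds items to a strip along the shorter side of the remaining rectangle and commits the strip as soon as adding the next item would make its worst aspect ratio worse. This is not treemonger's code; the function names are made up for illustration, and it lays out a single level only (no nesting, inset frames, or labels).

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def worst_ratio(row: List[float], side: float) -> float:
    """Worst aspect ratio when the areas in `row` share one strip along `side`."""
    thickness = sum(row) / side  # how deep the strip is
    worst = 0.0
    for area in row:
        length = area / thickness  # extent of this item along the strip
        worst = max(worst, length / thickness, thickness / length)
    return worst


def layout_row(row: List[float], x: float, y: float, w: float, h: float,
               rects: List[Rect]) -> Rect:
    """Place `row` as a strip along the shorter side; return the leftover rectangle."""
    total = sum(row)
    if w >= h:
        # Wide rectangle: vertical strip on the left, items stacked top to bottom.
        thickness = total / h
        cy = y
        for area in row:
            rects.append((x, cy, thickness, area / thickness))
            cy += area / thickness
        return x + thickness, y, w - thickness, h
    # Tall rectangle: horizontal strip on top, items placed left to right.
    thickness = total / w
    cx = x
    for area in row:
        rects.append((cx, y, area / thickness, thickness))
        cx += area / thickness
    return x, y + thickness, w, h - thickness


def squarify(values: List[float], x: float, y: float, w: float, h: float) -> List[Rect]:
    """Squarified layout. Values must be positive and sorted largest first."""
    total = sum(values)
    areas = [v * w * h / total for v in values]  # scale values to drawing area
    rects: List[Rect] = []
    row: List[float] = []
    for area in areas:
        side = min(w, h)
        if row and worst_ratio(row + [area], side) > worst_ratio(row, side):
            # Adding this item would make the strip less square: commit it.
            x, y, w, h = layout_row(row, x, y, w, h, rects)
            row = []
        row.append(area)
    if row:
        layout_row(row, x, y, w, h, rects)
    return rects
```

For example, `squarify([6, 6, 4, 3, 2, 2, 1], 0, 0, 6, 4)` returns seven rectangles whose areas equal the input values. Nesting, and the inset frames described above, would come from calling it again inside each rectangle after shrinking that rectangle by a few pixels.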
Treemaps are also good for profiling (see KCachegrind): they waste a lot less space than flamegraphs, and the area relationship is relatively well maintained.
The treemap screenshot doesn't look correct. Nearly all charting libs (like Apache ECharts) will group nodes under a heading name, so I'm not sure why the article claims it would be hard to notice the "drivers" node. In that particular screenshot, sure, but that looks like a bad implementation of a treemap. Maybe this was the case back in 2017?
Flame graphs I have a love/hate relationship with. The hierarchy is very useful, but the name and coloring can be very confusing and misleading. Most people I show them to think red == something bad, but the color is actually just for aesthetics.
Yes, pretty much all treemap disk space tools I've used will perform color gradient grouping on boxes, with directories fitting in larger boxes. The box may not be drawn, but the inner boxes will align, visually making a larger box. Also, mouse hovers go a long way.
Like, one just has to look at the qdirstat screenshot at https://github.com/shundhammer/qdirstat. On the bottom-right corner, there are visually distinct boxes of sub-boxes that guide the eye towards a logical set of files.
At an old startup attempt we once created a nested hierarchy metrics visualization chart that I later ended up calling Bookshelf Charts, as some of the boxes filled with smaller boxes looked like a bookshelf (if you tilted your head 90 degrees). Something between FlameGraphs and Treemaps. We also picked “random” colors for aesthetics, but it was interactive enough that you could choose a heat map color for the plotted boxes (where red == bad).
The source code got lost ages ago, but here are some screenshots of bookshelf graphs applied to SQL plan node level execution metrics:
Very neat. And if anyone from Plotly should happen to be reading this, a compact format like this might be an interesting option for Icicle Charts, akin to how the compact, indented version of Excel pivot tables saves horizontal space over the "classic" format pivot table.
Treemaps are the densest and most accurate information source on a per-pixel basis. Flamegraphs are pretty good, but with a fixed Y and variable X your box area is inaccurate, and they waste a fair amount of plot space on the non-flame area. The sunburst chart is really pretty but bad from an information communication perspective.
Flamegraphs are a really lovely tool for visualizing trees. Slightly related anecdote:
A while ago I was experimenting with interactive exploration of (huge) Monte Carlo Tree Search trees. Inspired by file system visualization tools, my first attempts were also tree maps and sunburst graphs, but I ran into the same problems as in the article.
I tried flamegraphs next with the following setup:
- The number of visits in each node maps to the width and order of each bar (i.e., the most visited node was first and was the largest)
- The expected value maps to the color of each bar.
And then it was a perfect fit: it's easy to see what's going on in each branch at the first levels, and the deeper levels can be explored through drilling down.
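A rough sketch of that mapping, assuming a hypothetical `Node` with a visit count and a mean value in [-1, 1] (the anecdote doesn't say how values were scaled or colored). Widths are fractions of the root's visits, children are ordered most-visited first, and the color ramp from red (low value) to green (high value) is an arbitrary choice. Since a node's children never have more visits than the node itself, they always fit on top of it; visits that stopped at the parent show up as a gap.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical MCTS node: move label, visit count, mean value in [-1, 1]."""
    name: str
    visits: int
    value: float
    children: List["Node"] = field(default_factory=list)


@dataclass
class Bar:
    name: str
    depth: int                   # row in the flame graph (0 = root)
    x: float                     # left edge, as a fraction of the root width
    width: float                 # fraction of the root width
    color: Tuple[int, int, int]  # RGB


def value_to_rgb(value: float) -> Tuple[int, int, int]:
    """Map a value in [-1, 1] onto a red (bad) to green (good) ramp."""
    t = (max(-1.0, min(1.0, value)) + 1.0) / 2.0  # 0 = worst, 1 = best
    return (round(255 * (1 - t)), round(255 * t), 0)


def flame_bars(node: Node, root_visits: int, depth: int = 0, x: float = 0.0) -> List[Bar]:
    """Flatten the tree into bars: width from visits, color from expected value."""
    bars = [Bar(node.name, depth, x, node.visits / root_visits, value_to_rgb(node.value))]
    child_x = x
    # Most-visited child first, so it sits at the left edge and is the widest.
    for child in sorted(node.children, key=lambda c: c.visits, reverse=True):
        bars.extend(flame_bars(child, root_visits, depth + 1, child_x))
        child_x += child.visits / root_visits
    return bars
```

Drilling down is then just calling `flame_bars(subtree, subtree.visits)` on the clicked node, so the subtree fills the full width again.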
Treemaps are indifferent to "unknown" or "unlabeled" nodes. Area is disk space.
Whereas the simple act of labelling a node adds another outer ring arc to the sunburst (thus more coloured area), even though the underlying truth hasn't changed.
All of these suck. Use nested bar graphs like TreeSize and it’s instantly obvious what your biggest hitter is for any particular nesting level.
In lieu of that, a flame graph is tolerable. The polar coordinate one is very pretty garbage. EDIT: Use it when you want to mislead people with a flashy graph.
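A text-mode sketch of the nested-bar idea, assuming the directory tree is given as a nested dict where leaves are file sizes in bytes (a hypothetical input format, not how TreeSize works internally). Each level is sorted largest first, and each bar shows the entry's share of its parent, so the biggest hitter at every nesting level is the top line of its group.

```python
from typing import Dict, List, Union

Entry = Union[int, Dict[str, "Entry"]]  # int = file size, dict = directory


def total(entry: Entry) -> int:
    """Total bytes under an entry."""
    if isinstance(entry, int):
        return entry
    return sum(total(child) for child in entry.values())


def nested_bars(tree: Dict[str, Entry], width: int = 20, indent: int = 0) -> List[str]:
    """TreeSize-style listing: children sorted by size, bar = share of parent."""
    parent_total = total(tree) or 1  # avoid dividing by zero for empty directories
    lines = []
    for name, entry in sorted(tree.items(), key=lambda kv: total(kv[1]), reverse=True):
        size = total(entry)
        share = size / parent_total
        bar = "#" * round(share * width)
        lines.append(f"{'  ' * indent}{name:<12} {bar:<{width}} {share:6.1%} {size}")
        if isinstance(entry, dict):
            lines.extend(nested_bars(entry, width, indent + 1))  # recurse one level deeper
    return lines
```

Usage: `print("\n".join(nested_bars({"home": {"videos": 700, "docs": 100}, "usr": 200})))` lists `home` (80%) above `usr` (20%), with `videos` and `docs` indented under `home`.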
All embeddings of hyperbolic space into Euclidean space suck. You can't preserve both distances and areas between them. Trees live in a hyperbolic space, so every visualization of trees on a screen will suck in some way.
This simple math fact is the reason why all grand hyperlink projects from 1960 to 2010 couldn't work, e.g. Xanadu.
Worse, in small examples with fewer than a hundred nodes, it looks like a real improvement over linear text with jumps: we are, after all, now using _all_ the possible screen real estate.
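To make the hyperbolic-space point a bit more concrete: in a complete tree with branching factor $b$, the number of nodes at depth $d$ grows exponentially, while the room available at distance $r$ from the center of a flat (Euclidean) picture grows only linearly. In the hyperbolic plane with curvature $-1$, the circumference grows exponentially, which is why trees fit there naturally:

$$
N(d) = b^{d}, \qquad C_{\mathbb{E}}(r) = 2\pi r, \qquad C_{\mathbb{H}}(r) = 2\pi \sinh r \sim \pi e^{r} \quad (r \to \infty)
$$

So if depth $d$ is drawn as a ring of radius $r = d$, as a sunburst does, each node of an equally weighted complete tree gets an arc of length $2\pi d / b^{d}$, which shrinks exponentially with depth.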
Ehhh. I think if you're trying to show the overall costs of something to someone that conclusion makes sense, but interactive flame graphs are the best way imo to look into things. Especially making use of sandwich views, which allow you to pivot the flame graph around some function to see callers and callees by cost.
Edit: I'll keep this up to share my embarrassment, but I missed entirely that the article was about disk space. I admit I only looked at the pictures haha.
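A minimal sketch of the aggregation behind a sandwich view, assuming profiles in the common "folded stacks" text format (one `frame;frame;frame count` line per unique stack). The function name is made up for illustration, and it only collects immediate callers and callees; real profilers' sandwich views build full caller and callee trees.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def sandwich(folded: Iterable[str], target: str) -> Tuple[Dict[str, int], Dict[str, int], int]:
    """Aggregate immediate callers/callees of `target` from folded stacks.

    Returns (callers, callees, samples whose stack contains `target`).
    Each distinct caller/callee is counted at most once per stack line,
    so recursion does not double-count samples.
    """
    callers: Counter = Counter()
    callees: Counter = Counter()
    total = 0
    for line in folded:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stack, _, count_text = line.rpartition(" ")  # count is the last field
        frames = stack.split(";")
        count = int(count_text)
        hits = [i for i, frame in enumerate(frames) if frame == target]
        if not hits:
            continue
        total += count
        for name in {frames[i - 1] for i in hits if i > 0}:
            callers[name] += count
        for name in {frames[i + 1] for i in hits if i + 1 < len(frames)}:
            callees[name] += count
    return dict(callers), dict(callees), total
```

Drawing the callers dict as an inverted flame graph above the target and the callees dict as a normal one below it is what gives the "sandwich" its shape.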
Oh this is beautiful and I'm so glad it's been reposted because I missed it the first time.
Flamegraphs seem so much more interpretable and informative than the other plots there, at least to me personally. And I never would have thought to use them for this, because usually when I need to clean out disks or take care of storage it's time sensitive and I want to spend the minimum time figuring things out, and poor viz is enough to accomplish the goal.
An ongoing flamegraph of disk usage over time would be super helpful for many systems I'm working with right now.
For profiling I like the dual treemap-and-tree representation of https://kcachegrind.github.io/html/Home.html a lot. It addresses the post's criticisms of treemaps (seeing percentages and estimating the areas of sub-trees) better than the examples chosen there.
Side note: for anyone who reaches for du and ncdu from time to time, I recommend checking out `dua` (and `dua interactive`). It's way faster on my SSDs.