Agreed. I had a lot of fun putting a toy one together, even though my compression format was worse in every way than a real one and my implementation was quite inefficient.
This exact problem burned me when I started doing transcriptomics. One thing I found that helps mitigate it is to always keep both the gene symbol and a stable ID, such as an Ensembl or Entrez ID, for every data point, because those don’t get mangled by Excel.
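A minimal Python sketch of why keeping the ID pays off. It assumes you have a trusted reference table mapping stable IDs to symbols (for example, from the annotation you started with). Excel turns symbols like SEPT2 and MARCH1 into dates such as 2-Sep and 1-Mar (the exact rendering depends on locale and cell format). Because the ID column survives the round trip, it can be used to look the real symbol back up. The IDs, the repair_symbols helper and the date regex are hypothetical illustrations, not a standard tool.

```python
import re

# Heuristic for Excel's default day-month rendering, e.g. "2-Sep" or "1-Mar".
# This is an assumption and does not cover every date format Excel may produce.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$"
)


def repair_symbols(rows, reference):
    """Restore date-mangled gene symbols using the stable ID.

    rows: list of (stable_id, symbol) pairs read back from a spreadsheet.
    reference: dict mapping stable_id -> trusted gene symbol.
    Returns a new list of (stable_id, symbol) pairs.
    """
    repaired = []
    for stable_id, symbol in rows:
        if DATE_LIKE.match(symbol) and stable_id in reference:
            # The ID was not mangled, so use it to recover the symbol.
            symbol = reference[stable_id]
        repaired.append((stable_id, symbol))
    return repaired


# Hypothetical data: these IDs are placeholders, not real Ensembl/Entrez IDs.
reference = {
    "GENE_ID_0001": "SEPT2",
    "GENE_ID_0002": "MARCH1",
    "GENE_ID_0003": "TP53",
}
from_excel = [
    ("GENE_ID_0001", "2-Sep"),
    ("GENE_ID_0002", "1-Mar"),
    ("GENE_ID_0003", "TP53"),
]
print(repair_symbols(from_excel, reference))
# [('GENE_ID_0001', 'SEPT2'), ('GENE_ID_0002', 'MARCH1'), ('GENE_ID_0003', 'TP53')]
```

Rows whose ID is missing from the reference are left untouched, so nothing gets silently overwritten with a guess.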
In my experience, Python/R can be great when you have large-scale analysis to do, but if a task requires more manual fiddling with the data, it’s much nicer to use Excel.
If I’m not mistaken, some languages with managed effects let you do this through types. For example, in Elm an HTTP request has the type Cmd Msg, and the only way to actually get that IO executed is to hand the Cmd Msg to the runtime through the update function. This means the type system gives you enforced visibility into what your dependencies do, and lets you restrict dependencies from making HTTP requests or performing other effects.
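To make the idea concrete outside Elm, here is a rough Python analogue of the same architecture: effects are plain data values (commands), a pure update function returns them, and only a small runtime actually performs IO. Python’s type hints do not enforce any of this (a Python dependency could still call urllib directly), so this only mimics the shape. In Elm the compiler enforces it, because handing a Cmd to the runtime is the only way to run an effect. All names here (HttpGet, fetch_user, run, the example.com URL) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


# --- Messages the app reacts to (hypothetical names) ---
@dataclass(frozen=True)
class FetchClicked:
    user_id: int


@dataclass(frozen=True)
class GotUser:
    body: str


Msg = Union[FetchClicked, GotUser]


# --- Commands: descriptions of effects, not the effects themselves ---
@dataclass(frozen=True)
class HttpGet:
    url: str
    to_msg: Callable[[str], Msg]  # wraps the response body in a message


@dataclass(frozen=True)
class NoCmd:
    pass


Cmd = Union[HttpGet, NoCmd]


def fetch_user(user_id: int) -> HttpGet:
    # A "dependency": its return type advertises an HTTP request,
    # but calling it only builds a value; nothing is sent.
    return HttpGet(f"https://example.com/users/{user_id}", GotUser)


def update(msg: Msg, model: List[str]) -> Tuple[List[str], Cmd]:
    # Pure: returns the new model plus a command for the runtime to run.
    if isinstance(msg, FetchClicked):
        return model, fetch_user(msg.user_id)
    if isinstance(msg, GotUser):
        return model + [msg.body], NoCmd()
    return model, NoCmd()


def run(msgs: List[Msg], model: List[str],
        perform_get: Callable[[str], str]) -> Tuple[List[str], List[str]]:
    # The runtime is the only place that performs IO. perform_get is
    # injected so a fake can stand in for a real HTTP client.
    queue = list(msgs)
    requested: List[str] = []  # audit log of every effect that ran
    while queue:
        model, cmd = update(queue.pop(0), model)
        if isinstance(cmd, HttpGet):
            requested.append(cmd.url)
            queue.append(cmd.to_msg(perform_get(cmd.url)))
    return model, requested


def fake_get(url: str) -> str:
    return f"response from {url}"


model, log = run([FetchClicked(42)], [], fake_get)
print(model)  # ['response from https://example.com/users/42']
print(log)    # ['https://example.com/users/42']
```

Because every effect funnels through run, the requested list is a complete record of what the code asked to do, which is the kind of visibility the comment describes; Elm just makes bypassing it impossible.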