It's interesting to me to see pandas used in this application. I'd be curious to see a more fully featured implementation.
I'm a scientist by profession and I've been working on building out several different generalized data processing pipelines for some specific problems in my sub-field, to make gathering and formatting in-situ data easier and more standardized/open/version-controlled. It's going great, worlds better than the smattering of MATLAB code strewn across the hard drives in the lab, written in a non-collaborative manner and shared by email...
... but. I'll admit, I've run into a lot of footguns in the pandas API in terms of efficiency. You'll do something in what seems to be the logical way, or in a way that the API funnels you towards (like the groupby calls in the OP), and you'll quickly realize that if you're working on large-ish tables (>10 GB in memory) it was the stupid way to do things. In terms of readable code to share with colleagues, pandas can't be beat, but things get wonky when you reach significant complexity, and I would be surprised if it made any sense to use in a 'real' recommendation engine when considering developer productivity.
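A classic example of the groupby footgun (a minimal sketch with made-up data, not the OP's code): `.apply` with a Python lambda drops back into the interpreter once per group, while the built-in aggregations and `transform` stay vectorized and give the same answer.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Footgun: a Python callback runs once per group -- painful at >10 GB.
slow = df.groupby("group")["value"].apply(lambda s: s.mean())

# Same result from the built-in, vectorized aggregation:
fast = df.groupby("group")["value"].mean()

# For per-row results aligned to the original frame, use transform
# rather than apply + merge:
df["group_mean"] = df.groupby("group")["value"].transform("mean")

print(fast.to_dict())  # {'a': 2.0, 'b': 4.0}
```

Both paths agree on small data; the difference only bites once the table stops fitting comfortably in cache, which is exactly when you've already shared the slow version with colleagues.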
Look into Dask if you are attempting to process an entire data table that's larger than memory.
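Dask mirrors the pandas API for this (roughly `dask.dataframe.read_csv(...).groupby(...).sum().compute()`), but the core idea, processing one chunk at a time and combining partial results, also works in plain pandas when the reduction is simple. A minimal sketch with toy in-memory data standing in for a large file:

```python
import io
import pandas as pd

# Toy stand-in for a CSV too large to load at once.
csv = "user,score\n" + "\n".join(f"u{i % 3},{i}" for i in range(10))

# Stream the file in chunks and fold partial group sums together.
sums = None
for chunk in pd.read_csv(io.StringIO(csv), chunksize=4):
    part = chunk.groupby("user")["score"].sum()
    sums = part if sums is None else sums.add(part, fill_value=0)

print(sums.to_dict())  # {'u0': 18, 'u1': 12, 'u2': 15}
```

Dask handles the chunking, scheduling, and more complex operations (joins, non-trivial groupbys) for you, which is where hand-rolled chunking stops being fun.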
For your last paragraph, you're conflating the need to share code with the need to build a robust, scalable service. Most research code is only needed for the paper and rarely touched again.