I consult for a small company which feeds some of the largest market research companies. This company finds data providers for each country, collect the data monthly and need to massage it into a uniform structure before handing it over. I help them scripting this. I found importing the monthly spreadsheets into mongodb and querying the set can replace an awful lot of manual scripting work. That aggregator queries are a good fit for an aggregator company shouldn't be that big of a surprise, I guess.
The mongodb instance is ephemeral, the database itself is ephemeral, both only exist while the script is running which can be measured in seconds. The structure is changing from month to month. All this plays to the strengths of mongodb while avoiding the usual problems. For eg one stage of the aggregate pipeline can only be 100MB? A source csv is a few megabytes at most.
Ps.: no, Excel can't do it, I got involved with this when the complexity to do it in Excel has become unbearable.
The expertise in software engineering typical in these promptfondling companies shine through this blog post.
Surely they know 100% code coverage is not a magical bullet because the code flow and the behavior can differ depending on the input. Just because you found a few examples which happen to hit every line of code you didn't hit every possible combination. You are living in a fool's paradise which is not a surprise because only fools believe in LLMs. You are looking for a formal proof of the codebase which of course no one does because the costs would be astronomical (and LLMs are useless for it which is not at all unique because they are useless for everything software related but they are particularly unusable for this).
So, what is the solution? Senior engineer looks over PR and signs LGTM? That is just "vibe testing". The worst kind of testing. I think the author is right, setting up tests to form a reactive environment for coding agents will lead us to a new golden age. If you later find some issue with your test case coverage, you expand it. But it is good to do it from the start as throroughtly as possible.
1. Clearly explain the massive harm LLMs cause society and the environment to everyone. (Mass media should do this instead of parroting every nonsense the promptfondlers feed them.)
2. Ban them all. Don't tell me it's impossible just because it's widespread. Asbesthos was everywhere.
It's a bold claim that LLMs are useless for formal verification when people have been hooking them up to proof assistants for a while. I think that it's probably not a terrible idea; the LLM might make some mistakes in the spec but 99% of the time there are a lot of irrelevant details that it will do a serviceable job with.
The mongodb instance is ephemeral, the database itself is ephemeral, both only exist while the script is running which can be measured in seconds. The structure is changing from month to month. All this plays to the strengths of mongodb while avoiding the usual problems. For eg one stage of the aggregate pipeline can only be 100MB? A source csv is a few megabytes at most.
Ps.: no, Excel can't do it, I got involved with this when the complexity to do it in Excel has become unbearable.
reply