Pretty cool work. Can you give any insights on the solvers you used, especially the open-source ones? I've had mixed (pun intended) experiences and would love to hear other opinions.
I worked with MIPs for 8 years and commercial solvers have always been several orders of magnitude faster/better than open-source solvers.
This is generally true for domain-specific software -- unlike general-purpose software, it's always hard to incentivize the small pool of specialized talent to make open-source contributions.
The key to high performance in MIPs comes from having good heuristics, not necessarily from improving the basic algorithms (the algorithms are pretty standard -- simplex or interior point). Finding effective heuristics is hard and tedious, but they make a significant difference in solution speed. For instance, naive Simplex may take 40 minutes to solve a problem but with heuristics the solution time might be 5 seconds.
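To make the heuristics point concrete, here's a toy sketch -- emphatically not how CBC or Gurobi work internally; the knapsack instance, the greedy heuristic, and the fractional bound are all invented for illustration. The idea it demonstrates is real, though: seeding branch-and-bound with a cheap primal heuristic gives it a good incumbent from the start, so it can prune far more of the search tree than starting cold.

```python
# Toy 0/1 knapsack branch-and-bound (hypothetical instance) showing how
# a cheap greedy incumbent lets the search prune more aggressively.

def lp_bound(items, cap, idx, value):
    """Fractional-knapsack upper bound on the best completion from idx.
    Assumes `items` is sorted by value density (descending)."""
    for v, w in items[idx:]:
        if cap <= 0:
            break
        take = min(1.0, cap / w)
        value += take * v
        cap -= take * w
    return value

def branch_and_bound(items, capacity, incumbent=0):
    """Depth-first B&B. `incumbent` is the best known solution value
    (e.g. from a heuristic). Returns (best_value, nodes_explored)."""
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    best, nodes = incumbent, 0
    stack = [(0, capacity, 0)]  # (item index, remaining capacity, value)
    while stack:
        idx, cap, val = stack.pop()
        nodes += 1
        if val > best:
            best = val
        if idx == len(items):
            continue
        # Prune: if even the fractional relaxation of this subtree
        # can't beat the incumbent, skip the whole subtree.
        if lp_bound(items, cap, idx, val) <= best:
            continue
        v, w = items[idx]
        if w <= cap:
            stack.append((idx + 1, cap - w, val + v))  # take the item
        stack.append((idx + 1, cap, val))              # skip the item
    return best, nodes

def greedy_heuristic(items, capacity):
    """Cheap primal heuristic: grab items greedily by value density."""
    total = 0
    for v, w in sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True):
        if w <= capacity:
            capacity -= w
            total += v
    return total

# Made-up instance: (value, weight) pairs, capacity 60.
items = [(60, 10), (100, 20), (120, 30), (30, 5), (90, 25), (70, 15)]
cap = 60

cold_best, cold_nodes = branch_and_bound(items, cap)
warm = greedy_heuristic(items, cap)
warm_best, warm_nodes = branch_and_bound(items, cap, incumbent=warm)

# Same optimum either way, but the warm start explores fewer nodes.
print(cold_best, cold_nodes, warm_best, warm_nodes)
```

On this tiny instance the difference is a handful of nodes; on a real MIP with millions of nodes, a good incumbent (plus cutting planes, presolve, etc.) is exactly where the 40-minutes-to-5-seconds gap comes from.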
That said, Cbc is competitive for smaller problems, and here's the thing: many production-sized problems aren't that big -- it really depends on your problem domain. I've deployed Cbc in production (30k variables/constraints) and it was more than adequate.
I don't have any details on this, but Gurobi (a best-of-class solver) also offers a cloud SaaS which you can pay for on demand [1]. The economics of this may work out for some types of problems.
Also, if you're in academia, you can get Gurobi/CPLEX licenses for free (yes). My research group in grad school didn't spend a cent on these solvers, and we still got a taste of best-of-class solution performance (that's how they get you :).
I'm not OP, but I work with MIP solvers. Yes, there is a massive difference between the commercial solvers (CPLEX, Gurobi, and maybe Xpress) and the best open-source ones (CBC/CLP and GLPK). My industry can only use the open-source solvers for prototyping small models and could never use one in production with models the size of ours. The commercial solvers are extremely expensive, too.
I suppose the cost of running the prediction service is made up for by the improved workload performance. How do you weigh the MIP solver budget and the resources spent training the model on the one hand against the workload throughput gains or latency reductions on the other?
Yes, in our case it's definitely worth it. Another interesting thing to consider: moving from a generic kernel-level solution (Linux CFS) to ML-driven systems that depend on the actual workloads running on all of our clusters also implies different ways of debugging and iterating on software. We believe it's a net positive in our case.
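As a purely hypothetical back-of-envelope for that trade-off -- every number below is invented, not from the article -- the prediction service pays for itself when the capacity it frees across the fleet exceeds the capacity it consumes:

```python
# Hypothetical break-even sketch for an ML-driven scheduler.
# All figures are assumptions for illustration only.

fleet_cores = 100_000        # cores under the scheduler's control (assumed)
utilization_gain = 0.03      # 3% better bin-packing fleet-wide (assumed)
predictor_cores = 200        # cores running the prediction service (assumed)
solver_cores = 50            # cores running the MIP solver (assumed)

saved = fleet_cores * utilization_gain   # cores effectively freed
spent = predictor_cores + solver_cores   # cores the system consumes

print(f"net cores freed: {saved - spent:.0f}")
```

Even a small fractional gain dwarfs the overhead at fleet scale, which is why the economics flip in favor of the prediction service once the fleet is large enough.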
I love this type of work. The final analysis shows a significant change, but one that I've had difficulty with in the past: decrease your variance and your 99th percentile and you're going to see gains, even if you see an increase in your volume of times near the median.
So often we see projects that talk about cutting costs by orders of magnitude, but this one is operating at a pod count an order of magnitude greater than most of us will ever run and reducing it fractionally.
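One way to see why tail reductions dominate: with request fan-out, end-to-end latency is the max over many parallel calls, so the p99 matters far more than the median. A toy model with two hypothetical two-point latency distributions (all numbers invented):

```python
# Why cutting p99 can beat cutting the median under fan-out.
# Each service is modeled as a two-point distribution: latency is
# `p99_ms` with probability `tail_prob`, else `median_ms` (toy model).

def expected_fanout_max(median_ms, p99_ms, tail_prob, fanout):
    """Expected max latency over `fanout` independent backend calls."""
    p_any_tail = 1 - (1 - tail_prob) ** fanout  # P(at least one tail hit)
    return p_any_tail * p99_ms + (1 - p_any_tail) * median_ms

fanout = 50  # one user request fans out to 50 backend calls (assumed)

# Service A: fast median, heavy tail. Service B: slower median, tight tail.
a = expected_fanout_max(median_ms=10, p99_ms=100, tail_prob=0.01, fanout=fanout)
b = expected_fanout_max(median_ms=12, p99_ms=20, tail_prob=0.01, fanout=fanout)

print(f"A: {a:.1f} ms  B: {b:.1f} ms")
```

With a 50-way fan-out, the tight-tailed service B finishes sooner in expectation even though its median is 20% worse -- which is exactly the trade the parent comment is describing.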
Maybe not applicable to this article, but since caches and NUMA were mentioned...
Has anyone studied the effects of CPU allocation on cache-coherency performance (different L2 caches communicating to keep data in sync) and, ultimately, overall system performance?
Intel VTune Amplifier can help you with that, although you have to re-profile with each release of your code to get the most out of it, as every application differs in its mix of computation, disk I/O, and network I/O.
Does anyone have experience running OpenFOAM in containers? `--cpuset-cpus` gives me a huge performance hit: it's painfully slow if I allot 16 cores to Docker, while running it directly in a VM with 16 cores is much faster.
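Not a fix, but a quick diagnostic worth trying (this is a guess at the cause, not a diagnosis): tools that size their thread pools from the machine's total CPU count can oversubscribe a 16-core cpuset, because on Linux `os.cpu_count()`-style totals report the host's CPUs while the scheduler affinity mask reflects the cgroup cpuset. A minimal check to run inside the container, assuming Linux:

```python
# Compare the CPU count the process "sees" with the CPUs it may run on.
# On Linux, the affinity mask reflects --cpuset-cpus; cpu_count() does not.
import os

total = os.cpu_count() or 1
if hasattr(os, "sched_getaffinity"):  # Linux-only API
    usable = len(os.sched_getaffinity(0))
else:
    usable = total  # no affinity info available on this platform

print(f"reported CPUs: {total}, usable CPUs: {usable}")
if usable < total:
    print("thread pools sized from cpu_count() may oversubscribe the cpuset")
```

If `usable` is 16 but `reported` is the full host count, anything in the stack (MPI launcher, OpenMP runtime, the solver itself) that sizes itself from the reported total could be spawning far more threads than the cpuset allows, which would match the slowdown you're seeing.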