When you ran it against 90,000 sites were you running the "shot-scraper" command 90,000 times? If so, my guess is that most of that CPU time is spent starting and stopping the process - shot-scraper wasn't designed for efficient start/stop times.
I wonder if that could be fixed? For the moment I'd suggest writing Playwright code directly in Python or JavaScript to scrape the 90,000 sites, to avoid that startup overhead.
Yes, I did indeed launch the shot-scraper command 90K times. Because it's convenient :-)
I didn't realize starting/stopping was that expensive. I thought most of the cost was the fact that you're effectively running a whole browser engine (along with a JS engine).
If I do this again, I'll look into writing the Playwright code directly (I've never used it).
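For anyone else reading: a rough sketch of what that might look like with Playwright's Python sync API. This is an illustrative example, not how shot-scraper does it internally; the `shot_path` helper and the output directory name `shots` are my own assumptions. The key point is that the browser is launched once and reused across every URL, instead of paying the process-startup and browser-launch cost per site.

```python
from pathlib import Path
from urllib.parse import urlparse


def shot_path(url: str, out_dir: str = "shots") -> Path:
    # Hypothetical helper: derive a screenshot filename from the URL's host,
    # e.g. https://example.com/foo -> shots/example.com.png
    return Path(out_dir) / (urlparse(url).netloc + ".png")


def screenshot_all(urls):
    # Import here so the filename helper stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Launch the browser ONCE -- this is the overhead that gets paid
        # on every invocation when shelling out to shot-scraper 90K times.
        browser = p.chromium.launch()
        page = browser.new_page()
        for url in urls:
            try:
                page.goto(url, timeout=30_000)
                page.screenshot(path=str(shot_path(url)))
            except Exception as exc:
                # Keep going on failures; with 90K sites, some will time out.
                print(f"failed: {url}: {exc}")
        browser.close()
```

With this shape you could also parallelize by sharding the URL list across a few worker processes, each holding one long-lived browser.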