When iRobot (the Roomba maker) thought it was about to be acquired by Amazon, it laid off 10% of its staff - https://www.therobotreport.com/irobot-laying-off-10-of-staff.... and after the deal was canceled, it was disclosed that they had reduced R&D and focused on margin improvements, and there was some brain drain as people left during the 18-month limbo - https://www.verdict.co.uk/irobot-to-cut-over-a-third-of-its-.... And of course all this self-inflicted pain only hurt them doubly when the Amazon deal fell through. If they had acted as if they weren't going to be acquired they might have been fine, but they tried to maximize shareholder value.
Back in the day (about 2002) I was working at an education software company which was trying to get itself acquired by Microsoft. MSFT came in and told us our software didn't conform to all these "standards" in the educational software space. Standards which, coincidentally, Microsoft themselves had written. These pseudo-standards did absolutely nothing to help our customers, and were pure bureaucracy and very very complicated to implement.
I'd recently read Charles Ferguson's book about how his company was acquired by MSFT, and recognized this part of their standard operating procedure, along with extreme and invasive due diligence where they spend a lot of time working out if you're stupid/pliable enough to jump through these hoops while buying themselves time to work out if they can clone your product. I tried to warn management (yes, really - even bought them copies of the book) but naturally no one would listen, and reading a book was too much like hard work. At some point MSFT simply ceased returning management's calls, and rolled out a similar product a while later.
The company imploded not long after, not for this reason in particular, but it was part of a general pattern of incompetence and mismanagement.
Friend of mine was in a company that was going to be acquired by $bigcompany. They strung them along and strung them along until their VC funding was exhausted, then picked up the remains for a song. Much cheaper than actually buying them up.
The best open source OCR model for handwriting in my experience is surya-v2 or nougat, really depends on the docs which is better, each got about 90% accuracy (cosine similarity) in my tests. I have not tried Deepseek-OCR, but mean to at some point.
I live in a very very good area of Brooklyn and still regularly run into needles, human shit, and open fentanyl use.
LA is similar unless you never leave your little neighborhood.
DC was similar when I lived there about 4 years ago.
SF is cleaning up, but I’ve regularly walked on streets where it’s just bodies and needles.
I was shocked by the Vietnamese area of Seattle. It felt like a zombie land.
I mean, if we’re talking city core, yeah, this is the average experience. I say this as someone who loves cities: American cities leave a lot to be desired, and a lot of that comes from simply refusing to enforce basic laws that the rest of the world (including much more left-leaning countries) doesn’t hesitate to enforce.
In what "very very good area of Brooklyn" are you regularly encountering needles?!
I've lived in the NYC metro area for nearly two decades and have yet to see a single one. Definitely saw them when I lived in Baltimore, and have seen them in Philly, but even then not "regularly" in either case.
I looked into this a bit earlier this year. I'm mixed on it. While the FOSS in me wants it all open-source and available to use given that I'm basically labeling training data for them for free, and they are funded by donations/grants, I get value out of it for free.
My desire was to combine something like iNaturalist with BirdWeather for a bird tracker of audio and visual. BirdWeather does make it free which is great, but there's no great free API of iNaturalist quality for diverse bird tracking.
That being said, I am certain that if iNaturalist made their model public, tons of competing apps would spring up, it'd be commercialized immediately regardless of license, and it would take people away from iNaturalist without giving iNaturalist anything in return.
Plus I know iNaturalist doesn't want autolabeled data uploaded as matched. They only want manually labeled data, and opening the API would, I'm sure, flood their servers with ML-labeled data. On the one hand that could be useful, but it would also be a ton of noise.
I'm in favor of whatever option is most in line with keeping a long term success of a free, high quality plant/animal identifying app out there, and I don't know enough to take a definitive stance on that, and unfortunately those that do, probably have a vested interest in one of the outcomes.
To suggest another "simple" example: air conditioning. It made half the world vastly more livable, made it possible to work every day of the year almost anywhere in the world, and reduced deaths and disease. At least currently, AC has had a greater impact on humanity than AI has.
>This week, US District Judge Cormac Carney of the US District Court of the Central District of California decided that there's reason to believe that Visa knowingly processed payments that allowed MindGeek to monetize "a substantial amount of child porn." To decide, the court wants to know much more about Visa's involvement, calling for more evidence of legal harms caused during a jurisdictional discovery process extended through December 30, 2022.
So, I did some OCR research early last year, that didn't include any VLMs, on some 1960s era English scanned documents with a mix of typed and handwritten (about 80/20), and here's what I found (in terms of cosine similarity):
Handwritten is a weighted average of Handwritten and typed. I also computed Jaccard and Levenshtein distance, but the results were similar enough that I'm leaving them out for the sake of space.
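For anyone who wants to run the same kind of comparison on their own documents, here is a minimal sketch of the three metrics mentioned above, using only the standard library. It assumes word-level tokens (whitespace split) for cosine and Jaccard and character-level edits for Levenshtein; the original tests may well have tokenized differently, and the function names are just illustrative.

```python
from collections import Counter
import math


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two texts."""
    va, vb = Counter(a.split()), Counter(b.split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the shared vocabulary divided by the combined vocabulary."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    # Two empty texts are treated as identical (an assumption).
    return len(sa & sb) / len(union) if union else 1.0


def levenshtein_distance(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

You would call each function with the OCR output and a hand-typed ground truth for the same page, then average per document type.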
Overall, if you want the best: if you're an enterprise, just use whichever of AWS/GCP/Azure you're on; if you're an individual, pick between those. While some of the open source solutions do quite well, surya took 188 seconds to process 88 pages on my RTX 3080, while the cloud ones took a few seconds to upload the docs and download the results. But if you do want open source, seriously consider surya, tesseract, and nougat depending on your needs. Surya is the best overall, while nougat was pretty good at handwriting. Tesseract is just blazingly fast: a run takes 121-200 seconds depending on whether you use tessdata-fast or best, but that's CPU-based and trivially parallelizable, and on my 5950X using all the cores it took only 10 seconds to run through all 88 pages.
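To illustrate the "trivially parallelizable" part, here is a rough sketch of fanning pages out across cores, not the exact setup used for the numbers above. It assumes one image file per page and the `tesseract` CLI on your PATH, using the `stdout` output base so the text comes back on standard output; `run_tesseract` and `ocr_pages` are names made up for this example. Threads are enough here because each page is handled by its own tesseract process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def run_tesseract(image_path: str) -> str:
    """OCR one page image by shelling out to the tesseract CLI."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],  # "stdout" = print text instead of writing a file
        capture_output=True,
        text=True,
        check=True,  # raise if tesseract exits with an error
    )
    return result.stdout


def ocr_pages(
    paths: Iterable[str],
    ocr_fn: Callable[[str], str] = run_tesseract,
    workers: Optional[int] = None,
) -> List[str]:
    """Run ocr_fn over all pages concurrently, returning text in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input paths.
        return list(pool.map(ocr_fn, paths))
```

Something like `ocr_pages(sorted(page_paths), workers=16)` would be the starting point; the right worker count depends on your CPU, so it is worth timing a few values.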
But really, you need to generate some of your own sample test data/examples and run them through the models to see what's best. Given frankly how little this paper tested, I really should redo my study, add VLMs, and write a small blog/paper, been meaning to for years now.
RetroMags - https://www.retromags.com/ has 5218 various gaming magazine issues and strategy guides one can download to check out! Judging from forum posts, the VGHF and RetroMags are working together, with the VGHF doing a lot of work on making them more accessible than a raw cbz/pdf download.
This. If your workout plan for the day has you running six miles, then rather than just running the same path over and over again, you might as well have a bit more fun with it.