driscoll42 · 2026-01-05T13:55:35 1767621335

When Roomba thought it was about to be acquired by Amazon, it did lay off 10% of its staff - https://www.therobotreport.com/irobot-laying-off-10-of-staff.... and after the deal was canceled, it was disclosed that they had reduced R&D and focused on margin improvements, and there was some brain drain as people left Roomba while it was in an 18-month limbo - https://www.verdict.co.uk/irobot-to-cut-over-a-third-of-its-.... And of course all this self-inflicted pain only hurt them doubly when the Amazon deal fell through. If they had acted as if they weren't going to be acquired they might be fine, but they tried to maximize the shareholder revenue.

rwmj · 2026-01-05T17:21:51 1767633711

I wonder if Amazon did that deliberately.

Back in the day (about 2002) I was working at an education software company which was trying to get itself acquired by Microsoft. MSFT came in and told us our software didn't conform to all these "standards" in the educational software space. Standards which, coincidentally, Microsoft themselves had written. These pseudo-standards did absolutely nothing to help our customers, and were pure bureaucracy and very very complicated to implement.

I'd recently read Charles Ferguson's book about how his company was acquired by MSFT, and recognized this part of their standard operating procedure, along with extreme and invasive due diligence where they spend a lot of time working out if you're stupid/pliable enough to jump through these hoops while buying themselves time to work out if they can clone your product. I tried to warn management (yes, really - even bought them copies of the book) but naturally no one would listen, and reading a book was too much like hard work. At some point MSFT simply ceased returning management's calls, and rolled out a similar product a while later.

The company imploded not long after, not for this reason in particular, but it was part of a general pattern of incompetence and mismanagement.

pseudohadamard · 2026-01-06T09:42:27 1767692547

Friend of mine was in a company that was going to be acquired by $bigcompany. They strung them along and strung them along until their VC funding was exhausted, then picked up the remains for a song. Much cheaper than actually buying them up.

HPsquared · 2026-01-05T15:03:54 1767625434

Poor risk management!

driscoll42 · 2025-12-03T15:59:22 1764777562

The best open source OCR model for handwriting in my experience is surya-v2 or nougat; which is better really depends on the docs. Each got about 90% accuracy (cosine similarity) in my tests. I have not tried Deepseek-OCR, but mean to at some point.

driscoll42 · 2025-11-29T23:11:35 1764457895

In what world is that the "average experience" in American cities?

cglan · 2025-11-29T23:17:23 1764458243

I live in a very very good area of Brooklyn and still regularly run into needles, human shit, and open fentanyl use.

LA is similar unless you never leave your little neighborhood.

DC was similar when I lived there about 4 years ago.

SF is cleaning up, but I’ve regularly walked on streets where it’s just bodies and needles.

I was shocked by the Vietnamese area of Seattle. It felt like a zombie land.

I mean, if we’re talking city core, yeah, this is the average experience. I say this as someone who loves cities: American cities leave a lot to be desired, and a lot of that comes from simply refusing to enforce basic laws that the rest of the world (including much more left-leaning countries) doesn’t hesitate to enforce.

evanelias · 2025-11-30T00:13:40 1764461620

In what "very very good area of Brooklyn" are you regularly encountering needles?!

I've lived in the NYC metro area for nearly two decades and have yet to see a single one. Definitely saw them when I lived in Baltimore, and have seen them in Philly, but even then not "regularly" in either case.

mlmonkey · 2025-11-29T23:23:13 1764458593

Have you been to Kensington in Philly?

driscoll42 · 2025-09-03T14:40:05 1756910405

I looked into this a bit earlier this year. I'm mixed on it. While the FOSS in me wants it all open-source and available to use given that I'm basically labeling training data for them for free, and they are funded by donations/grants, I get value out of it for free.

My desire was to combine something like iNaturalist with BirdWeather for a bird tracker of audio and visual. BirdWeather does make it free which is great, but there's no great free API of iNaturalist quality for diverse bird tracking.

That being said, I am certain that if iNaturalist made their model public, tons of competitive apps would spring up, and it'd be commercialized immediately regardless of license, taking people away from iNaturalist without giving iNaturalist anything in return.

Plus I know iNaturalist doesn't want autolabeled data uploaded as matched. They only want manually labeled data, and opening the API would, I'm sure, flood their servers with ML-labeled data. On the one hand that could be useful, but it would also be a ton of noise.

I'm in favor of whatever option is most in line with keeping a long term success of a free, high quality plant/animal identifying app out there, and I don't know enough to take a definitive stance on that, and unfortunately those that do, probably have a vested interest in one of the outcomes.

driscoll42 · 2025-07-22T19:57:52 1753214272

Compared to all Whisper models? Or the faster ones? And which version of Whisper? I'm all for a faster, more accurate model, but I need a bit more detail.

ipsum2 · 2025-07-22T20:04:31 1753214671

All of them, in my experience.

driscoll42 · 2025-07-22T20:06:03 1753214763

Fair, looking at the ASR leaderboards it is truly better - https://huggingface.co/spaces/hf-audio/open_asr_leaderboard and NVIDIA's Canary might be even better? Will try these out. Appreciate bringing these to my attention!

driscoll42 · 2025-07-20T14:27:29 1753021649

To suggest another "simple" example: air conditioning. It made half the world vastly more livable, made it possible to work every day of the year almost anywhere in the world, and reduced deaths and disease. At least currently, AC has had a greater impact on humanity than AI has.

driscoll42 · 2025-07-19T13:41:49 1752932509

It's not quite that specific, but close enough:

https://www.nytimes.com/2022/08/01/business/dealbook/pornhub...

https://arstechnica.com/tech-policy/2022/08/california-court...

>This week, US District Judge Cormac Carney of the US District Court of the Central District of California decided that there's reason to believe that Visa knowingly processed payments that allowed MindGeek to monetize "a substantial amount of child porn." To decide, the court wants to know much more about Visa's involvement, calling for more evidence of legal harms caused during a jurisdictional discovery process extended through December 30, 2022.

According to Court Listener, the case is still ongoing - https://www.courtlistener.com/docket/59992265/serena-fleites...

driscoll42 · 2025-02-14T13:59:24 1739541564

So, I did some OCR research early last year (it didn't include any VLMs) on some 1960s-era scanned English documents with a mix of typed and handwritten text (about 80/20), and here's what I found (in terms of cosine similarity):

                    Overall | Handwritten | Typed
  Google Vision:    98.80%  | 93.29%      | 99.37%
  Amazon Textract:  98.80%  | 95.37%      | 99.15%
  surya:            97.41%  | 87.16%      | 98.48%
  azure:            96.09%  | 92.83%      | 96.46%
  trocr:            95.92%  | 79.04%      | 97.65%
  paddleocr:        92.96%  | 52.16%      | 97.23%
  tesseract:        92.38%  | 42.56%      | 97.59%
  nougat:           92.37%  | 89.25%      | 92.77%
  easy_ocr:         89.91%  | 35.13%      | 95.62%
  keras_ocr:        89.7%   | 41.34%      | 94.71%

Overall is a weighted average of Handwritten and Typed. I also did Jaccard and Levenshtein distance, but the results were similar enough that I'm just leaving them out for the sake of space.
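For anyone wanting to score their own samples, here is a minimal sketch of the three metrics named above, comparing OCR output against a hand-typed reference. The whitespace word tokenization for cosine and Jaccard is an assumption (the comment does not say whether the comparison was word-level or character-level), and the function names are illustrative, not from any of the tools listed.

```python
import math
from collections import Counter


def cosine_similarity(reference: str, hypothesis: str) -> float:
    """Cosine similarity between bag-of-words count vectors (assumed tokenization)."""
    a, b = Counter(reference.split()), Counter(hypothesis.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def jaccard_similarity(reference: str, hypothesis: str) -> float:
    """Size of the shared vocabulary divided by size of the combined vocabulary."""
    a, b = set(reference.split()), set(hypothesis.split())
    return len(a & b) / len(a | b) if (a | b) else 1.0


def levenshtein_distance(reference: str, hypothesis: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, rc in enumerate(reference, start=1):
        curr = [i]
        for j, hc in enumerate(hypothesis, start=1):
            curr.append(
                min(
                    prev[j] + 1,               # delete a reference character
                    curr[j - 1] + 1,           # insert a hypothesis character
                    prev[j - 1] + (rc != hc),  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]
```

Bag-of-words cosine ignores word order, so it is forgiving of layout and reading-order mistakes; Levenshtein distance is the stricter check if ordering matters for your documents.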

Overall, if you want the best: if you're an enterprise, just use whatever AWS/GCP/Azure you're on; if you're an individual, pick between those. While some of the open source solutions do quite well, surya took 188 seconds to process 88 pages on my RTX 3080, while the cloud ones took a few seconds to upload the docs and download the results. But if you do want open source, seriously consider surya, tesseract, and nougat depending on your needs. Surya is the best overall, while nougat was pretty good at handwriting. Tesseract is just blazingly fast: it takes 121-200 seconds depending on whether you use tessdata-fast or best, but that's CPU-based and trivially parallelizable, and on my 5950X using all the cores it took only 10 seconds to run through all 88 pages.
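The per-page parallelism mentioned for tesseract can be sketched roughly as follows. This is not the setup used for the numbers above; it assumes the `tesseract` binary is on PATH and that each page is already a separate image file. `tesseract_cli` and `ocr_pages` are hypothetical helper names. Threads are enough here because each worker just waits on an external process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def tesseract_cli(image_path: str) -> str:
    """OCR one image by shelling out to the tesseract CLI (assumed to be on PATH)."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],  # "stdout" makes tesseract print the text
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_pages(
    paths: Iterable[str],
    ocr_fn: Callable[[str], str] = tesseract_cli,
    workers: int = 8,
) -> List[str]:
    """Run ocr_fn over every page concurrently, returning text in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ocr_fn, paths))
```

Something like `ocr_pages(sorted(glob.glob("pages/*.png")), workers=os.cpu_count())` would be the typical call; the `pages/*.png` layout is only an example.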

But really, you need to generate some of your own sample test data/examples and run them through the models to see what's best. Given frankly how little this paper tested, I really should redo my study, add VLMs, and write a small blog/paper, been meaning to for years now.

pqdbr · 2025-02-14T18:05:19 1739556319

I've been looking for handwritten benchmarks for a while and would love to read that blog post.

driscoll42 · 2025-01-28T22:12:06 1738102326

RetroMags - https://www.retromags.com/ has 5218 various gaming magazine issues and strategy guides one can download to check out! Looks like the VGHF and RetroMags are working together, judging from forum posts, with the VGHF doing a lot of work on making them more accessible than a raw cbz/pdf download.

driscoll42 · on Dec 15, 2024

This. If your workout plan has you running six miles a day, then rather than just running the same path over and over again, you might as well have a bit more fun with it.