
Re: yolo mode

I looked into docker and then realized the problem I'm actually trying to solve was solved in like 1970 with users and permissions.

I just made an agent user limited to its own home folder, and added my user to its group. Then I run Claude Code etc. as the agent user.

So it can only read and write /home/agent, and it cannot read or write my files.

I add myself to the agent group so I can read/write the agent's files.

I run into permission issues sometimes, but it's pretty smooth for the most part.
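A rough sketch of what that setup could look like, written as a small Python helper that only prints the commands rather than running them. The usernames `agent` and `alice` are placeholders, the exact modes are my assumption (the comment doesn't give them), and whether `useradd` also creates a same-named group depends on the distro's defaults.

```python
import shlex


def agent_setup_commands(agent="agent", me="alice"):
    """Return the setup commands (as argv lists) for a sandboxed agent user.

    `agent` and `me` are placeholder usernames, not names from the comment.
    Nothing is executed here; review the printed commands and run them as root.
    """
    home = f"/home/{agent}"
    return [
        # Create the agent user with its own home directory (on most
        # distros this also creates a primary group with the same name).
        ["useradd", "--create-home", agent],
        # Add your own account to the agent's group.
        ["usermod", "-aG", agent, me],
        # Owner and group get full access to the agent's home, others none.
        ["chmod", "770", home],
        # Assumed extra step: remove "other" access from your own home so
        # the agent user cannot read it.
        ["chmod", "o-rwx", f"/home/{me}"],
    ]


if __name__ == "__main__":
    for argv in agent_setup_commands():
        print(shlex.join(argv))
```

After that, something like `sudo -u agent -i` gives you a shell as the agent user to launch tools from. Files the agent creates follow its umask, which is one plausible source of the occasional permission issues.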

Oh also I gave it root to a $3 VPS. It's so nice having a sysadmin! :) That part definitely feels a bit deviant though!


For comparison, I recently ran an OCR comparison as part of some work for a professor. For context, all documents were 1960s-era typed or handwritten documents in English, specifically from this archive - http://allenarchive.iac.gatech.edu/. I hand-transcribed 50 documents to use as a baseline and ran them through the various OCR engines, getting the results below.

                      Overall         Typed           Handwritten
  OCR Engine          Leven   Cosine  Leven   Cosine  Leven   Cosine
  Amazon Textract     91.63%  98.14%  92.07%  98.76%  87.99%  92.10%
  Google Vision       93.05%  97.97%  93.84%  98.99%  85.86%  88.11%
  Microsoft Azure     80.32%  95.61%  80.65%  96.20%  79.14%  90.21%
  TrOCR               78.66%  93.97%  80.64%  96.65%  59.96%  67.89%
  PaddleOCR           84.82%  90.73%  88.60%  96.28%  49.64%  37.58%
  Tesseract           86.67%  89.53%  91.14%  95.63%  44.54%  31.39%
  Easy OCR            81.79%  85.07%  85.50%  91.89%  46.87%  19.23%
  Keras OCR           58.03%  83.57%  59.32%  89.98%  46.08%  21.20%
Leven is Levenshtein distance expressed as a percentage similarity, and Cosine is cosine similarity; higher is better for both. Overall is a weighted average of typed vs handwritten, 90/10 if I recall correctly. All results were run on my personal machine with a 5950X, 128 GB RAM, and an RTX 3080.
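For anyone wanting to reproduce this kind of scoring, here is a minimal sketch of the two metrics. The normalization (1 minus distance over the longer string's length) and the word-count tokenization for cosine are assumptions on my part; the comment doesn't say which variants were actually used.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine_similarity(a: str, b: str) -> float:
    """Cosine between word-count vectors (assumed tokenization: whitespace)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    truth = "the quick brown fox"   # hand transcription
    ocr = "the quick brovvn fox"    # hypothetical OCR output
    print(f"Leven:  {levenshtein_similarity(truth, ocr):.2%}")
    print(f"Cosine: {cosine_similarity(truth, ocr):.2%}")
```

With these definitions, a single misread word costs the word-level cosine score a whole token while the character-level score only loses two edits, so the two columns can diverge quite a bit depending on how they are computed.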

From my analysis, Amazon Textract was excellent, the best of all the paid ones. TrOCR and PaddleOCR were the best FOSS ones, but the issue with them is that they realistically need a GPU, while Tesseract I could use on CPU alone. For instance, here is the time to OCR all 50 documents:

  Tesseract       1:19
  TrOCR (GPU)    27:33
  TrOCR (CPU)  3:04:22
TrOCR is great if you only need to do a few documents or have GPUs to burn, but Tesseract is far better if you need good-enough results for a large volume of documents. For my project, the intent was to make a software plugin that could be sent to libraries/universities, so CPU is king.
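To put the timings above on a per-document basis, here is a quick sketch of the arithmetic. It assumes the figures are m:ss and h:mm:ss wall-clock times for the full batch of 50 documents; `parse_duration` and `per_document_seconds` are just illustrative helper names.

```python
def parse_duration(text: str) -> int:
    """Convert "m:ss" or "h:mm:ss" to whole seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


DOCS = 50  # batch size from the comment
TIMINGS = {
    "Tesseract": "1:19",
    "TrOCR (GPU)": "27:33",
    "TrOCR (CPU)": "3:04:22",
}


def per_document_seconds(timings=TIMINGS, docs=DOCS):
    """Average seconds per document for each engine."""
    return {name: parse_duration(t) / docs for name, t in timings.items()}


if __name__ == "__main__":
    per_doc = per_document_seconds()
    base = per_doc["Tesseract"]
    for name, secs in per_doc.items():
        print(f"{name:12s} {secs:7.2f} s/doc  {secs / base:6.1f}x Tesseract")
```

Under that reading, Tesseract comes out around 1.6 seconds per document, TrOCR on GPU around 33 seconds (roughly 21x slower), and TrOCR on CPU around 221 seconds (roughly 140x slower).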
