More

elyase · 2026-02-25T00:22:15 1771978935

there is also pz a drop-in replacement for pi rewritten in Zig. 1.7MB static binary, 3ms startup, 1.4MB RAM idle. Find more at:

https://github.com/elyase/awesome-personal-ai-assistants?tab...

_neil · 2026-02-25T02:27:01 1771986421

Direct link to pz for those on mobile: https://github.com/joelreymont/pz

snthpy · 2026-02-25T06:18:54 1772000334

Cool, thanks for this. What about the extensions though? For me the point about pi is minimal base plus configurable extensions you choose.

mccoyb · 2026-02-25T11:36:54 1772019414

Written by a person who is infamously annoying open source maintainers with AI slop PRs (see the DWARF debacle in OCaml) … and missing much of pi’s philosophy

Pass for me.

elyase · 2025-07-21T07:55:29 1753084529

https://github.com/sst/opencode

elyase · 2025-07-03T14:01:19 1751551279

This is similar to the tool call (fixed code & dynamic params) vs code generation (dynamic code & dynamic params) discussion: tools offer contrains and save tokens, code gives you flexibility. Some papers suggest that generating code is often superior and this will likely become even more true as language models improve

[1] https://huggingface.co/papers/2402.01030

[2] https://huggingface.co/papers/2401.00812

[3] https://huggingface.co/papers/2411.01747

I am working on a model that goes a step beyond and even makes the distinction between thinking and code execution unnecessary (it is all computation in the end), unfortunately no link to share yet

elyase · 2025-06-16T15:49:46 1750088986

Between two models the one with the shorter Minimum Description Length (MDL) will more likely generalize better

elyase · 2025-05-17T13:33:33 1747488813

What you're missing is how to use the tools properly. With solid documentation, good project management practices, a well-organized code structure and tests, any junior engineer should be able to read up on your codebase, write linted code following your codebase style, verify it via tests and write you a report of what was done, challenges faced etc. State of the art coding agents will do that at superhuman speeds.

If you haven't set things up properly (important info lives only in people’s heads / meetings, tasks dont have clear acceptance criteria, ...) then you aren't ready for Junior Developers yet. You need to wait until your Coding Agents are at Senior level.

elyase · on Feb 26, 2025

I want to use browser-use in Cursor but I am using another option because it doesn't support MCP integration which is the common language they support for external tools

elyase · on Jan 14, 2025

There is more information in these twitter threads:

https://x.com/karinanguyen_/status/1879270529066262733 https://x.com/OpenAI/status/1879267276291203329

elyase · on Oct 25, 2023

How does it compare to https://github.com/explodinggradients/ragas

joewferrara · on Oct 25, 2023

tvalmetrics is similar to ragas for sure, and we really like ragas. tvalmetrics has structural differences as well as differences in the specific metrics when compared to ragas.

With regards to the metrics, we have an end to end metric that scores how well your RAG response matches a reference correct response called answer similarity score. Last I saw, ragas did not have a score like this as they focus on scoring RAG responses and context without a reference correct answer. We also have a retrieval k-recall score that involves comparing the relevance of the retrieved context to the relevance of the top k context chunks where k is larger than the number of retrieved context chunks. Retrieval k-recall is a good score for tuning how many context chunks your RAG system should retrieve. I do not believe ragas has a score like this.

Structurally, tvalmetrics does not use langchain, while ragas does use langchain. We chose not to use langchain for our LLM API calls to keep the code in the package clear and make it easy for a user to understand exactly how the LLM-assisted evaluation works for each metric. Of course, the drawback of not using langchain means that our package is not integrated with as many LLMs. Currently, we only support using Open AI chat completion models as LLM evaluators to calcaulate the metrics. We plan on adding support for additional LLM evaluators very soon.

elyase · on Jan 2, 2019

The closest alternatives in this space would be allennlp [1], the recently released pytext [2] and spacy [3]. pytext's authors wrote some comparison on the accompanying paper [4] and this GitHub issue [5].

[1] https://github.com/allenai/allennlp

[2] https://github.com/facebookresearch/pytext

[3] https://spacy.io

[4] https://arxiv.org/pdf/1812.08729.pdf

[5] https://github.com/facebookresearch/pytext/issues/110

mkl · on Jan 2, 2019

Do you know if any of these can be used for text prediction? (I.e. guessing what the next word/token will be.)

yorwba · on Jan 2, 2019

Text prediction is usually called "language modeling" in NLP. Because it's useful as a weak supervision signal to improve performance on other tasks, most of the mentioned libraries support it. However, they might not always provide complete examples, instead assuming that you know how to express the model and train it using the primitives provided by the library.

Flair: https://github.com/zalandoresearch/flair/blob/master/flair/m...

Allen NLP: https://github.com/allenai/allennlp/blob/master/allennlp/dat...

PyText: https://github.com/facebookresearch/pytext/blob/master/pytex...

spaCy seems to focus on language analysis and I couldn't find an API that'd be directly usable for text generation.

plagtag · on Jan 3, 2019

Flair looks really promising to me!

be_erik · on Jan 2, 2019

Markov chains can be used to do type ahead prediction. It's likely what the iOS uses for their predictive keyboard.

https://en.wikipedia.org/wiki/Markov_chain

mkl · on Jan 2, 2019

Yes, there are plenty of methods, and I have a couple implemented, but an off-the-shelf one from a cutting edge library would likely be better.

edraferi · on Jan 2, 2019

It's gonna be hard to get an "off the shelf" model for text prediction, because the upcoming text depends on the author, topic, and other context. You can probably find some decent pre-trained models to get started, but you'll need to customize them for your application to get good results.

mkl · on Jan 2, 2019

Right, I was thinking off-the-shelf in the sense of giving it a tokenised corpus and it does the rest, or it incorporates that into its existing model. Dictation software, phone keyboards, etc. do this.

starchand · on Jan 2, 2019

Which method would work best for email classification into 1 of 7 categories? Problem I've seen is 1 or 2 key sentences within the email can classify the message but they are usually outnumbered by generic sentences such as signatures, greetings, headers/footers etc

sachin18590 · on Jan 3, 2019

These are all frameworks and while none of them have any signular advantage over other especially in the problem statement you are looking for, you should ideally be able to figure out what works best for you based on the classification sensitivity and training data you are working with. The problem in itself can be quite simple to extremely complex based on the above 2 factors. Spacy's pre-processing tools are quite easy to use and that combined with tool like talon should help you clean up the email correctly. Thereafter, if your email text is pretty much to the point, then any intent classification tool will work, however, if the email text is long and intents are spread across, then you will need a hierarchical layer to understand the intent hierarchies as well as an attention layer to understand which intents to focus and not lose track of in an email. At that time, you are quite far from using a generic plug and play framework and will need to exactly and quite thoroughly understand the deep learning models you are working with as well as the dataset you have and the classification you are trying to build.

starchand · on Jan 3, 2019

Thanks this is really helpful! I am using talon and sklearn as paragraph by paragraph intent classifier. I am classifying the whole email from the highest individual intent probability. This seems to be working well for my minimal test data (~200 sentences) but have yet to test in the wild. I will research hierarchical layer and attention layers.

wodenokoto · on Jan 2, 2019

Generally these generic sentences should be randomly distributed, and so their effect should be minimal.

You can randomly add them to your training set, if you feel that real world data has them randomly distributed, but you training sample is too small to capture this.

mongodude · on Jan 2, 2019

I like Flair for three reasons:

- Easy to use

- Developers are very active

- State-of-the-art results using approaches that are easy to understand and works well for most of text classification tasks

elyase · on Dec 27, 2018

We have done extensive testing in the context of chatbot intent classification and in our particular problem nothing (including CNN, LSTM, fasttext plus LUIS, Watson and other proprietary classifiers) has been able to beat a simple linear model trained on char n-gram features.

eggie5 · on Dec 27, 2018

I've seen a rule of thumb that if the ratio of samples to words per sample is less than 1500 you prob don't have enough data for embeddings/cnn

minimaxir · on Dec 27, 2018

Chatbot intent would be a good use case for a linear model, as a single word/ngram would have a high impact on the result (in contrast to advanced architectures which try to account for ambiguity/contradictions in documents)

briga · on Dec 27, 2018

I've seen the same things in the models I've built. For basic intent classification simpler models seem to be more accurate, not to mention they train faster and require less memory. There seems to be a lot of emphasis on shiny complex neural network architectures, even when simple models work just fine.

FridgeSeal · on Dec 27, 2018

> There seems to be a lot of emphasis on shiny complex neural network architectures, even when simple models work just fine.

It's resume-driven-development for data scientists.

I've never seen an interviewer impressed with the fact that a job was performed using not-deep learning, but say that you used deep learning (despite how spurious it might be) and they light up like it's Christmas.

helpme3 · on Dec 28, 2018

This isn't that surprising. I think the reason for this is that, even though the model is linear, the space of n-grams is so large that there usually is a line that separates any two classes.

laichzeit0 · on Dec 27, 2018

Anecdotally, I found the same thing. N-gram BOW is surprisingly difficult to beat.

thanatropism · on Dec 27, 2018

I've been curious about what manifold learning (sp. now that we have UMAP) would do to that kind of workflow's performance.