interesting, so you think the issue with the above approach is the graph structure being too rigid / lossy (in terms of losing semantics)? And embeddings are also too lossy (in terms of losing context and structure)? But you guys are working on something less lossy for both semantics and context?
> interesting, so you think the issue with the above approach is the graph structure being too rigid / lossy (in terms of losing semantics)?
Yeah, exactly.
> And embeddings are also too lossy (in terms of losing context and structure)?
Interestingly, it appears the problem is not embeddings but rather retrieval: embeddings seem to contain a lot more information than we're currently able to pull out. Obviously they're lossy, but... less than I thought before I started this project? Or at least they can be made to be that way?
> But you guys are working on something less lossy for both semantics and context?
Yes! :) We're getting there! It's currently at a good-but-not-great, GPT-2-ish stage. It's a model-toddler: it can't get a job yet, but it's already doing pretty interesting stuff (i.e. it does much better than SOTA on some complex tasks). I feel pretty optimistic that we'll get it working at a usable commercial level for at least some verticals (maybe at an alpha/design-partner level) before the end of the year. We'll definitely launch the semantic part before the context part, so that probably means things like people search etc. first, and then the contextual chunking for big docs for legal etc... ideally sometime next year?
Buried on page 24, they reveal what is to me the most surprising capability leap: o3-mini is way better at conning gpt-4o out of money (79% win rate for o3-mini vs. 27% for full o1!). It isn't surprising that "reasoning" can lead to improvements in modeling another LLM, but it definitely makes me wary of future persuasive abilities on humans as well.
The voice is just OpenAI's default TTS voice. I agree that Veritasium video is an incredible piece of work and the AI version is absurd by comparison!
This is mostly a proof of concept that it's possible at all, and as LLMs get smarter it'll be interesting to see if the quality improves automatically. For now, the tool is really only useful for very specific or personal questions that wouldn't already be covered on YouTube.
Hmm, the initial version of the app only took me about a day to get working, but that version took minutes to generate a single video, and even then it only worked a third of the time. It took a solid two weeks from there to add all the edge cases to the prompt to increase reliability, add GPU rendering and streaming to improve performance/latency, and shore up the infra for scaling.
Totally fair! I like the XKCD comic as well because it hints at a potential solution: even if you can't always be correct, how you respond to critical questions can really help. I'm working on a feature that lets users ask follow-up questions, and I'm definitely going to consider how to make it maximally honest and curious.
These are amazing examples! Thanks for all the feedback, detailed info, and persistence in trying! The HN hug of death means I'm running into Gemini rate limits, unfortunately :( I'll definitely make that clearer in the UI when it happens and try to find some workarounds.
The other issues are bugs in my streaming logic for retrying clips that failed to generate. LLMs aren't yet perfect at writing Manim, so to keep things smooth I try to skip clips that fail to render properly. I still have layout issues that are hard to detect automatically.
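Conceptually the skip logic looks something like this (a simplified sketch with placeholder names like `render_clip` and made-up paths, not my actual code):

```python
import subprocess
from typing import Iterator, Optional

def render_clip(manim_source_file: str, clip_id: str) -> Optional[str]:
    """Try to render one LLM-written Manim clip; return the output path, or None on failure."""
    try:
        subprocess.run(
            ["manim", "render", "-ql", manim_source_file],  # low-quality preview render
            check=True,
            timeout=120,  # generated scenes occasionally hang
        )
        return f"media/videos/{clip_id}.mp4"  # illustrative output path
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None  # the generated Manim didn't render; treat as a failed clip

def stream_clips(clips: list[tuple[str, str]]) -> Iterator[str]:
    """Yield rendered clip paths in order, silently skipping failures."""
    for clip_id, source_file in clips:
        path = render_clip(source_file, clip_id)
        if path is not None:
            yield path  # stream this clip to the viewer right away
        # on failure: skip and keep the stream smooth rather than block on a retry
```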
I expect that with a few more generations of LLM updates, prompt iteration, and better streaming/retry logic on my end, this will become more reliable.
There is a job queue on the backend with statuses; it's just not worth breaking the streaming experience to ask the LLM to rewrite broken Manim segments out of order.
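To make that concrete, the queue is roughly this shape (the status names and fields are illustrative guesses, not my actual schema):

```python
from dataclasses import dataclass
from enum import Enum
from queue import Queue

class JobStatus(Enum):
    QUEUED = "queued"
    RENDERING = "rendering"
    DONE = "done"
    SKIPPED = "skipped"  # render failed; dropped rather than rewritten out of order

@dataclass
class ClipJob:
    clip_id: str
    manim_source: str
    status: JobStatus = JobStatus.QUEUED

def process_in_order(jobs: "Queue[ClipJob]") -> None:
    """Drain the queue strictly in order, so the streamed video never reorders clips."""
    while not jobs.empty():
        job = jobs.get()
        job.status = JobStatus.RENDERING
        ok = try_render(job)  # would call the renderer sketched above
        job.status = JobStatus.DONE if ok else JobStatus.SKIPPED
        jobs.task_done()

def try_render(job: ClipJob) -> bool:
    """Stand-in renderer: succeeds whenever there's source to render."""
    return bool(job.manim_source.strip())
```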