interesting, so you think the issue with the above approach is the graph structure being too rigid / lossy (in terms of losing semantics)? And embeddings are also too lossy (in terms of losing context and structure)? But you guys are working on something less lossy for both semantics and context?
> interesting, so you think the issue with the above approach is the graph structure being too rigid / lossy (in terms of losing semantics)?
Yeah, exactly.
> And embeddings are also too lossy (in terms of losing context and structure)?
Interestingly, it appears the problem is not embeddings but rather retrieval: embeddings seem to contain a lot more information than we're currently able to pull out. Obviously they're lossy, but... less than I thought before I started this project? Or at least they can be made to be that way?
> But you guys are working on something less lossy for both semantics and context?
Yes! :) We're getting there! It's currently at a good-but-not-great, GPT-2-ish stage. It's a model-toddler: it can't get a job yet, but it's already doing pretty interesting stuff (i.e. it does much better than SOTA on some complex tasks). I feel pretty optimistic that we'll get it working at a usable commercial level for at least some verticals (maybe at an alpha/design-partner level) before the end of the year. We'll definitely launch the semantic part before the context part, so that probably means things like people search etc. first, and then the contextual chunking for big docs for legal etc... ideally sometime next year?
Buried on page 24, they reveal what is to me the most surprising capability leap: o3-mini is way better at conning gpt-4o out of money (79% win rate for o3-mini vs. 27% for full o1!). It isn't surprising that "reasoning" can lead to improvements in modeling another LLM, but it definitely makes me wary of future persuasive abilities on humans as well.
The voice is just OpenAI's default TTS voice. I agree that Veritasium video is an incredible piece of work and the AI version is absurd by comparison!
This is mostly a proof of concept that it's possible at all, and as LLMs get smarter it'll be interesting to see if the quality improves automatically. For now, the tool is really only useful for very specific or personal questions that wouldn't already be covered on YouTube.
Hmm, the initial version of the app only took me about a day to get working, but that version took minutes to generate a single video, and even then it only worked a third of the time. It took a solid two weeks from there to add all the edge cases to the prompt to increase reliability, add GPU rendering and streaming to improve performance/latency, and shore up the infra for scaling.
Totally fair! I like the XKCD comic as well because it hints at a potential solution: even if you can't always be correct, how you respond to critical questions can really help. I'm working on a feature that lets users ask follow-up questions, and I'm definitely going to consider how to make it maximally honest and curious.
These are amazing examples! Thanks for all the feedback, detailed info, and persistence in trying! The HN hug of death means I'm running into Gemini rate limits, unfortunately :( I'll definitely make that clearer in the UI when it happens and try to find some workarounds.
The other issues are bugs in my streaming logic for retrying clips that failed to generate. LLMs aren't yet perfect at writing Manim, so to keep things smooth I try to skip clips that fail to render properly. I still have layout issues that are hard to detect automatically.
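Conceptually the skip logic looks something like this (a simplified sketch with placeholder names like `render_clip` and made-up paths, not my actual code):

```python
import subprocess
from typing import Iterator, Optional

def render_clip(manim_source_file: str, clip_id: str) -> Optional[str]:
    """Try to render one LLM-written Manim clip; return the output path, or None on failure."""
    try:
        subprocess.run(
            ["manim", "render", "-ql", manim_source_file],  # low-quality preview render
            check=True,
            timeout=120,  # generated scenes occasionally hang
        )
        return f"media/videos/{clip_id}.mp4"  # illustrative output path
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None  # the generated Manim didn't render; treat as a failed clip

def stream_clips(clips: list[tuple[str, str]]) -> Iterator[str]:
    """Yield rendered clip paths in order, silently skipping failures."""
    for clip_id, source_file in clips:
        path = render_clip(source_file, clip_id)
        if path is not None:
            yield path  # stream this clip to the viewer right away
        # on failure: skip and keep the stream smooth rather than block on a retry
```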
I expect that with a few more generations of LLM updates, prompt iteration, and better streaming/retry logic on my end, this will become more reliable.
There is a job queue on the backend with statuses; it's just not worth breaking the streaming experience to ask the LLM to rewrite broken Manim segments out of order.
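To make that concrete, the queue is roughly this shape (the status names and fields are illustrative guesses, not my actual schema):

```python
from dataclasses import dataclass
from enum import Enum
from queue import Queue

class JobStatus(Enum):
    QUEUED = "queued"
    RENDERING = "rendering"
    DONE = "done"
    SKIPPED = "skipped"  # render failed; dropped rather than rewritten out of order

@dataclass
class ClipJob:
    clip_id: str
    manim_source: str
    status: JobStatus = JobStatus.QUEUED

def process_in_order(jobs: "Queue[ClipJob]") -> None:
    """Drain the queue strictly in order, so the streamed video never reorders clips."""
    while not jobs.empty():
        job = jobs.get()
        job.status = JobStatus.RENDERING
        ok = try_render(job)  # would call the renderer sketched above
        job.status = JobStatus.DONE if ok else JobStatus.SKIPPED
        jobs.task_done()

def try_render(job: ClipJob) -> bool:
    """Stand-in renderer: succeeds whenever there's source to render."""
    return bool(job.manim_source.strip())
```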