It doesn't need to write tests: it can just use the application and figure out i...

logicchains · on Jan 29, 2025

That's going to be much slower and more expensive than writing tests, because image/video processing is slow and expensive, and because of lag in using the UI (plus rebuilding the whole application from scratch after every change to test again).

drdeca · on Jan 29, 2025

Hm, what if instead of using video of the application…

Ok, so if one can have one program snoop on all the rendering calls made by another program, maybe there could be a way of training a common representation of “an image of an application” and “the rendering calls that are made when producing a frame of the display for the application”? Hopefully in a way that would be significantly smaller than the full image data.

If so, maybe rather than feeding in the video of the application, said representation could be applied to the rendering calls the application makes each frame, and this representation would be given as input as the model interacts with the application, rather than giving it the actual graphics?

But maybe this idea wouldn’t work at all, idk.

Like, I guess the rendering calls often involve image data in their arguments, and you wouldn’t want to include the same images many times as input to the encoding thing, as that would probably (or, I imagine) make it slower than just using the overall image of the application. I guess the calls are probably more pointing to the images in memory though, not putting an entire image on the stack.
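To make the "point to the image instead of repeating it" part concrete, here is a rough Python sketch. Everything in it is made up for illustration: `FrameRecorder`, `draw_image`, `draw_rect` and `end_frame` are hypothetical names, not any real graphics API, and a real hook into OpenGL/Vulkan/Metal calls would look quite different. It only shows that a per-frame call log which refers to images by a content hash can stay small even when the same image is drawn many times.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class FrameRecorder:
    """Hypothetical recorder: keeps each distinct image once, keyed by content hash."""
    images: dict = field(default_factory=dict)  # handle -> raw bytes, stored once
    calls: list = field(default_factory=list)   # compact call log for the current frame

    def _handle(self, pixels: bytes) -> str:
        # Short content hash used as a stand-in for "pointer to the image in memory".
        handle = hashlib.sha256(pixels).hexdigest()[:12]
        self.images.setdefault(handle, pixels)
        return handle

    def draw_image(self, pixels: bytes, x: int, y: int) -> None:
        # Log only the handle and position, never the pixel data itself.
        self.calls.append(["img", self._handle(pixels), x, y])

    def draw_rect(self, x: int, y: int, w: int, h: int, rgba: str) -> None:
        self.calls.append(["rect", x, y, w, h, rgba])

    def end_frame(self) -> str:
        # Serialize this frame's calls and reset the log for the next frame.
        encoded = json.dumps(self.calls, separators=(",", ":"))
        self.calls = []
        return encoded


icon = bytes(64 * 64 * 4)  # stand-in for a 64x64 RGBA icon
rec = FrameRecorder()
rec.draw_rect(0, 0, 1920, 1080, "#202020ff")
for i in range(10):
    rec.draw_image(icon, 80 * i, 0)  # same icon drawn ten times
frame = rec.end_frame()

raw_frame_bytes = 1920 * 1080 * 4  # uncompressed RGBA frame, for comparison
print(len(rec.images), len(frame), raw_frame_bytes)
```

The icon is stored once no matter how often it is drawn, and the serialized call log for the frame is far smaller than the raw pixels. Whether a model could learn anything useful from such a log is a separate question that this sketch does not answer.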

I don’t know enough about low-level graphics programming to know if this idea of mine makes any sense.

achierius · on Jan 30, 2025

Yes, it would be significantly smaller, but it would look very different depending on your platform, GPU, driver version, etc. -- the model would essentially need to learn how to map "graphics APIs" (e.g. OpenGL, Vulkan, Metal, ...) to "render result" for every combination of API, driver version, and GPU, which I imagine would constitute a significant amount of overhead.

ClumsyPilot · on Jan 29, 2025

But it’s actually correct from a usability perspective