
Sounds like the awesome architecture for transformers would be colocation of memory and compute.


Yes, that's why we generally run them on GPUs.


That's why we need a row of ALUs in the RAM chips themselves: read a row of DRAM and feed it straight into a vector operation. Since a row read delivers so many bits at once, the ALU could take many cycles per operation and still keep up, which limits its die area.
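A minimal Python sketch of that idea, as I understand it (all names and sizes here are hypothetical, not any real DRAM interface): a bank's row activation fills a row buffer, and a narrow bank-side vector ALU then sweeps over the buffer a few lanes at a time, so the ALU can be much narrower than the row and still match the row-read rate.

```python
LANES = 64   # hypothetical number of ALU lanes sitting beside the row buffer
WORD = 32    # operate on 32-bit words

class PimBank:
    """Toy model of a DRAM bank with a per-bank vector ALU (illustrative only)."""

    def __init__(self, rows):
        self.rows = rows          # list of rows, each a list of 32-bit words
        self.row_buffer = None

    def activate(self, r):
        """Model a row activation: the whole row lands in the row buffer at once."""
        self.row_buffer = list(self.rows[r])

    def vec_add(self, operand):
        """Add `operand` into the open row, LANES words per ALU pass.
        Because one activation delivers the entire row, the ALU can take
        several passes (i.e., be slow and small) and still keep up."""
        words = len(self.row_buffer)
        passes = 0
        for base in range(0, words, LANES):
            for i in range(base, min(base + LANES, words)):
                self.row_buffer[i] = (self.row_buffer[i] + operand[i]) % (1 << WORD)
            passes += 1
        return passes

bank = PimBank([[i for i in range(256)]])   # one 256-word row
bank.activate(0)
passes = bank.vec_add([1] * 256)
print(passes, bank.row_buffer[:4])          # 4 ALU passes over a 256-word row
```

The point of the sketch is the ratio: a 256-word row serviced by a 64-lane ALU needs only 4 passes per activation, so the logic area per bank stays small.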


The big problem is that DRAM manufacturers are extremely secretive about their processes, and those processes largely don't do well for logic.


GPUs that pull a kilowatt when running, yes. This might actually work on an FPGA if the additions don't take too many clock cycles, unlike the matmuls, which were too slow there.


GPUs are better, but I'm thinking of even tighter coupling, like an integrated memory-and-compute architecture.



