That's why we need a row of ALUs in the RAM chips: read an entire DRAM row and feed it straight into a vector operation. Since row reads are slow anyway, each ALU could take many cycles per operation to keep its area small.
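A back-of-the-envelope sketch of that tradeoff, with all numbers (row width, row cycle time, clock rate) being illustrative assumptions rather than specs for any real part: as long as each per-column ALU finishes within one row cycle, it never becomes the bottleneck, so it can be made slow and tiny.

```python
# Model: how many cycles can a per-element ALU take before it, rather
# than the DRAM row read, limits throughput? Numbers are assumptions.

ROW_BITS = 8192          # bits activated per DRAM row (assumed)
ELEM_BITS = 32           # vector element width (assumed)
ELEMS_PER_ROW = ROW_BITS // ELEM_BITS   # elements available per row read

T_RC_NS = 45.0           # row cycle time: activate + restore (assumed)
CLK_NS = 2.0             # internal PIM clock period, 500 MHz (assumed)

# One multi-cycle ALU per element, all operating in parallel: each ALU
# only needs to finish before the next row read completes.
max_alu_cycles = int(T_RC_NS // CLK_NS)

# Effective throughput is then set by the row read, not the ALU.
elems_per_sec = ELEMS_PER_ROW / (T_RC_NS * 1e-9)

print(f"elements per row read: {ELEMS_PER_ROW}")
print(f"ALU may take up to {max_alu_cycles} cycles per op")
print(f"throughput: {elems_per_sec / 1e9:.2f} G elements/s")
```

Under these assumed numbers the ALU gets a budget of ~22 cycles per operation for free, which is why a compact multi-cycle (even bit-serial) design can keep up with the row buffer.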
On GPUs that pull a kilowatt while running, yes. This might actually work on an FPGA, provided the addition doesn't take too many clock cycles, unlike the matmuls, which were too slow.