> AP2 is designed as a universal protocol, providing security and trust for a variety of payments like stablecoins and cryptocurrencies. To accelerate support for the web3 ecosystem, in collaboration with Coinbase, Ethereum Foundation, MetaMask and other leading organizations, we have extended the core constructs of AP2 and launched the A2A x402 extension, a production-ready solution for agent-based crypto payments. Extensions like these will help shape the evolution of cryptocurrency integrations within the core AP2 protocol.
The only Beam-specific part is the sandboxes, and those can easily be swapped out for the vendor of your choice. The architecture we described isn't exclusive to our product.
(Disclosure: I'm a Googler, but I don't have any specific knowledge of this architecture.)
My understanding of Groq is that it is fast because all the weights are kept in SRAM, and since SRAM <-> compute bandwidth is much higher than HBM <-> compute bandwidth, you can generate tokens faster (during generation the main bottleneck is just bringing the weights + KV caches into compute).
If the diffusion models just do multiple unmasked forward passes through a transformer, then the activation * weight matmuls (plus the attention computation) will be the bottleneck. That makes each denoising step compute bound, and there won't be any advantage in storing the weights in SRAM, since you can overlap the HBM -> compute transfer with the compute itself.
But my knowledge of diffusion is non-existent, so take this with a truck of salt.
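To put rough numbers on the argument above, here is a back-of-the-envelope roofline sketch. Everything in it is an assumption for illustration: the model size, the bandwidth and FLOP figures, and the function name `step_time_s` are made up and do not describe any real chip or model. It also ignores KV caches, attention cost, and batching, and it assumes the weight transfer overlaps perfectly with compute.

```python
# Hypothetical numbers, chosen only to illustrate the reasoning.
N_PARAMS = 8e9          # assumed model size (parameters)
BYTES_PER_PARAM = 2     # assumed 16-bit weights
PEAK_FLOPS = 1e15       # assumed peak compute, FLOP/s
HBM_BW = 3e12           # assumed HBM bandwidth, bytes/s
SRAM_BW = 3e13          # assumed SRAM bandwidth (10x HBM), bytes/s


def step_time_s(tokens_per_pass, mem_bw):
    """Roofline estimate for one forward pass: (seconds, limiting resource)."""
    # Every pass streams all the weights once.
    t_mem = N_PARAMS * BYTES_PER_PARAM / mem_bw
    # Roughly 2 FLOPs (multiply + add) per parameter per token processed.
    t_compute = 2 * N_PARAMS * tokens_per_pass / PEAK_FLOPS
    # With perfect overlap, the slower of the two sets the step time.
    bound = "memory" if t_mem >= t_compute else "compute"
    return max(t_mem, t_compute), bound


# Autoregressive decode: 1 new token per pass.
# Diffusion-style denoising: assume 1024 positions processed per pass.
for name, tokens in [("decode", 1), ("denoise", 1024)]:
    t_hbm, bound_hbm = step_time_s(tokens, HBM_BW)
    t_sram, _ = step_time_s(tokens, SRAM_BW)
    print(f"{name}: {bound_hbm}-bound on HBM, SRAM speedup {t_hbm / t_sram:.1f}x")
```

With these made-up figures, the single-token decode step is memory bound and gets the full bandwidth ratio from SRAM, while the 1024-token pass is already compute bound on HBM and gets nothing. The crossover sits at `PEAK_FLOPS * BYTES_PER_PARAM / (2 * HBM_BW)` tokens per pass, about 333 here, so the conclusion depends entirely on how many positions a real denoising step processes at once.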