I think the most likely bottleneck is gonna be your NIC hating getting hit with a ton of packets. Line rate with huge frames is quite different from line rate with tiny packets like ICMP, since the per-packet cost is what hurts when the frames are small (see CME binary glink market data for a similarly stressful experience to the ICMP case).
There's probably going to be some overhead, but it seems like you could do 1M if you have the bandwidth.
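
To put rough numbers on the frame-size point, here's a back-of-the-envelope sketch of the theoretical packet rate at line rate. The 10 GbE link speed and the frame sizes are just assumptions for illustration, and `max_packets_per_second` is a made-up helper name. It only accounts for standard Ethernet wire overhead (preamble, start-of-frame delimiter, inter-frame gap), not for what any particular NIC or driver can actually sustain.

```python
# Per-frame cost on the wire that is not part of the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 7 + 1 + 12


def max_packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Theoretical frames/sec at line rate for a frame size that includes the FCS."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    link_bps = 10e9  # assumption: a 10 GbE link
    # 64 = minimum Ethernet frame, 1518 = maximum standard (untagged) frame
    for frame_bytes in (64, 512, 1518):
        pps = max_packets_per_second(link_bps, frame_bytes)
        print(f"{frame_bytes:>5} byte frames: {pps:,.0f} packets/sec")
```

Same bandwidth, but the minimum-size frames mean roughly 18x more packets per second for the NIC to deal with than full-size ones, which is where the pain comes from.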