
This screams "Not Invented Here" syndrome. Massive yikes at the diagram showing TCP in software in the OSI model. There have been hardware-accelerated TCP stacks for decades. They're called TCP Offload Engines, they work great, and they have done for ages. Why are you building one and giving it a new name? It seems like a pretty enormous amount of work, and you would've gotten 90+% of the gains by just implementing a standard TOE. I guess the only good reason I can think of to do this yourself is that they'd left it so late to get to this that all the companies that were good at it got bought (Solarflare, Mellanox, etc.).


This is a bad take. Here's my take: 1. Standard IP and Ethernet are physically acceptable for their use case. 2. TCP/IP is optimized in a number of areas for unreliable networks. 3. Their clusters are not unreliable. 4. Their servers already offer hardware acceleration that can be programmed, so remove the aspects of TCP/IP that increase latency or might negatively affect throughput (see the sketch after this comment). 5. They get to continue to purchase the cheaper IP switches and retain their existing hardware without retooling everything.

As an afterthought, they publish this for marketing/engineering pull, appealing to people who like to optimize (do engineering) for specific situations, while supporting the ROI and keeping costs down.
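A minimal sketch of the point-4 idea, not anything from Tesla's stack: stock TCP carries latency-oriented machinery (Nagle's algorithm being the classic example) that you can only partially switch off per socket, whereas a purpose-built transport simply omits it. The peer address and message below are made up for illustration.

    # Illustration only: stock TCP batches small writes (Nagle's algorithm) to save
    # bandwidth, which adds latency. On an ordinary socket the best you can do is
    # disable it per connection; a purpose-built transport never has this machinery
    # to begin with.
    import socket

    PEER = ("192.0.2.10", 9000)  # hypothetical peer in the cluster

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # turn off Nagle batching
    s.connect(PEER)
    s.sendall(b"small latency-sensitive message")
    s.close()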


> This is a bad take. Here's my take: (...) 5. They get to continue to purchase the cheaper IP switches and retain their existing hardware without retooling everything.

The TCP offload engines the OP mentions don't require any retooling at all, if you don't count things like buying IP switches as retooling. All you basically need to do is buy a network card.

Also, if you read the article you'll eventually stumble upon the bar chart where they claim the one-way write latency of TTPoE is only about 0.7 µs (0.7E-6 seconds) better than existing tech like InfiniBand, and that's also the only somewhat hard data they show. Does that justify the investment of developing their whole software+hardware stack?

I'm sure the project was fun and should look great on a CV, but overall it doesn't look like it passes the smell test.


They don't need the generality of full TCP for their cluster, so they're using a tweaked, incompatible subset. One that's been optimized for better performance on cheaper hardware than you can get with TCP h/w offload. In the offload case you're still paying the latency, wire protocol overhead, and efficiency costs of full TCP.

(Disclaimer: I work at Tesla, not related to this group, opinions on public info only)


> One that's been optimized for better performance on cheaper hardware than you can get with TCP h/w offload.

How many ≥10 Gbps chipsets that you'd find in a typical server do not have offload nowadays?

Further, once you're in the ≥50 Gbps card range you can often get ROCE, which helps with things like latency.


And you're still paying the other performance and efficiency costs of TCP. ROCE also isn't a magic bullet.

Every system has a cost and tradeoffs. Just because someone took an unusual path doesn't mean that they were wrong. And the larger and more specialized their use case is, the less likely that a generic solution is the best match.


> And you're still paying the other performance and efficiency costs of TCP. ROCE also isn't a magic bullet.

Tesla's own charts show ROCE also achieving one-way write latencies in the single-digit microsecond range. If that doesn't qualify as a magic bullet, what does that say about TTPoE?


"Instead of using typical supercomputer networking solutions like Infiniband, Tesla chose to adapt Ethernet to their needs with a modified transport layer."

So, they need to compare it to InfiniBand, not TCP, and definitely not software TCP. And they need to explain how/if it works with standard huge-capacity switches (which is at least one reason to prefer TCP over InfiniBand).

There could be reasons to build this (AWS has something of their own), but for Tesla to build their own stinks badly of NIH.


agree 100% with this take.

they are purpose-building hardware for their specific application. debugging corner cases and making this robust is going to take them a decade. given that nobody else is interested in this non-standard solution, they don't have the benefit of the community debugging it and improving on it in open source.

appears to me to be a vanity effort, as is the whole Dojo project.


I assume they ignore other technologies and research because new shiny things give them visibility and therefore promotions.


I have built dozens of different FPGA-based cameras in the past. There is the GigE Vision protocol (https://en.m.wikipedia.org/wiki/GigE_Vision): TCP is used for the "normal" connection and UDP for the low-latency video data. Such a system could be used for other low-latency applications as well.
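Not GigE Vision itself, but a minimal sketch of the split described above: a reliable TCP socket for control/configuration and a UDP socket for the low-latency data path. The camera address, port numbers, command syntax, and packet layout are all made up for illustration.

    # Control channel over TCP: configuration commands where reliability matters
    # more than latency. (Fictional command syntax and ports.)
    import socket
    import struct

    CAMERA_IP = "192.168.1.50"   # hypothetical camera address
    CONTROL_PORT = 5000          # hypothetical TCP control port
    STREAM_PORT = 5001           # hypothetical UDP stream port

    ctrl = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ctrl.connect((CAMERA_IP, CONTROL_PORT))
    ctrl.sendall(b"SET EXPOSURE 1000\n")

    # Data channel over UDP: no retransmission, so a late packet is simply
    # dropped instead of stalling the stream.
    data = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    data.bind(("0.0.0.0", STREAM_PORT))
    data.settimeout(1.0)
    try:
        while True:
            packet, _ = data.recvfrom(9000)            # jumbo-frame sized buffer
            (seq,) = struct.unpack_from("!H", packet)  # fictional 16-bit sequence number
            payload = packet[2:]
            # ...hand payload to frame reassembly...
    except socket.timeout:
        pass
    finally:
        ctrl.close()
        data.close()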


I do industrial controls, so I'm very familiar with this. IP is a lot of overhead that doesn't really do anything for the user in a tightly defined automation network local to a machine. EtherCAT goes a step lower and drops IP in favor of just sending Ethernet frames of type 0x88A4. It uses a unique ring topology and does not use traditional switching or repeaters; the IO devices contain a special controller called the ESC, the EtherCAT Subordinate Controller, while the master only needs a standard Ethernet controller. You can get cycle times in the tens of microseconds, allowing up to 50 kHz update rates on IO devices. This lets you do high-performance servo motor control where you close the current loop in the master CPU over 100 Mbit Ethernet and easily reach 10+ kHz update rates.
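To make the "just Ethernet frames of type 0x88A4" point concrete, here is a hedged sketch of emitting one raw frame with that EtherType from Linux, skipping IP entirely. The interface name and payload are placeholders, and a real EtherCAT master builds proper EtherCAT datagram headers on top of this; the sketch only shows the raw-frame mechanism. Requires Linux and root (CAP_NET_RAW).

    # Send one raw Ethernet frame with the EtherCAT EtherType (0x88A4), no IP.
    import socket
    import struct

    IFACE = "eth0"                           # assumed interface name
    ETHERTYPE_ECAT = 0x88A4
    DST_MAC = bytes.fromhex("ffffffffffff")  # broadcast destination, for illustration

    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    s.bind((IFACE, ETHERTYPE_ECAT))
    src_mac = s.getsockname()[4]             # interface MAC from the bound socket

    # Ethernet header: destination MAC, source MAC, EtherType.
    header = DST_MAC + src_mac + struct.pack("!H", ETHERTYPE_ECAT)
    payload = bytes(46)                      # zero padding up to the minimum frame size

    s.send(header + payload)
    s.close()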

With FPGAs using commodity SFPs or Ethernet PHYs, you can certainly build stuff that runs circles around traditional Ethernet and the associated overhead from protocols like IP.



