Defense Against AI-Guided Traffic Analysis (DAITA)

vouaobrasil · on May 8, 2024

I feel like modern technological development has become a comic tragedy: we develop AI so we can develop AI tools to hack so we can develop more AI to combat that AI.

Seems to me like a self-perpetuating broken-window fallacy taken to its logical extreme.

renegade-otter · on May 8, 2024

This is why the results of AI may be a wash - at best. Oh, your bank is using AI-based fraud detection? The attackers are using AI as well.

https://renegadeotter.com/2024/04/22/artificial-intelligence...

bonton89 · on May 8, 2024

There's usually an attack/defense asymmetry of effort in these things (attackers only need to win once, whereas defenders need to always be right), so I can't really see it being anything but a net loss. The fraud detection will also increase false positives, because those are never zero.

wheelerwj · on May 8, 2024

I think this is just called progress, or maybe evolution. I'm not sure we should relate this technological advancement to economic theory. But I do see a direct comparison to other things, such as antibiotics.

We know that taking antibiotics leads to stronger bacteria. But we still should have developed it and incorporated it into our medicine. AI will do some amazing things, and as a tool it will be used by some bad people to do bad things. But overall society improves/grows with the creation and use of tools.

vouaobrasil · on May 8, 2024

Well, technological advancement changes the nature of our economic system and technological advancement is technologically motivated because of the differential "survival" of ideas and technology based on how they help users further themselves economically. So I think it's very pertinent to the discussion.

api · on May 8, 2024

https://m.youtube.com/watch?v=ypEaGQb6dJk&pp=ygUSMjAwMSBvcGV...

thesnide · on May 8, 2024

A self-fulfilling prophecy and an arms race come to mind.

fullspectrumdev · on May 8, 2024

Ok now this is quite interesting, particularly when it comes to “messing with” netflow analysis, which is what’s referred to (the data provided by Team Cymru to the FBI, etc).

It certainly looks to be a step above prior efforts to inject noise that I’ve seen, such as having an instrumented browser “randomly browse the web”.

RonMarken · on May 9, 2024

This is all very interesting. No disrespect to Mullvad, but will there be any effort towards getting such functionality standardized? In the past there was an XOR scrambling patch for OpenVPN[1] to attempt to circumvent certain government firewalls, and the OpenVPN developers had concerns about unaudited changes to the wire protocol.

[1]: https://proprivacy.com/vpn/guides/openvpn-scramble-xor-obfus...

paranoidrobot · on May 8, 2024

I'm sure it's not that new, but I first saw this kind of anti-analysis behaviour in action in Nullsoft's WASTE[1] (the Winamp guys).

Unfortunately with symmetrical consumer internet services still being rare in many countries, fully masking your traffic when trying to download a large amount of content is made more difficult or time consuming.

[1] https://en.m.wikipedia.org/wiki/WASTE

shaky-carrousel · on May 8, 2024

Mullvad's fake traffic is going to be lightweight, because they won't mess with metered connections. It'll be relatively easy to tell the fake traffic from the real traffic over time.

tetris11 · on May 8, 2024

By "AI", surely they just mean traditional machine learning?

Or are people training neural nets on traffic analysis now?

pulls · on May 8, 2024

Nope, "AI". The academic community working on one very active research area of traffic analysis, called website fingerprinting, made significant leaps with NNs over traditional ML in 2018: "Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning", by Sirinam et al., https://dl.acm.org/doi/pdf/10.1145/3243734.3243768 . Since then, the most powerful attacks here have used deep learning.

Broadly speaking, traffic analysis benefits a lot from work in computer vision: view a network trace as a one-dimensional picture. There are some fun visualizations here if you scroll down a bit: https://github.com/pylls/padding-machines-for-tor/tree/maste... . Every 1-pixel high line is a website visit, where each pixel corresponds to a packet (or cell in Tor) sent or received.
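To make the "one-dimensional picture" idea concrete, here is a minimal sketch of turning one website visit into a fixed-length row of "pixels". The function names, the +1/-1/0 encoding, and the row length are illustrative assumptions, not taken from the linked repository or from any particular attack implementation.

```python
from typing import List, Tuple

# Hypothetical encoding: +1 = packet sent, -1 = packet received, 0 = padding
# added so that every visit has the same width.
def trace_to_row(packets: List[Tuple[float, str]], length: int = 16) -> List[int]:
    """Turn a list of (timestamp, direction) pairs into a fixed-length row.

    Only packet order and direction are kept; timestamps and sizes are
    dropped in this simplified view.
    """
    row = [1 if direction == "out" else -1 for _, direction in packets[:length]]
    row += [0] * (length - len(row))  # pad short traces, long ones are truncated
    return row


def render(row: List[int]) -> str:
    """Draw the row as text: '#' sent, '.' received, '_' empty."""
    return "".join({1: "#", -1: ".", 0: "_"}[v] for v in row)


# One made-up visit: a request, two responses, then another request.
visit = [(0.00, "out"), (0.05, "in"), (0.06, "in"), (0.30, "out")]
print(render(trace_to_row(visit, length=8)))  # prints "#..#____"
```

Stacking many such rows gives an image-like array, which is the kind of input a classifier borrowed from computer vision can be trained on.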

nusl · on May 8, 2024

Does this have any performance impact with regards to throughput or latency?

pulls · on May 8, 2024

Yes, it's significant. Unfortunately, there are fundamental trade-offs here between protection and bandwidth and/or latency. Another aspect is energy: keeping a connection "alive" by regularly ensuring traffic on a connection does not help battery life. We have much to optimize here.

(Disclosure: I work with Mullvad on DAITA.)

benoliver999 · on May 8, 2024

So, the AI analysis uses a huge amount of energy, and DAITA also uses surplus energy. Quite a sad time we are heading into :(

pulls · on May 8, 2024

Yeah :(

It's similar to how encryption was viewed as too expensive a decade or two ago. Today, it is a necessity. Seeing how available bandwidth keeps growing to accommodate things like video, I hope traffic analysis defenses won't be as detrimental in the long run for most internet use.

ComodoHacker · on May 8, 2024

Could random packet delays or delay equalization help here (instead of additional packets)?

pulls · on May 8, 2024

Yes, for sure. As a defender, you have two main tools: dummy packets (bandwidth) and delaying packets (latency). Padding-only defenses will indirectly delay normal (non-padding) packets by filling the connection with padding. You want to explicitly block outgoing traffic and try to account for congestion to minimize wasted bandwidth.

This is tricky. We have hardly started dealing with traffic analysis issues in protocols. In general, we have spent the last decade+ getting encryption sort of right with amazing efforts like TLS 1.3 and WireGuard, etc. Expect another decade for traffic analysis.
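As a rough illustration of the two tools mentioned above (dummy packets and delaying packets), here is a toy constant-rate sender. It is a sketch under simple assumptions and is not a description of how DAITA works: `constant_rate_schedule` is a hypothetical name, time is in integer milliseconds, and one packet leaves on every tick whether or not real data is waiting.

```python
from collections import deque
from typing import List, Tuple


def constant_rate_schedule(
    arrivals_ms: List[int], interval_ms: int, duration_ms: int
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Simulate a sender that emits exactly one packet every interval_ms.

    arrivals_ms: sorted times at which real packets become ready to send.
    Returns (schedule, delays): schedule is a list of (send_time, kind) with
    kind "real" or "dummy"; delays holds the added latency of each real packet.
    """
    queue: deque = deque()
    schedule: List[Tuple[int, str]] = []
    delays: List[int] = []
    next_arrival = 0
    for t in range(0, duration_ms + 1, interval_ms):
        # Real packets are blocked until the next tick (the latency cost).
        while next_arrival < len(arrivals_ms) and arrivals_ms[next_arrival] <= t:
            queue.append(arrivals_ms[next_arrival])
            next_arrival += 1
        if queue:
            ready_at = queue.popleft()
            schedule.append((t, "real"))
            delays.append(t - ready_at)
        else:
            # Nothing to send: fill the slot with padding (the bandwidth cost).
            schedule.append((t, "dummy"))
    return schedule, delays


schedule, delays = constant_rate_schedule([0, 1, 2, 25], interval_ms=10, duration_ms=40)
print([kind for _, kind in schedule])  # ['real', 'real', 'real', 'real', 'dummy']
print(delays)                          # [0, 9, 18, 5]
```

An observer of this toy sender sees one packet every 10 ms regardless of what the user does, which is the protection. The cost is visible in the output: a burst of three packets is spread over 20 ms, and an idle slot is spent on a dummy packet.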