What is a recording? Does a 10 millisecond delay mean it is a recording? What is the legally binding lock-out time for this audio stream before a human can listen to it?
A recording is endpointed audio that is captured, stored on a server, and reviewed later.
There is no legally binding lock-out time because there doesn't need to be. Live-listening is impractical at scale, and it is also worthless for what the article describes them doing with the data.
Remember, this work is being done to make it so humans don’t have to be in the loop.
Why is live-listening impractical at scale? How can Discord provide such a service? You seem to be conflating Amazon's intent with the unknowable intent of anyone who can touch such data, which is a common fatal flaw in reasoning.
If I worked on voice search, I would want full context to train my models on, and full context could span a year. Ethics are an issue here for sure.