The Google Speech API supports speaker diarization, and Android supports voice unlock, so the technology is getting to the point where it can identify individual voices and should be able to tell which voice has consented to being recorded.
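For reference, diarization is just a flag on the recognition config in the Cloud Speech-to-Text client. A minimal sketch, assuming the `google-cloud-speech` Python library is installed and credentials are set up (the speaker-count values are illustrative):

```python
from google.cloud import speech

client = speech.SpeechClient()

# Recognition config with speaker diarization enabled.
# min/max speaker counts are illustrative assumptions, not required values.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,
        max_speaker_count=2,
    ),
)

# Each word in the response then carries a speaker_tag identifying
# which detected speaker said it.
```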
But doesn't it have to record the voice to be able to identify it? Once the sound waves are converted to a format that the machine can process, it has been recorded.
To my knowledge, we've generally recognized a difference between recording and immediate processing. Does a voice changer "record" you if it adjusts the audio on the fly and stores nothing? VoIP phones arguably "record" the audio from your microphone and transmit it over the line to the other party, but we don't call that recording either.
I think the biggest issue here is how smart speakers work: they record your audio and send it to Google's, Amazon's, or Apple's servers for permanent storage, rather than processing it locally and discarding it, which we have the technology to do just fine today.
The only reason we're retaining voice recordings is to provide valuable data to the companies in question.
Are you really trying to argue over the definition of the term?
Indeed, you can discard unstored data.
I write data pipelines for a living. We often use "discard" for data that receives no further processing and is never stored.
It's an extremely common usage of the term. See [1] for example.
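To make the usage concrete, here's a minimal pipeline-style sketch. The record fields (`consent`, `text`, `id`) are made-up examples; the point is just that records failing the check get no further processing and are never stored:

```python
def process(records):
    """Yield cleaned records; everything else is discarded.

    "Discarded" here means exactly that: failing records receive
    no further processing and are never written anywhere.
    """
    for record in records:
        if record.get("consent"):  # hypothetical validity check
            yield {"id": record["id"], "text": record["text"].strip()}
        # records without consent fall through and are discarded

clean = list(process([
    {"id": 1, "text": " hello ", "consent": True},
    {"id": 2, "text": "secret", "consent": False},
]))
print(clean)  # [{'id': 1, 'text': 'hello'}]
```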