Does Signal protect from the scheme when the government sends discovery requests for all existing phone numbers (< 1B) and gets a full mapping between user id and phone number?
While slightly unrelated, I thought, how we can fix this for truly secure and privacy-aware, non-commercial communication platforms like Matrix? Make it impossible to build such mapping. The core idea is that you should be able to find the user by number only if you are in their contact list - strangers not welcome. So every user, who wishes to be discovered, uploads hash(A, B) for every contact - a hash of user's phone number (A) and contact's phone number (B), swapped if B < A. Let's say user A uploaded hashes h(A,B) and h(A,C). Now, user B wishes to discover contacts and uploads hashes h(A, B) and h(B, D). The server sees matching hashes between A and B and lets them discover each other without knowing their numbers.
The advantages:
- as we hash a pair of 9-digit numbers, the hash function domain space is larger and it is more difficult to reverse the hashes (hash of a single phone number is reversed easily)
- each user can decide who may discover them
Disadvantages:
- a patient attacker can create hashes of A with all existing numbers and discover who are the contacts of A. Basically, extract anyone's phone book via discovery API. One way to protect against this would be to verify A's phone number before using discovery, but the government, probably, can intercept SMS codes and pass the verification anyway. However, the government can also see all the phone calls, so they know who is in whose phone book anyway.
- if the hash is reversed, you get pairs of phone numbers instead of just one number
Yeah you're doing a lot better job on the privacy side than signal is IMO.
Especially just being able to run my own service will be priceless when something like chatcontrol eventually makes it through. Signal can only comply or leave, but they'll never manage to kill all the matrix servers around.
Signal publicly shares government requests AND the data that they send them
The data Signal has is: 1) registration time for a given phone number, 2) knowledge of daily login (24hr resolution). That's it. That's the metadata.
They do not have information on who is communicating with who, when messages are sent, if messages are sent, how many, the size, or any of that. Importantly, they do not have an identity (your name) associated with the account nor does that show for contacts (not even the phone number needs be shared).
Signal is designed to be safe from Signal itself.
Yes, it sucks that there is the phone number connected to the account, but you can probably understand that there's a reason authorities don't frequently send Signal data requests; because the information isn't very useful. So even if you have a phone number associated with a government ID (not required in America) they really can only show that you have an account and potentially that the account is active.
Like the sibling comment says, there's always a trade-off. You can't have a system that has no metadata, but you can have one that minimizes it. Signal needs to balance usability and minimize bots while maximizing privacy and security. Phone numbers are a barrier to entry for bots, preventing unlimited or trivial account generation. It has downsides but upsides too. One big upside is that if Signal gets compromised then there's can be no reconstruction of the chat history or metadata. IMO, it's a good enough solution for 99.9% of people. If you need privacy and security from nation state actors who are directly targeting you then it's maybe not the best solution (at least not out of the box) but otherwise I can't see a situation where it is a problem.
FWIW, Signal does look to be moving away from phone numbers. They have usernames now. I'd expect it to take time to completely get away though considering they're a small team and need to move from the existing infrastructure to that new one. It's definitely not an easy task (and I think people frequently underestimate the difficulty of security, as quoted in the article lol. And as suggested by the op: it's all edge cases)
> Phone numbers are a barrier to entry for bots, preventing unlimited or trivial account generation.
What's wrong with account generation? Nothing. The problem is if they start sending spam to random people. So we can make registration or adding contacts paid (in cryptocurrency) and the problem is gone.
>So we can make registration or adding contacts paid (in cryptocurrency) and the problem is gone.
The majority of the user base would be gone, too.
I had a hard enough time convincing my friend group to use Signal as is. If they had to pay (especially if it had to be via cryptocurrency) none of them would have ever even considered it.
I would rather pay $1 than with my phone number which is much much much more valuable. Telegram did an experiment with paid anonymous registration, but the prices were ridiculous and targeted for the riches.
What's right with it? Accounts being generated (i.e. many inauthentic accounts controlled by few people) are always used to send spam, there are no exceptions. The perpetrators should be in prison.
Ah yes, and convincing friends/family/partners to use Signal instead of Whatsapp clearly what will convince them is that they need to setup, acquire, and use cryptocurrency to register or connect with me on the encrypted messaging service. "No thanks, I just use Whatsapp/iMessage. I heard they're actually e2e encrypted too, so what's the problem?"
> Does Signal protect from the scheme when the government sends discovery requests for all existing phone numbers (< 1B) and gets a full mapping between user id and phone number?
Signal does have the phone numbers, as you say. Can they connect a number to a username?
>>> Does Signal protect from the scheme when the government sends discovery requests for all existing phone numbers (< 1B) and gets a full mapping between user id and phone number?
Which yes, this does protect that. There is no mapping between a user id and phone number. Go look at the reports. They only show that the phone number has a registered account but they do not show what the user id is. Signal doesn't have that information to give.
> Can they connect a number to a username?
From Signal
Usernames in Signal are protected using a custom Ristretto 25519 hashing algorithm and zero-knowledge proofs. Signal can’t easily see or produce the username if given the phone number of a Signal account. Note that if provided with the plaintext of a username known to be in use, Signal can connect that username to the Signal account that the username is currently associated with. However, once a username has been changed or deleted, it can no longer be associated with a Signal account.
This is in the details on[0] right above the section "Set it, share it, change it"
So Signal cannot use phone numbers to identify usernames BUT Signal can use usernames to identify phone numbers IF AND ONLY IF that username is in active use. (Note that the usernames is not the Signal ID)
If you are worried about this issue I'd either disable usernames or continually rotate them. If the username is not connected with your account at the time the request is being made then no connection can be made by Signal. So this is pretty easy to thwart, though I wish Signal included a way to automate this (perhaps Molly has a way or someone can add it?) Either rotating after every use or on a timer would almost guarantee that this happens given that it takes time to get a search warrant and time for Signal to process them. You can see from the BigBrother link that Signal is not very quick to respond...
While slightly unrelated, I thought, how we can fix this for truly secure and privacy-aware, non-commercial communication platforms like Matrix? Make it impossible to build such mapping. The core idea is that you should be able to find the user by number only if you are in their contact list - strangers not welcome. So every user, who wishes to be discovered, uploads hash(A, B) for every contact - a hash of user's phone number (A) and contact's phone number (B), swapped if B < A. Let's say user A uploaded hashes h(A,B) and h(A,C). Now, user B wishes to discover contacts and uploads hashes h(A, B) and h(B, D). The server sees matching hashes between A and B and lets them discover each other without knowing their numbers.
The advantages:
- as we hash a pair of 9-digit numbers, the hash function domain space is larger and it is more difficult to reverse the hashes (hash of a single phone number is reversed easily)
- each user can decide who may discover them
Disadvantages:
- a patient attacker can create hashes of A with all existing numbers and discover who are the contacts of A. Basically, extract anyone's phone book via discovery API. One way to protect against this would be to verify A's phone number before using discovery, but the government, probably, can intercept SMS codes and pass the verification anyway. However, the government can also see all the phone calls, so they know who is in whose phone book anyway.
- if the hash is reversed, you get pairs of phone numbers instead of just one number