WebRTC makes it possible, but timing is still limited by NTP, whose accuracy is far coarser than one sample. You couldn't possibly get sample-accurate playback, but you could get it to within a ms or so.
Depending on how far apart your sources are, that might be fine. For instance, with two speakers in two different rooms, where you won't get significant phase issues, this is trivial to do (well, "trivial" is overselling it, but you can do it purely with web technologies).
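
To put rough numbers on the "within a ms or so" figure, here is a back-of-the-envelope sketch. The 48 kHz sample rate and the 343 m/s speed of sound are my own assumptions for illustration, not anything WebRTC or NTP specifies, and the function names are made up.

```python
# Rough arithmetic only: how big is a 1 ms sync error compared to one sample?

SAMPLE_RATE_HZ = 48_000      # assumed sample rate
SYNC_ERROR_S = 0.001         # the "within a ms or so" figure from above
SPEED_OF_SOUND_M_S = 343.0   # assumed, roughly air at room temperature


def sample_period_us(sample_rate_hz: int) -> float:
    """Duration of a single sample, in microseconds."""
    return 1_000_000 / sample_rate_hz


def error_in_samples(error_s: float, sample_rate_hz: int) -> float:
    """How many samples a given timing error spans."""
    return error_s * sample_rate_hz


def error_as_distance_m(error_s: float, speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Distance sound travels during the timing error (equivalent speaker offset)."""
    return error_s * speed_m_s


if __name__ == "__main__":
    print(f"one sample lasts {sample_period_us(SAMPLE_RATE_HZ):.1f} us")
    print(f"1 ms error spans {error_in_samples(SYNC_ERROR_S, SAMPLE_RATE_HZ):.0f} samples")
    print(f"1 ms error equals {error_as_distance_m(SYNC_ERROR_S):.2f} m of speaker offset")
```

So under those assumptions a 1 ms error is dozens of samples, but acoustically it is about the same as one speaker sitting a third of a metre closer to you, which is why it stops mattering once the speakers are in separate rooms.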
Say more?