20 years ago, it was possible to seamlessly merge video clips from multiple streaming RealPlayer servers into a single composite video stream, using a static XML text file (SMIL) distributed via HTTP, with optional HTML annotation and composition.
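For context, a SMIL presentation of that era could sequence clips hosted on different servers with a few lines of XML. A minimal sketch (server names hypothetical):

```xml
<smil>
  <head>
    <layout>
      <root-layout width="320" height="240"/>
    </layout>
  </head>
  <body>
    <seq>
      <!-- clips fetched from two different streaming servers, played back to back -->
      <video src="rtsp://serverA.example/clip1.rm"/>
      <video src="rtsp://serverB.example/clip2.rm"/>
    </seq>
  </body>
</smil>
```

The player, not the servers, did the composition: it fetched each clip in turn and presented them as one continuous experience.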
This is technically possible today but blocked by DRM and closed apps/players. Innovation would be unlocked if 3rd party apps could create custom viewing experiences based on licensed and purchased content files downloaded locally, e.g. in your local Apple media library. The closed apps could then sherlock/upstream UX improvements that prove broadly useful.
It is not only blocked by DRM but also by codec differences. Even with two MP4 files, if they were encoded with different parameters, ffmpeg still needs to do some computation (a re-encode) to join them.
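To illustrate the distinction (clip names hypothetical): the concat demuxer copies packets without re-encoding, so it only works when both files share codec parameters; mismatched encodings need the concat filter, which decodes and re-encodes.

```shell
# List file for the concat demuxer (lossless join, identical encodings only).
printf "file 'clip1.mp4'\nfile 'clip2.mp4'\n" > list.txt

# Guarded so the sketch is a no-op without ffmpeg or the sample clips.
if command -v ffmpeg >/dev/null && [ -f clip1.mp4 ]; then
  # Same encoding: stream copy, fast and lossless.
  ffmpeg -f concat -safe 0 -i list.txt -c copy joined.mp4
  # Different encodings: the concat filter decodes and re-encodes instead.
  ffmpeg -i clip1.mp4 -i clip2.mp4 \
    -filter_complex "[0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a]" \
    -map "[v]" -map "[a]" joined.mp4
fi
```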
Gapless playback with MSE would require identical encoding, which is likely more prevalent in the Apple catalog than in the wild west of YouTube. Client-side transcoding would require DRM cooperation.
For two video streams with different encodings, swapping between two media players + prefetch can give a close approximation of a continuous video stream.
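A minimal sketch of that double-buffering idea, with hypothetical player objects standing in for two stacked `<video>` elements: one player is visible and playing while the idle one prefetches the next clip, and their roles swap at each clip boundary.

```javascript
// Double-buffered playback: `players` is a pair of media players
// (e.g. two <video> elements). The idle player buffers the next
// clip ahead of time so the swap at the boundary is near-seamless.
function makeSwapper(players) {
  let active = 0;
  return {
    current: () => players[active],       // the player currently on screen
    idle: () => players[1 - active],      // the hidden, prefetching player
    // Load `nextSrc` on the idle player, then make it the active one.
    advance(nextSrc) {
      players[1 - active].src = nextSrc;  // idle player buffers ahead
      active = 1 - active;                // swap roles at the clip boundary
      return players[active];
    },
  };
}
```

In a real page you would call `advance()` from the active element's `ended` event and toggle visibility (or z-order) between the two elements; the remaining gap is just the swap latency, not a network fetch.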