I'm working on a game. My game stores audio files as 44.1kHz .ogg files. If my game is the only thing playing audio, then great, the system sound mixer can configure the DAC to work in 44.1kHz mode.
But if other software is trying to play 48kHz sound files at the same time? Either my game has to resample from 44.1kHz to 48kHz before sending it to the system, or the system sound mixer needs to resample it to 48kHz, or the system sound mixer needs to resample the other software from 48kHz to 44.1kHz.
You are right; the system sound mixer should handle all resampling unless you explicitly take exclusive control of the audio device. On Windows at least, this means everything generally gets resampled to 48kHz. If you are trying to get the lowest latency possible, this can be an obstacle... on the order of single-digit milliseconds.
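To make that concrete, here's a minimal sketch of what it looks like from the application side, assuming the python-sounddevice package and NumPy (my choice for illustration, nothing from this thread): you can ask what rate the default output device runs at, but in shared mode you can just open a stream at your file's rate and let whatever sits below the call (PortAudio, the OS mixer, or the driver) do the conversion.

    # Minimal sketch (assumes python-sounddevice and NumPy are installed).
    # Query the default output device's rate, then play a 44.1kHz buffer
    # anyway; in shared mode any resampling happens below this call.
    import numpy as np
    import sounddevice as sd

    print("device default rate:",
          sd.query_devices(kind='output')['default_samplerate'])

    file_rate = 44100                          # the game's asset rate
    t = np.arange(file_rate) / file_rate       # one second of samples
    tone = 0.2 * np.sin(2 * np.pi * 440 * t)   # stand-in for decoded .ogg audio

    sd.play(tone.astype(np.float32), samplerate=file_rate, blocking=True)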
And actually, why do we have both 48kHz and 44.1kHz anyway? If all "consumer grade high quality audio" was in 44.1kHz (or 48kHz) we probably could've avoided resampling in almost all circumstances other than professional audio contexts (or for already low quality audio like 8kHz files). What benefit do we get out of having both 44.1 and 48 that outweighs all the resampling it causes?
> And actually, why do we have both 48kHz and 44.1kHz anyway?
Those two examples emerged independently, like rail standards or any number of other standards one can cite. That's really just the top of the rabbit-hole, since there are 8-20 "standard" audio sample rates, depending on how you count.
This isn't really a drawback, and it does provide flexibility when making tradeoffs for low bitrates (e.g. 8 kHz narrowband voice is fine for most use cases) and for other authoring/editing vs. distribution choices.
As far as I understand it, both rates ultimately come from trying to map to video standards of the time. 44.1 kHz mapped well onto reusing the analog video tape of the time; 48 kHz mapped better to digital clocking and integer multiples of video standards, while also having a slightly wider margin above the highest audible frequencies.
44.1 kHz never really went away because CDs continued using it, allowing them to take any existing 44.1 kHz content as well as to fit slightly more audio per disc.
At the end of the day, the resampling between the two doesn't really matter and is more of a minor inconvenience than anything. There are also lots of other sampling rates which were in use for other things too.
Early audio manufacturers (Sony notably) used 48kHz for professional-grade audio equipment that would be used in studios or TV stations, and degraded 44.1kHz audio for consumer devices. Typically you would pay an order of magnitude more for the 48kHz version of the hardware.
48kHz is better for creating and mixing audio. You cannot practically mix audio at 44.1kHz without doing very slight damage to audible high frequencies; slight, but enough to make a difference. If you were creating for consumer devices, you would mix at 48kHz, and then downsample to 44.1kHz during final mastering, since conversion from 48kHz to 44.1kHz can be done theoretically (and practically) perfectly. (Opinions of the OP notwithstanding).
I think it's safe to say that the 44.1kHz sampling rate was maliciously selected specifically because it is just low enough that perfect playback is still possible, but perfect mixing is practically not possible. And obviously maliciously chosen to be a rate with no convenient greatest common divisor with 48kHz, which would have allowed easy and cheap perfect realtime resampling. Had Sony chosen 44.0kHz, it would be trivially easy to do sample rate conversion to 48kHz in realtime even with the primitive hardware available in the late 1970s. That extra 0.1kHz is transparently obvious malice and greed in plain sight.
Presumably Sony would sell you the software or hardware to perform perfect non-realtime conversion of audio from 48kHz to 44.1kHz for a few tens of thousands of dollars. Not remotely subtle how greedy all of this was.
There has been no serious reason to use 44.1kHz instead of 48kHz for about 50 years, at least from a technology point of view. (And no real reason to EVER use 44.1kHz instead of 48kHz other than GREED).
The Wikipedia page explains it as coming from PCM adaptors that put digital audio on video tapes. The constraints of recording on videotape led to 44.1kHz being the best option. It sounds like there wasn't enough capacity for 48kHz.
What would you consider evidence? Emails between standards committee members agreeing to collude in order to screw pro-audio customers?
The evidence is: why on earth would anyone on a standards committee choose 44.1kHz, instead of 44.0kHz? The answer: 44.1kHz was transparently obviously chosen to make it impossible to perform on-the-fly rate conversions.
The mathematics of polyphase rate converters was perfectly well understood at the time these standards were created.
Someone else wrote that it was chosen to best match PAL and NTSC. IIRC there is also a Technology Connections video about those early PCM adaptor devices that would record to VHS tape.
48kHz and 44.1kHz devices appeared at roughly the same time. Sony's first 44.1kHz device was shipped in 1979. Philips wanted to use 44.0kHz.
If you can do 44.1kHz on an NTSC recording device, you can do 44.0kHz too. Neither NTSC digital format uses all of the available space in the horizontal blanking intervals on an NTSC VHS device, so using less really isn't a problem.
Why is 44.0kHz better? There's a very easy way to do excellent sample rate conversions from 44.0kHz to 48kHz: you upsample the audio by 12 (by inserting 11 zeros between each sample), apply a 22kHz low-pass filter, and then decimate by 11 (by keeping only every 11th sample). To go in the other direction, upsample by 11, filter, and decimate by 12. Plausibly implementable on 1979 tech. And trivially implementable on modern tech.
To perform the same conversion from 44.1kHz to 48kHz, you would have to upsample by 160, filter at a sample rate of 160x44.1kHz, and then decimate by 147. Or upsample by 147, filter, and decimate by 160. Impossible with ancient tech, and challenging even on modern tech. (I would imagine modern solutions would use polyphase filters instead, with table sizes that would be impractical on 1979 VLSI). Polyphase filter tables for 44.0kHz/48.0kHz conversion are massively smaller too.
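As a rough sketch of that upsample/filter/decimate scheme, here it is with scipy's polyphase resampler (scipy and the one-second buffers are my own illustration, not something from this thread):

    # Rational-rate conversion (zero-stuff, low-pass, decimate) via
    # scipy.signal.resample_poly, which implements exactly that as a
    # polyphase FIR. The up/down factors come from reducing the rate ratio.
    import numpy as np
    from scipy.signal import resample_poly

    x44_0 = np.random.randn(44000)   # one second of audio at 44.0kHz
    x44_1 = np.random.randn(44100)   # one second of audio at 44.1kHz

    # 44.0kHz -> 48kHz: the ratio 48000/44000 reduces to 12/11 (tiny filter bank)
    y_easy = resample_poly(x44_0, up=12, down=11)

    # 44.1kHz -> 48kHz: the ratio 48000/44100 reduces to 160/147 (much larger one)
    y_hard = resample_poly(x44_1, up=160, down=147)

    print(len(y_easy), len(y_hard))  # both come out at 48000 samples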
As for the prime factors... the factors of 7 (twice) in 44100 really aren't useful for anything. More useful would be factors of two (five times), which would increase the greatest common divisor from 300 to 4,000!
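Spelling out that arithmetic (standard-library Python only, purely for illustration):

    # The ratios and common divisors mentioned above, spelled out.
    from fractions import Fraction
    from math import gcd

    print(Fraction(48000, 44100))   # 160/147 -> large polyphase tables
    print(Fraction(48000, 44000))   # 12/11   -> tiny polyphase tables
    print(gcd(48000, 44100))        # 300
    print(gcd(48000, 44000))        # 4000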
Most stuff on the internet ripped from CD is 44.1. 48 is getting more common. We’re like smack in the middle of the 75 year transition period to 48kHz.
For new projects, I use 48, because my mics are 32bit (float!)/48kHz.
technically we could use 40kHz and just upsample; the extra frequency over 40kHz is basically leeway to make the analog part possible/cheap, but it is not technically needed in the signal
the first CD player didn't have the compute power to upsample perfectly, but modern devices certainly do.
AFAIU, 40kHz exactly wouldn't really work, if your goal is to represent 0Hz-20kHz: in order to avoid aliasing, you need a low pass filter to remove all frequency content above half your sample rate, and no filter has an infinitely steep cutoff (and you generally want to give the filter a decent range of frequencies to work with). If you want to start your low pass filter at 20kHz, you want it to end (i.e. reach practically -∞dB) a few kHz above 20kHz. If you used a sample rate of exactly 40kHz, you would need your low pass filter to reach -∞dB at 20kHz, meaning it'd have to start somewhere in the audible region.
Though this is just my understanding. Maybe I'm wrong.
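To put rough numbers on that transition-band argument, here's a back-of-the-envelope filter-length estimate using scipy's Kaiser-window rule of thumb (scipy, the 100dB target, and the exact figures are my own assumptions):

    # Rough illustration: estimate how long a linear-phase FIR low-pass that
    # passes 20kHz needs to be, for a ~100dB stopband, at different rates.
    from scipy.signal import kaiserord

    def taps_needed(fs_hz, stop_hz, ripple_db=100.0):
        width = (stop_hz - 20000.0) / (fs_hz / 2.0)  # transition width / Nyquist
        numtaps, _beta = kaiserord(ripple_db, width)
        return numtaps

    print(taps_needed(48000, 24000))   # ~4kHz transition band: modest filter
    print(taps_needed(44100, 22050))   # ~2kHz transition band: roughly twice as long
    # At exactly 40kHz the transition band would be 0Hz wide: infinitely many taps.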
Is this not the job of the operating system or its supporting parts, to deal with audio from various sources? It should not be necessary to inspect the state of the OS your game is running on to know what kind of audio you can play back. In fact, that could even be considered spying on things you shouldn't. Maybe the OS or its sound system does not abstract that from you and I am wrong about the state of OSes in reality, but this seems to me like a pretty big oversight, if true. If I extrapolate from your use-case, then that would mean any application performing any playback of sound needs to inspect whether something else is running on the system. That seems like a pretty big overreach.
As an example, let's say I change frequency in Audacity and press the play button. Does Audacity now go and inspect whether anything else on my system is making any sound?
It is also the job of the operating system or its supporting parts to allow applications to configure audio devices to specific sample rates if that's what the application needs.
It's fine to just take whatever you get if you are a game app, and either allow the OS to resample, or do the resampling yourself on the fly.
Not so fine if you are authoring audio, where the audio device rate ABSOLUTELY has to match the rate of content that's being created. It is NOT acceptable to have the OS doing resampling when that's the case.
Audacity allows you to force the sample rate of the input and output devices on both Windows and Linux. Much easier on Windows; utterly chaotic and bug-filled and miserable and unpredictable on Linux (although up-to-date versions of Pipewire can almost mostly sometimes do the right thing, usually).
> Is this not the job of the operating system or its supporting parts, to deal with audio from various sources
I think that's the point? In practice the OS (or its supporting parts) resample audio all the time. It's "under the hood" but the only way to actually avoid it would be to limit all audio files and playback systems to a single rate.
I don't understand, then, why they need to deal with that when making a game, unless they are not satisfied with the way that the OS resamples under the hood.
You cannot avoid it either way then, I guess. Either you let the system do it for you, or you take matters into your own hands. But why do you feel it necessary to take matters into your own hands? I think that's the actual question that begs answering. Are you unsatisfied with how the system does the resampling? Does it result in a worse quality than your own implementation of resampling? Or is there another reason?
I don't feel it necessary to take matters into my own hands. If you read my original message again:
> Either my game has to resample from 44.1kHz to 48kHz
> before sending it to the system, or the system
> sound mixer needs to resample it to 48kHz, or the
> system sound mixer needs to resample the other software
> from 48kHz to 44.1kHz
I expressed no preference with regard to those 3. I was outlining the theoretically possible options, to illustrate that there is no way to avoid resampling.
I got a different impression, because you also wrote:
> If only it was that simple T_T
Which to me sounded like _for you_ it's not simple because reasons, which led me to believe that you _do_ want to take it into your own hands, making it not simple, ergo not being able to let the OS do it, for reasons. Now I understand what you mean, thanks!
Getting pristine resampling is insanely expensive and not worth it.
If you have a mixer at 48kHz you'll get minor quantization noise, but if it's compressed already it's not going to do any more damage than the compression already has.
That's a clear need IMO, but it'd be slightly better if the game could have 48 kHz audio files and downsample them to 44.1 kHz for playback than the other way around (better to downsample than upsample).
44.1kHz sampling is sufficient to perfectly describe all analog waves with no frequency component above 22050Hz, which is substantially above human hearing. You can then upsample this band limited signal (0-22050Hz) to any sampling rate you wish, perfectly, because the 44.1kHz sampling is lossless with respect to the analog waveform. (The 16 bits per sample is not, though for the purposes of human hearing it is sufficient for 99% of use cases.)
22050 Hz is an ideal unreachable limit, like the speed of light for velocities.
You cannot make filters that would stop everything above 22050 Hz and pass everything below. You can barely make very expensive analog filters that pass everything below 20 kHz while stopping everything above 22 kHz.
Many early CD recordings used cheaper filters with a pass-band smaller than 20 kHz.
For 48 kHz it is much easier to make filters that pass 20 kHz and whose output falls gradually until 24 kHz, but it is still not easy.
Modern audio equipment circumvents this problem by sampling at much higher frequencies, e.g. at least 96 kHz or 192 kHz, which allows much cheaper analog filters that pass 20 kHz but which do not attenuate well enough the higher frequencies, then using digital filters to remove everything above 20 kHz that has passed through the analog filters, and then downsampling to 48 kHz.
The original CD sampling frequency of 44.1 kHz was very tight, despite the high cost of the required filters, because at that time, making 16-bit ADCs and DACs for a higher sampling frequency was even more difficult and expensive. Today, making a 24-bit ADC sampling at 192 kHz is much simpler and cheaper than making an audio anti-aliasing filter for 44.1 kHz.
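A sketch of that oversample-then-decimate idea in the digital domain (the 192 kHz rate, scipy, and the test signal are just illustrative assumptions):

    # Modern pattern described above: capture at a high rate behind a gentle
    # analog filter, then do the steep filtering digitally and decimate.
    import numpy as np
    from scipy.signal import resample_poly

    fs_adc = 192000                            # oversampled capture rate
    t = np.arange(fs_adc) / fs_adc
    capture = np.sin(2 * np.pi * 1000 * t)     # stand-in for the ADC output

    # Digital low-pass plus 4:1 decimation down to 48kHz; the sharp
    # anti-aliasing happens here, in the digital domain, where it's cheap.
    audio_48k = resample_poly(capture, up=1, down=4)
    print(len(audio_48k))                      # 48000 samples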
The analog source is never perfectly limited to 20 kHz because very steep filters are expensive and they may also degrade the signal in other ways, because their transient response is not completely constrained by their amplitude-frequency characteristic.
This is especially true for older recordings, because for most newer recordings the analog filters are much less steep, but this is compensated by using a much higher sampling frequency than needed for the audio bandwidth, followed by digital filters, where it is much easier to obtain a steep characteristic without distorting the signal.
Therefore, normally it is much safer to upsample a 44.1 kHz signal to 48 kHz, than to downsample 48 kHz to 44.1 kHz, because in the latter case the source signal may have components above 22 kHz that have not been filtered enough before sampling (because the higher sampling frequency had allowed the use of cheaper filters) and which will become aliased to audible frequencies after downsampling.
Fortunately, you almost always want to upsample 44.1 kHz to 48 kHz, not the reverse, and this should always be safe, even when you do not know how the original analog signal had been processed.
yeah but you can record it in 96kHz, then resample it perfectly to 44.1 (hell, even just 40) in the digital domain, then resample it back to 48kHz before sending it to the DAC
If you have such a source sampled at a frequency high enough above the audio range, then through a combination of digital filtering and resampling you can obtain pretty much any desired output sampling frequency.
the point is that when downsampling from 48 to 44.1 you can do the filtering for "free", since the downsampling is being done digitally with an FFT anyway
I suppose the option you're missing is you could try to get pristine captures of your samples at every possible sample rate you need / want to support on the host system.
you're not missing something. You can resample them safely, as stated by the author. They simply state you should check the resampler:
> Although this conversion can be done in such a way as to produce no audible errors, it's hard to be sure it actually is.
That is, you should verify the resampler you are using (or implement it yourself) in order to be sure it is done correctly, and with today's hardware that is easily possible.
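One cheap way to do that verification, as a rough sketch (scipy, the test tone, and the measurement method are my own assumptions): push a pure tone through the resampler and measure how much energy lands anywhere other than the tone itself.

    # Crude resampler sanity check: resample a pure tone 44.1kHz -> 48kHz and
    # see how much energy ends up away from the tone's own FFT bin.
    import numpy as np
    from scipy.signal import resample_poly

    fs_in, fs_out, f_tone = 44100, 48000, 1000
    x = np.sin(2 * np.pi * f_tone * np.arange(fs_in) / fs_in)   # 1s test tone

    y = resample_poly(x, up=160, down=147)   # the conversion under test
    y = y[4800:-4800]                        # drop filter edge effects

    # 1000Hz lands exactly on an FFT bin for this length, so (with no window)
    # any energy away from that bin is resampling error: images, aliases, ripple.
    spectrum = np.abs(np.fft.rfft(y)) ** 2
    peak = int(np.argmax(spectrum))
    tone_energy = spectrum[peak - 2:peak + 3].sum()
    error_energy = spectrum.sum() - tone_energy
    print("artifacts are %.1f dB below the tone"
          % (10 * np.log10(tone_energy / error_energy)))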
Unless I'm missing something?