A PC based home studio (part IV)

Part IV - Digital audio recording vs. analog audio recording - some background information

Sometimes I will refer to previous articles since I don't want to repeat myself too much.

All of you are familiar with digital storage of audio information - you have a CD player at least. The CD player has to deal with the easy part of digital audio - converting the digital information (which is basically a series of numbers) back into analogue electrical signals which can then be amplified and sent to the speakers. From the engineering point of view, both processes (encoding an analogue signal to a digital stream of numbers and decoding that information back to an analogue signal) are equally important. But the user has to consider several things when recording analogue signals to digital media, while pressing play on the CD player is as simple as it gets - the engineer who designed the output section of the CD player has done all the work for you.

In this article I will give some basic information about how audio is stored digitally, and then I'll tell you some basic but important things to consider when recording to a digital system. This article is a bit more theoretical than the last one, but I have always wanted to understand the things I'm working with. If you only want the practical bits, scroll down to Analogue recording.

Some basics and technicalities on digital audio

So how do you store an analogue signal in a digital way anyway?

The name "analogue" comes from the thought that the electrical signal in analogue systems is an analogy to the sound it represents. Sound is a periodic change of air pressure, and an analogue electrical signal is supposed to change in the same way the air pressure changes.

Digital means basically that everything is stored as numbers - text, images, sounds, etc. Deep down in the machine, these numbers are treated as binary. In the binary system there are only two numbers, or better, two states: one and zero, or "on" and "off", electrically speaking. All this is like one more level of abstraction, or like a translation into another language. While you can simply amplify an analogue signal and drive an actuator (a loudspeaker, for example) with a directly amplified version of it, this isn't possible with digital signals. You need a "translator" in between, a device that translates the numbers of the digital world into an analogue signal which can then be amplified. These devices are called digital-to-analog converters, abbr. "DAC".
If you want to store analogue signals as digital information, you need another translator that does the process the other way round - this device reads an analogue signal and translates it into numbers that can then be processed by the computer. These devices are called analog-to-digital converters, abbr. "ADC".

If you ever wondered, the graphics card in your PC has a DAC, too. Your CRT (= cathode ray tube) monitor has an analogue input - it receives three analogue colour signals, one for red, one for green and one for blue, plus signals for horizontal and vertical synchronisation. Your graphics card has the job of translating the matrix of pixels that is the computer's screen representation into the right analogue signals so that the CRT monitor can display it.

Now comes some really basic info about binary numbers. If you are familiar with those things, skip this section and read on at Audio level scales: what is a dB? or Analog-Digital conversion of audio signals.

An introduction to binary numbers

Now, if you are not familiar with the terminology, I'll give you two definitions first, because I'll use these words often in the text and don't want to explain them over and over again.
In the computer world, there are two types of numbers: integer numbers and floating point numbers. These types can have several subclasses, depending on the machine being used, the programming language, etc.

An integer is a whole number, like 1, 2, 3, 4... Some integer classes include negative numbers. A floating point number (or "float" or "real") is the computer's representation of a real number. Since no computer can calculate with infinite precision, floating point numbers have a fixed resolution. Examples: 0.999, -12765.9795, 0.768^45.

Both kinds of numbers are stored as a succession of bits, but for the different types there are different interpretations of this succession.

One digit of a binary number is called a bit. A bit can have two different states: one/zero or high/low or on/off...

A number that consists of eight bits is called one byte. I don't know if this is a wickedly misspelled version of "bite", but back on the old C64, a 4-bit number was called a "nibble" :-)
Now a number can be put together from different numbers of bits. How many bits a number has determines how big it can be, if it's an integer (I will not treat floating point numbers at this point). An eight-bit number, for example, can be at most 255. Why is that so?


Table 1. - Eight bit numbers

8 bit binary    "normal" (decimal) number
00000000     =      0 - smallest number possible with 8 bits
00000001     =      1 
00000010     =      2
00000011     =      3
00000100     =      4
      etc.
11111111     =    255 - greatest integer possible with 8 bits

Have a look at table 1 to see how one counts in the binary system. With eight bits there are 256 possible combinations of 1s and 0s. This is because for each bit you multiply by 2 (the number of possible states per bit): 2*2*2*2*2*2*2*2 = 2^8 = 256. Notice that so far we can only represent positive integers. If we want to represent positive and negative numbers, we have to sacrifice one bit for the sign of the number. The remaining seven bits give 128 combinations, but we lose nothing, because we now have 128 non-negative and 128 negative values, still 256 in total. If we increase the number of bits per number, we also increase the biggest number we can represent: 16 bits = 2^16 = 65536 combinations, so the biggest number is 65535. For 24 bit numbers: 2^24 = 16777216, about 16.7 million. Notice that a 24 bit number needs three times as much memory and therefore storage space as an 8 bit number. You can continue this up to as many bits as you want.
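If you want to play around with this, here is a minimal Python sketch (my own illustration, not part of the original text) that prints the value ranges for a few common word lengths:

    # How many values an unsigned and a signed integer of n bits can hold.
    for bits in (8, 16, 24):
        combinations = 2 ** bits
        print(f"{bits:2d} bits: {combinations} combinations, "
              f"unsigned 0..{combinations - 1}, "
              f"signed {-(combinations // 2)}..{combinations // 2 - 1}")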

Audio level scales: what is a dB?

A dB (decibel) describes a change of a signal level - a ratio, in other words. Saying "6 dB" doesn't say anything about the actual signal level itself; for that you have to specify a reference point. It's like the height of a mountain: you have to say "It's 2300 m above sea level", giving the sea level as a reference point, or else the value would be pretty useless. But you can say "It's 500 m higher than the other mountain over there" without having to specify a reference point. In analogue audio, there are two common dB scales with different reference points.
These are dBV and dBu. Both are relative to a certain voltage that equals a signal level of 0 dB: 0 dBV = 1.0 volt and 0 dBu = 0.775 volts. So if you say "-8 dBV", this means an actual, defined voltage level.

Now all dB scales are logarithmic. This means that an increase of 6 dB (which is relative, still remember?) means the signal level is double what it was before. Increasing it by another 6 dB means you have to double the level again. Let's say you have a signal of 6 dBV = 2*1.0 V = 2 V. If you increase the signal level by another 6 dB, you get 12 dBV = 2*2 V = 4 V, or 18 dBV = 2*4 V = 8 V. If you are subtracting dBs, -6 dB means you have to halve the signal level.
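In case you want the exact relation behind the 6 dB rule of thumb: the level in dB is 20*log10(V/Vref). Here is a small Python sketch (the helper function is my own, purely for illustration) that converts a dB value back into a voltage:

    import math

    def db_to_volts(level_db, reference_volts):
        # 20 dB per factor of 10 in voltage; +6 dB is therefore almost exactly a doubling.
        return reference_volts * 10 ** (level_db / 20)

    print(db_to_volts(6, 1.0))     # about 2.0 V, i.e. 6 dBV
    print(db_to_volts(0, 0.775))   # 0.775 V, i.e. 0 dBu
    print(db_to_volts(-8, 1.0))    # about 0.4 V, i.e. -8 dBV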

In the digital audio world, the reference point for the signal level is not a voltage. It's the biggest number that can be represented by your digital audio data. If you have a 16 bit system (like a CD),

then 0 dB = 1111 1111 1111 1111 (the maximum binary number possible with 16 bits)

This point is usually called 0 dBFS (0 dB full scale), and it's the maximum level that's digitally possible (this only holds true for integer-format digital audio storage, but that is the most common format, and also the way the actual DAC and ADC converters work). All other levels in digital audio are lower and are represented by negative dB values.
How many volts correspond to 0 dBFS - that is, how big x is in the equation 0 dBFS = x dBu - depends on the converters. It makes no difference for the way digital audio is stored and processed in the digital domain.
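As a small illustration (assuming the usual signed 16 bit convention, where full scale is 32767), here is how you would calculate the dBFS level of a single sample in Python:

    import math

    FULL_SCALE = 2 ** 15 - 1    # 32767, the biggest positive 16 bit sample

    def sample_to_dbfs(sample):
        # 0 dBFS corresponds to a full-scale sample; everything else is negative.
        return 20 * math.log10(abs(sample) / FULL_SCALE)

    print(round(sample_to_dbfs(32767), 1))   # 0.0 dBFS
    print(round(sample_to_dbfs(16384), 1))   # about -6 dBFS
    print(round(sample_to_dbfs(1), 1))       # about -90 dBFS, down at the noise floor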

Analog-Digital conversion of audio signals

An ADC does the following: it measures the voltage at its input, compares it to a given reference voltage and then calculates a number that represents the measured voltage relative to the reference voltage. This number can be 8 bits long, 12 bits (pretty uncommon for audio nowadays, but there were 12 bit samplers...), 16 bits, 20 bits or 24 bits. This number is called a "sample", and it is an integer. The number of bits it is made of is called the "word length" or "bit depth" (often loosely called the "bit rate"). The higher the word length of the samples, the better the resolution of the ADC - the analogue voltage is represented more "finely" in the digital domain. It is like with bitmap files: the higher the resolution, the finer the displayed details. However, increasing the word length is like increasing the resolution of a bitmap in only one direction, let's say the y-axis. You get a finer vertical resolution, but the horizontal details are still blurred.
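To make the idea of a sample more concrete, here is a toy Python sketch of the quantization step (this is only the principle, not how a real converter is built; the function and its parameters are my own):

    def quantize(voltage, reference_volts, bits):
        # Map a voltage in the range -reference..+reference to a signed integer sample.
        full_scale = 2 ** (bits - 1) - 1
        value = round(voltage / reference_volts * full_scale)
        # A converter cannot go beyond full scale, so clamp the result.
        return max(-full_scale - 1, min(full_scale, value))

    print(quantize(0.5, 1.0, 16))   # 16384 - half of full scale
    print(quantize(0.5, 1.0, 8))    # 64 - the same voltage, much coarser steps
    print(quantize(2.0, 1.0, 16))   # 32767 - anything above the reference just clips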

The ADC does this sampling not continuously, but at fixed intervals. How many samples it puts out per second is determined by the "sampling rate". For CDs this is 44100 Hz, which means the ADC puts out 44100 samples per second. Other common sampling rates are 11 kHz and 22 kHz (both obsolete except for really crappy sounding multimedia apps where the amount of data is critical), 32 kHz (used in some DATs, but seldom nowadays), 48 kHz (the standard for DAT), and 88.2 kHz and 96 kHz (the rates for the proposed DVD audio).

If you increase the sample rate, you increase the resolution of the ADC in the other direction. The bitmap analogy still holds; you just have to think of the x-axis of the bitmap as the timeline.

As with bitmaps, increasing the resolution of the ADC produces higher data rates and needs more memory, too. All this resolution stuff applies to the digital-analog conversion as well.
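To get a feeling for the numbers, here is a small sketch (my own helper, assuming plain uncompressed PCM audio) that shows how sample rate and word length translate into data rate:

    def data_rate_bytes_per_second(sample_rate, bits_per_sample, channels):
        # Uncompressed PCM: sample_rate samples per second and channel, bits/8 bytes each.
        return sample_rate * bits_per_sample // 8 * channels

    print(data_rate_bytes_per_second(44100, 16, 2))   # 176400 bytes/s - CD quality stereo
    print(data_rate_bytes_per_second(96000, 24, 2))   # 576000 bytes/s - more than three times as much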

Now, there is a theorem called the "Nyquist theorem", which comes from numerical mathematics. It says that if you sample a signal at a fixed frequency, the maximum frequency you can resolve/record with this method is 1/2 of the sampling rate. This is called the Nyquist frequency. That means that with a sampling rate of 44.1 kHz you can record a maximum analogue frequency of 22.05 kHz. I won't go further into this; if you want to learn more about it, get a good textbook on numerical mathematics. The only thing we need to know is that this is the way it is.
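If you want to see why anything above the Nyquist frequency causes trouble, here is a tiny Python sketch (my own illustration): sampled at 44.1 kHz, a 30 kHz cosine produces exactly the same samples as a 14.1 kHz cosine, so once it is sampled you can no longer tell the two apart.

    import math

    SAMPLE_RATE = 44100            # Nyquist frequency is 22050 Hz

    def sampled_cosine(freq, n):
        # The value the ADC would see at sample number n for a cosine of this frequency.
        return math.cos(2 * math.pi * freq * n / SAMPLE_RATE)

    # 30000 Hz is above Nyquist; it folds down to 44100 - 30000 = 14100 Hz.
    for n in range(5):
        print(round(sampled_cosine(30000, n), 6), round(sampled_cosine(14100, n), 6))
    # Both columns are identical.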

Because of the integer numbers produced by the ADC, a loud signal is converted much better than a quiet signal. This is because the relative difference between two neighbouring numbers is much greater for small integers than for big integers: the relative difference between 10 and 11 is 10% of 10, but the relative difference between 10000 and 10001 is only 0.01% of 10000. The conclusion is that one tries to provide the ADC with a signal level that is as close to 0 dBFS as possible.

Digital-analog conversion of audio signals

Now a DAC does the opposite. It is fed a stream of numbers, the samples, and it translates these numbers into voltages. It compares each number with a reference point that is equivalent to a reference voltage and then puts out a voltage proportional to the number. It does all this at the same frequency at which the ADC sampled the audio signal. If it worked at a different sampling rate, the audio signal would have a different pitch, because the audio would be played back "faster" or "slower". Now, the DAC produces a steppy signal. If, for example, the numbers it received alternated between 0 and the maximum value, it would put out a square wave at 1/2 of the sampling frequency (22.05 kHz for a 44.1 kHz sampling rate).

A square wave contains many more frequencies than just the fundamental frequency. This is explained by Fourier analysis and the theory of Fourier series. This theory basically says that you can produce any waveform as a sum of sine and cosine waves of different frequencies, starting with the fundamental frequency of the waveform you want to produce and then adding components at multiples of that frequency. The "edgier" the waveform is, the more high-frequency sine and cosine waves are needed to represent it. These high-frequency parts are the "harmonics" or "overtones". So a square wave has lots of harmonics in it. Now, these high frequency components weren't present in the original analogue signal that was sampled by the ADC - they are added by the DAC (strictly speaking, this output-side effect is called imaging, but it is usually lumped together with aliasing). To filter these unwanted frequency components out of the output signal, you need a lowpass filter. Such a lowpass "anti-aliasing" filter is part of every DAC. It is actually part of every ADC, too, because if you don't filter out frequencies above the Nyquist frequency before sampling, you get aliasing artifacts during the AD conversion as well.
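To see the Fourier idea in action, here is a small Python sketch (my own illustration) that builds a square wave out of its odd harmonics; the more harmonics you add, the closer the sum gets to the ideal +1/-1 square shape with its sharp edges:

    import math

    def square_from_harmonics(freq, t, num_harmonics):
        # Fourier series of a square wave: only odd harmonics, the n-th one weaker by 1/n.
        total = 0.0
        for k in range(num_harmonics):
            n = 2 * k + 1
            total += math.sin(2 * math.pi * n * freq * t) / n
        return 4 / math.pi * total

    # Evaluate a 100 Hz square wave at one point in its positive half-cycle.
    for harmonics in (1, 5, 50):
        print(harmonics, round(square_from_harmonics(100, 0.0012, harmonics), 3))
    # The value creeps towards +1 as more harmonics are added.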

Some words on dynamics

Since a continuous analogue signal is represented by numbers with finite distances between them, the A/D conversion produces an inevitable error. Each possible digital sample value is equivalent to a certain, fixed input voltage of the ADC. If the input voltage lies between two of these values, the ADC cannot represent it exactly. The result is a noise in the digital signal, called quantization noise. The dynamic range of a digital system is defined as the ratio between the maximum level possible (0 dBFS) and the level of the quantization noise. The level of the quantization noise depends on the word length of the ADC: the fewer bits per sample, the greater the distance between the voltages that can be represented, and the bigger the inherent AD conversion error. Remember, this is the theoretical dynamic range, which is defined only by the word length of the digital system. Whether the whole system can actually reach this dynamic range also depends on the quality of the analogue circuits used in front of the ADC and after the DAC.

The theoretical dynamic range of a 16 bit system is 96 dB. That means the quantization noise has a level of -96 dB. This is possible to reach with analogue circuitry.
The theoretical dynamic range of a 24 bit system is 144 dB. This impressive value is practically impossible to reach with analogue circuits. That is because analogue electrical components like resistors and capacitors introduce a certain minimum of noise into the signal, simply due to electrons that are moving because of the temperature of the circuit. One can't get rid of that. So you won't find a 24 bit soundcard that really has 144 dB of dynamic range; 110 to 116 dB are great to excellent values for digital systems.
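The 96 dB and 144 dB figures follow directly from the "6 dB per doubling" rule above: every extra bit doubles the number of representable values, so every bit adds roughly 6 dB. A quick sketch in Python (using the slightly more precise factor of 6.02 dB per bit):

    def theoretical_dynamic_range_db(bits):
        # Rule of thumb: each bit adds about 6 dB (more precisely 6.02 dB) of dynamic range.
        return 6.02 * bits

    print(round(theoretical_dynamic_range_db(16)))   # about 96 dB
    print(round(theoretical_dynamic_range_db(24)))   # about 144 dB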

Despite that, the dynamic range of a 24 bit system is not wasted. Every kind of digital audio processing involves calculations, be it a simple level change, an equalization or whatever. If the result of a calculation is not an integer, it has to be rounded, and this rounding process introduces, guess what, more quantization noise. The greater the word length of the digital material, the smaller the noise due to digital processing. Since this noise adds up over several calculation/processing steps, one wants to keep the word length as high as possible to avoid this noise becoming audible. This is the reason why a lot of digital systems (be it hardware digital mixers and FX units or software systems) work with an even greater word length internally and only round the result once, directly before the outputs. Working with floating point numbers brings great benefits here, so a lot of systems work internally with 32 bit floating point numbers. Since the PC's floating point unit has an internal floating point format with a width of 80 bits, many software packages use this format, which yields even greater accuracy *internally*. Still, the result of all these calculations has to be converted into an integer before it is sent to the DAC.
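Here is a tiny Python sketch of the effect (my own illustration, with made-up numbers): turn a 16 bit signal down by 30 dB, round to integers as a pure integer system would when storing the intermediate result, and turn it back up again - the rounding error comes back up with the signal, while a floating point chain would return the original values almost exactly.

    import math

    gain_down = 10 ** (-30 / 20)   # -30 dB
    gain_up = 1 / gain_down        # +30 dB

    worst_error = 0
    for n in range(100):
        original = round(20000 * math.sin(2 * math.pi * 1000 * n / 44100))
        attenuated = round(original * gain_down)    # rounded to an integer sample
        restored = round(attenuated * gain_up)
        worst_error = max(worst_error, abs(restored - original))
    print(worst_error)   # several sample steps of pure rounding error - that is added noise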

Analogue recording

Let's leave the digital domain for a few lines and talk about analogue recording. Most (or all) of you have recorded to an analogue tape machine before, be it a simple cassette deck or a 2" recording machine. Part of the recording process is setting the recording levels. You have probably learned to set the levels so that the red LED that indicates clipping or saturation lights up briefly on peaks. And to use the famous tape compression effect of analogue tape, you have certainly "hit the tape hard" before, so that all of the red lights stay lit. This is great with analogue tape, because the saturation/compression effects and the introduced distortion are pleasant to the ear and in most cases even wanted. It is great, for example, on a snare drum or a bass drum to achieve a really good "kick".

This is all nice and fine for analogue equipment, but for digital recording you have to think differently.

Digital recording: record levels

In digital land you have to set the record levels so that they do not clip. The input level of the ADC must not exceed 0 dBFS, or else you will get digital clipping, which sounds extremely nasty. Why is this so?
Remember the paragraph above where I talked about the maximum possible levels of a digital system? Let's assume we have a 16 bit system; then 0 dB = 1111 1111 1111 1111 (the maximum binary number possible with 16 bits). Now what happens if the input level exceeds that number? The next binary number after the one above is 1 0000 0000 0000 0000. You see what happens: our system only recognizes 16 bits, so the 17th bit, the leading 1, is simply forgotten, and the ADC puts out 0000 0000 0000 0000. This means the waveform wraps around to the bottom of the scale, so there are abrupt edges in the waveform which introduce a nasty form of distortion. Imagine a sine wave where the peak of the wave is cut off horizontally and pasted back in at the bottom of the waveform. This is what this form of digital clipping does, and you can easily imagine how bad the result sounds. If the ADC doesn't behave quite that badly, it will at least produce ultra-hard brick wall limiting, which sounds nasty too.
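You can reproduce that wraparound in a few lines of Python (this mirrors the unsigned 16 bit example from the text; real converters usually clamp at full scale instead, which is the brick wall limiting mentioned above):

    def wrap_to_16_bits(value):
        return value & 0xFFFF          # keep only the lowest 16 bits, drop the rest

    print(wrap_to_16_bits(0b1111111111111111))       # 65535 - full scale
    print(wrap_to_16_bits(0b1111111111111111 + 1))   # 0 - the 17th bit is simply lost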

So the first, most important rule of digital recording is:

Don't ever clip the inputs. Set the record levels so that the peak level is lower than or equal to 0 dBFS.

In this case, too, 24 bit converters are our friends. They have such a big dynamic range that it doesn't matter if we don't push the record levels really close to 0 dBFS. We can also record without limiters and simply leave the signal enough headroom.
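A quick back-of-the-envelope calculation shows why (the 18 dB of headroom here is just an example figure of mine): even with generous headroom, a 24 bit recording still has more usable dynamic range than a 16 bit recording driven right up to full scale.

    bits_24_range = 6.02 * 24          # about 144 dB theoretical dynamic range
    bits_16_range = 6.02 * 16          # about 96 dB
    headroom = 18                      # dB left unused below 0 dBFS

    print(round(bits_24_range - headroom))   # about 126 dB between the peaks and the noise floor
    print(round(bits_16_range))              # about 96 dB with no headroom at all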

Simply remember: you cannot "hit the tape hard" with digital hardware. It simply is not possible. You can simulate tape compression in software *after* the signal has been digitized, so not all is lost :-) (provided you have software that can do this).

Playback of digital audio

The same applies to the playback of digital audio. A DAC cannot put out more level than what is equivalent to 0 dBFS, so you cannot clip it directly. But you can clip the signal internally during processing stages that increase the level of a signal (equalising, for example). You have to make sure that the system does not clip during processing. This is especially important if you work with a system that uses integer numbers throughout the processing stages. If you work with a system that uses floating point numbers internally, you don't have to worry about internal clipping, since floating point numbers allow, for all practical purposes, nearly unlimited headroom. But you have to scale the signal down (or up) to 0 dBFS maximum before you send it to your soundcard. Ask the manufacturer of your software or hardware device whether it works with floating point numbers if you can't find out otherwise.
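This scaling step is just a multiplication. Here is a minimal Python sketch (the buffer values and the -0.1 dBFS target are made up for the example) that brings a floating point mix down so that its loudest peak lands just below 0 dBFS and then converts it to 16 bit integers for the DAC:

    mix = [0.3, -1.7, 2.4, -0.8]         # floats may exceed "full scale" without harm
    target = 10 ** (-0.1 / 20)           # aim a hair below 0 dBFS for safety
    peak = max(abs(s) for s in mix)
    scale = target / peak

    samples_16bit = [round(s * scale * 32767) for s in mix]
    print(samples_16bit)                 # now everything fits into a 16 bit integer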

Digital recording: other considerations

An ADC will also not compress your input signal at all, the way analogue tape always does to a certain extent.
Another thing to remember is that the input meters on digital hardware are very fast, because they have to be peak meters. They will read substantially differently from, say, the VU meters on a tape recorder, especially the old analogue needle meters. Those show more of the mean level of your recording, since they're too slow to catch all the fast peaks, and because of that they would be useless for monitoring an ADC's input signal.
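The difference is easy to see in a few lines of Python (the buffer values are made up): a peak meter reacts to the single loudest sample, while a slow VU-style meter shows roughly the average (RMS) level, which can be far lower.

    import math

    buffer = [0.05, -0.1, 0.08, 0.9, -0.07, 0.06, -0.04, 0.05]
    peak = max(abs(s) for s in buffer)
    rms = math.sqrt(sum(s * s for s in buffer) / len(buffer))

    print(round(20 * math.log10(peak), 1))   # peak level, about -0.9 dBFS
    print(round(20 * math.log10(rms), 1))    # average level, about -9.8 dBFS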

Of course, if you happen to like the sound, you can overdrive an ADC. It won't harm the thing (except if you feed it the sine wave from your wall plug, hehe). Music is an art, and in art nothing is forbidden; there is no absolute right or wrong. What I gave you in this article is nothing but technical considerations. I told you how to satisfy the machine, so to speak, and tried to explain what happens when you press record on a digital system. So all this may be "right" from an engineering point of view. The musician's point of view should be different. I'd like to quote Jezar from http://www.dreampoint.co.uk here:

"Why do we record music? So that we can hear it."

NOT for the sake of the equipment. Or some recording rules. So experiment, be creative!

PS: I'll tell you more about digital processing and the possibilities of digital recording, editing and mixing in later installments. This one's long enough already.

The next article will be about MIDI. If you have worked with a digital multiFX, you'll probably have used MIDI already to switch between patches. If you play a synthesizer or have a sampler, you'll know a lot about MIDI, but since I don't assume that all of you do, I'll explain a lot of basics the next time.

The next part will be:


Part V - An introduction to MIDI


All this stuff is (c) 2000-2002 Tammo Trueper.

The trademarks I have mentioned all belong to their respective owners.

If you want to drop me a note, or ask a question, here's my eMail address:

boogie@bigfoot.de