Real time is one of those phrases that bounces around without it being clear when it applies and what it means.
Once upon a time there was analog audio, whereby an electrical waveform was an analog of sound pressure or velocity at a microphone and could produce sound again at a loudspeaker. There was negligible delay in the electrical signal, so what came out of the speaker was to all intents coincident with what went in the microphone, to the great joy of those performing sound reinforcement. Even when the sound modulated a radio signal, the delay was pretty small, so there was no need to discuss it or even have a term for it. Today we would say such systems were synchronous, from the Greek for something or other.
Many years ago computing had nothing to do with audio. Early computers took their input from paper tape or punched cards that had to be prepared in advance. When the computer processed the input, it worked in its own time and took as long as it took, and when it was done, something came out. The egg timer was in there, but had yet to become visible. When computers were incredibly expensive each one had many users and each user would get a chunk of processing time and then have to wait until everyone else had had their share, hence the term time-sharing.
As computers became faster, they became attractive as a means for process control, whereby they could use as input a number of conditions in some plant and perform calculations about how to adjust valves or change temperatures to keep the system working properly. Clearly there was a limit on how long the computer could take to do its calculations, or its decisions would be too late to be useful. That is where the term real time came from. It was a marketing term to distinguish process control computers from the earlier types that did time-sharing. To be pedantic, the computer hardware could be common. It was the operating system that determined whether the machine would time-share or run in real time.
Like most marketing terms, it wasn’t quite accurate. Every new set of input conditions would need to be processed, and each process would require the execution of a set of instructions before the outputs would be available. The execution of the instructions takes a finite time, so to describe what was going on as real time was incorrect. It was just closer to real time than what went before.
In those days we had audio tape recorders used to cut vinyl discs, and a few far-sighted people were looking into converting audio into the digital domain. The best playing time on a vinyl disc was obtained by altering the groove pitch as a function of the sound level. The pitch could be left quite fine provided the cutter sled was speeded up one revolution before any fortissimi. The audio needed to be provided to the sled control about two seconds before it went to the cutter. This could be done with a special tape deck having an advanced playback head as well as the regular one, but someone had the bright idea of using a normal tape deck and an audio delay which consisted of an ADC, some memory and a DAC. So was it the sled control that was working in real time or was it the cutter?
Well, with a bit of juggling of the definition, both could be working in real time if we say that real time means manipulated at the same rate as the original. That definition works for the replay and processing of sound recordings, but not for sound reinforcement, where we need a new definition such as live.
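The audio delay in the disc-cutting story lends itself to a short sketch in Python. It is only an illustration of the idea of ADC, memory and DAC, not a description of any real cutting lathe: the names DelayLine and cutting_feeds are invented here, and the samples are assumed to be already digitised. The undelayed signal goes to the sled control, the delayed one to the cutter, and both run at the same rate as the original.

```python
from collections import deque


class DelayLine:
    """A fixed delay of `delay_samples` sample periods (hypothetical name).

    The deque stands in for the memory between the ADC and the DAC.
    """

    def __init__(self, delay_samples):
        # Pre-fill with silence so the first outputs are zeros.
        self.memory = deque([0] * delay_samples)

    def process(self, sample):
        # Write the newest sample in, read the oldest one out.
        self.memory.append(sample)
        return self.memory.popleft()


def cutting_feeds(samples, delay_samples):
    """Yield (preview, delayed) pairs, one per sample period.

    `preview` would go to the sled control, `delayed` to the cutter.
    """
    delay = DelayLine(delay_samples)
    for sample in samples:
        yield sample, delay.process(sample)


if __name__ == "__main__":
    for preview, delayed in cutting_feeds([1, 2, 3, 4, 5], 3):
        print(preview, delayed)
```

If the sampling rate were 48 kHz, which is an assumption and not something the story tells us, a two second delay would need a memory of 96,000 samples.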
Many aspects of analog audio work quite well, especially microphones, mixers and FM radio, but where analog falls down is in recording and cable transmission, so not surprisingly that is where solutions first appeared. The use of digital audio, in the form of NICAM, to get high quality from the studio to FM radio transmitters was an early success. The AES/EBU digital audio interface has been enormously popular. Both of these systems are synchronous and so could be said to work in real time, with a negligible delay in AES/EBU to allow for multiplexing and demultiplexing and a short delay in NICAM to allow for the use of small companding blocks.
Digital audio recorders did not merely reduce the wow and flutter of analog recorders; they eliminated it completely. This is not because there was anything magical about their media or transports, but because discrete samples could be time base corrected with arbitrary accuracy using buffer memory.
In the early eighties, my then boss, the late Leonid Strashun, had written a notorious article for a broadcast magazine in which he likened a digital video time base corrector to a bus station, where buses always left on time even if their arrival was irregular due to traffic. I also remember the route from our favourite pub in Basingstoke back to the now demolished City Wall house, home of Sony Broadcast, took us through the bus station. My Japanese colleague, Ajimine-san, looked at the bus station and declared that it was faulty and I naturally asked why.
“John-san”, he said in fluent Jinglish, “I see onry red buses, no green or brue buses.”
On occasions like that, it’s good to be alive.
Today, when even nostalgia isn’t what it used to be, it hardly seems worth using the term digital audio. Practically all audio is digital. Not only that but practically all audio today is in the form of data that are stored on some generic data storage medium rather than on a dedicated audio medium. The distinction between audio and computing has largely vanished.
Whilst an audio file is just a bunch of binary numbers, like your bank statement or the file that stores this text, audio data do differ from generic data in that they are supposed to be reproduced at a certain rate. In an analog device, the speed is built into the hardware: the turntable or the capstan revolves at the correct angular velocity. With data, the time base has deliberately to be organised.
One way of achieving that is to feed the DAC at the destination with samples at the correct rate from a time base corrector whose memory state is constantly monitored. When the memory is near full, there is no need to fetch another block of data from the medium or the network. When the memory is near empty, it is urgent to obtain another block of data. In such a system, the output sampling rate is the same as the original, but there must be a delay due to the time base corrector and any network. Such a system could be said to work in real time, but it is more precise to say it is isochronous, from the Greek iso, meaning “the same” and Kronos, an island populated entirely by horologists.
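The buffer management just described can be sketched as a small simulation. This is a toy under stated assumptions, not how any particular player does it: the function name run_tbc, the two thresholds low_mark and high_mark, and the idea that a whole block arrives instantly within one sample period are all inventions for the sake of illustration. One sample leaves for the DAC on every tick, whatever the medium is doing.

```python
from collections import deque


def run_tbc(blocks, low_mark, high_mark, ticks):
    """Simulate a time base corrector for `ticks` output sample periods.

    blocks    -- iterable of sample lists, as read from the medium or network
    low_mark  -- occupancy at or below which fetching becomes urgent
    high_mark -- occupancy at or above which fetching stops
    Returns (output_samples, ticks_on_which_a_block_was_fetched).
    """
    memory = deque()
    source = iter(blocks)
    fetching = True  # start by filling the memory
    output, fetch_ticks = [], []

    for tick in range(ticks):
        level = len(memory)
        if level <= low_mark:
            fetching = True   # near empty: get more data
        elif level >= high_mark:
            fetching = False  # near full: leave the medium alone

        if fetching:
            block = next(source, None)
            if block is not None:
                memory.extend(block)
                fetch_ticks.append(tick)

        # The DAC takes exactly one sample per tick; None marks an underrun.
        output.append(memory.popleft() if memory else None)

    return output, fetch_ticks


if __name__ == "__main__":
    blocks = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
    out, fetched = run_tbc(blocks, low_mark=2, high_mark=6, ticks=12)
    print(out)      # samples in order, one per tick
    print(fetched)  # irregular fetch times
```

The fetches happen in irregular bursts while the output stays perfectly regular, which is the bus station in miniature. In a real system the blocks take time to arrive, so the memory has to hold a reserve before replay starts, and that reserve is where the delay comes from.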
In a standardised device such as a CD player, there is only one sampling rate to worry about; the medium can be scanned at whatever rate keeps the time base corrector happy. But what about systems in which the source is remote and sends data to many destinations? In that case it is the destination that should lock to the source. I say “should” because it isn’t always done. A source and a destination that are nominally running at the same clock speed but which are unlocked will suffer the occasional sync slip, and if this can be plastered over at the receiver, the system can be acceptable. Such systems are called plesiochronous, from the Greek plesion, meaning no banana.
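How occasional the slippage is comes down to simple arithmetic, sketched below. The function name and the example figures of 48 kHz and 10 parts per million are assumptions chosen for illustration; real clock tolerances vary. If the two clocks differ by a small fraction, the receiver gains or loses one sample every time the accumulated drift reaches a whole sample period.

```python
def seconds_between_slips(nominal_rate_hz, offset_ppm):
    """Average time between one-sample slips for two unlocked clocks.

    nominal_rate_hz -- the sampling rate both ends are supposed to run at
    offset_ppm      -- frequency difference between them, in parts per million
    """
    # Drift in samples per second between source and destination.
    drift = nominal_rate_hz * abs(offset_ppm) * 1e-6
    if drift == 0:
        return float("inf")  # perfectly matched clocks never slip
    return 1.0 / drift


if __name__ == "__main__":
    # Two nominally 48 kHz clocks that differ by 10 ppm.
    print(seconds_between_slips(48_000, 10))
```

With those assumed figures the drift is 0.48 samples per second, so a sample has to be dropped or repeated roughly every two seconds.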
If it is important that source and destination are synchronous, we need to reconstruct the source clock at the destination. This is done by sending samples of the state of a counter driven by the source clock to the destination. Comparison of the samples with the state of a counter at the destination allows the destination clock to be locked. This is the program clock reference system of the MPEG transport stream. Once synchronous counters exist at both ends of the system, the source counter can be sampled to produce a time stamp that can accompany a data block so the destination knows where on the time axis that block should be reproduced.
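The counter comparison can be sketched as a very simple control loop. This is a toy model, not the MPEG reference design: the function name lock_to_pcr, the proportional-only loop, the gain and the 40 ms interval between clock samples are all assumptions, and network jitter and counter wraparound are ignored. The 27 MHz figure is the MPEG-2 system clock frequency. Each time a sample of the source counter arrives, the destination compares it with its own counter and trims its clock frequency in proportion to the difference.

```python
def lock_to_pcr(source_hz, free_run_hz, pcr_interval_s, steps, gain_hz_per_count):
    """Pull a local clock onto the source frequency using counter samples.

    source_hz         -- true frequency of the source clock
    free_run_hz       -- what the local oscillator does when left alone
    pcr_interval_s    -- time between received samples of the source counter
    steps             -- number of samples to process
    gain_hz_per_count -- frequency correction per count of counter error
    Returns (final_local_hz, final_counter_error).
    """
    source_counter = 0.0
    local_counter = 0.0
    local_hz = free_run_hz
    error = 0.0

    for _ in range(steps):
        # Both counters advance for one interval at their own clock rates.
        source_counter += source_hz * pcr_interval_s
        local_counter += local_hz * pcr_interval_s

        # A sample of the source counter arrives; compare it with ours.
        error = source_counter - local_counter

        # Trim the local oscillator in proportion to the error.
        local_hz = free_run_hz + gain_hz_per_count * error

    return local_hz, error


if __name__ == "__main__":
    # A local oscillator that would free-run 20 ppm fast.
    hz, err = lock_to_pcr(27_000_000.0, 27_000_540.0, 0.04, 60, 12.5)
    print(hz, err)
```

With a proportional loop like this the frequencies end up equal but the counters settle with a small constant offset between them. A practical design would add an integral term or similar to remove that offset before trusting the time stamps.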
In theory, with a common program clock reference, correct time stamps attached to sound blocks and video frames should give precise control of lip sync in digital television broadcasting. In practice a lot of hardware doesn’t meet the MPEG spec. and simply klunks the audio around until the buffer management stops complaining. The lip sync will then be incorrect, and will be different again if the equipment is turned off and on. Such systems are called apathochronous, from the Greek island of Apathos where nobody gives a monkey’s.