>The equations that do autocorrelation are computationally exhaustive: for every one point of autocorrelation (each line on the chart above, right), it might’ve been necessary for Hildebrand to do something like 500 summations of multiply-adds...
>Hildebrand realized he was limited by the technology, and instead of giving up, he found a way to work within it using math. “I realized that most of the arithmetic was redundant, and could be simplified,” he says. “My simplification changed a million multiply adds into just four. It was a trick — a mathematical trick.”
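For a sense of scale, here's what the brute-force version looks like: a minimal sketch in Python/numpy of plain time-domain autocorrelation, not Hildebrand's actual code, and without whatever simplification the patent describes.

```python
import numpy as np

def autocorr_naive(x, max_lag):
    """Brute-force autocorrelation: roughly len(x) multiply-adds per lag."""
    r = np.zeros(max_lag)
    for lag in range(max_lag):           # one autocorrelation "point" per lag
        for i in range(len(x) - lag):    # the inner multiply-add loop
            r[lag] += x[i] * x[i + lag]
    return r

# A 2000-sample analysis window with 500 candidate lags is already on the
# order of a million multiply-adds, repeated for every frame, in real time.
```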
I would like to see them dramatize Dijkstra's algorithm in the same style. Instead of looking at all possible paths of arbitrary length within the graph, of which there are infinitely many, ...
My (very non-expert) reading of the patent is that it uses sample reduction and some specific features of the periodic form of the human voice to simplify the math involved in the auto-correlation routine. Although this routine does seem to be unique as far as I know, I wouldn't be surprised if it shares similarities with other techniques for reducing correlation complexity.
> “I realized that most of the arithmetic was redundant, and could be simplified,” he says. “My simplification changed a million multiply adds into just four.”
In the era of deep learning, no one is impressed by 500 summations of multiply-adds, or even a million.
A GeForce 1080 Ti does 11.3 TFLOPS, with a T: that's 11,300,000,000,000 floating-point operations per second.
In 1996 your processor might be running at 150 MHz. At a standard 44.1 kHz sampling rate, that's only about 3,400 cycles per sample, which doesn't leave you much room to do this in real time.
That's fine, but being able to run multiple audio tracks' worth of VST plugins in real time on a modest laptop is very likely more valuable to their target market.
You'd be surprised how quickly those 500 multiply-accumulates per autocorrelation point add up... a 128-tap filter × 500 MACs/point × 2 channels (stereo) × 192 ksps is already at 24+ billion FLOPS for a single track...
CPU usage is even more of a problem than several years ago. Some very well-known _compressors_ take a whole 10% of CPU here. Stack 10 of them and you're no longer real-time, probably earlier than that.
It's worth mentioning that even if Auto-Tune pioneered the field and is now almost a mainstream household name (much like Photoshop), nowadays all the cool kids are on Melodyne, which is frankly black magic: https://www.youtube.com/watch?v=9FScFKuXXM0
Oh dear, this guy has no limit to his self-aggrandizement, and the interviewer does very little to fact-check his outrageous statements. The facts are more like: auto-tune is the combination of pitch detection and pitch shifting, two problems that were already extensively researched. Even the details, like computing autocorrelation via FFT, were standard in the field at the time. This type of pitch correction had already been done in academia, but written off as a curiosity. The truth is, the guy was at the right place at the right time, nothing more than that. Computers were just becoming fast and cheap enough, and plugin formats that made this type of product practical were just being deployed.
Detecting pitch may be easy enough, but a lot of programs still seem to have trouble adjusting pitch transparently without weird artifacts (without simply changing the speed). That's something Auto-Tune does seem to manage better than most.
Well there's a huge difference between monophonic and polyphonic pitch shifting. Monophonic pitch shifting (especially in the sub-semitone range that Auto-Tune does) has been done with high quality since the 1970s.
I thought from the title that it would go into the mathematics. Unfortunately all the article says w.r.t. math is a hand-wavy explanation of autocorrelation. But it was an interesting story about the life of Auto-Tune's creator.
I've worked with Hildebrand myself - he's one smart cookie, for sure.
edit/tidbit: he told me that the interview process for hiring new developers goes like this: all the questions are straight out of K&R's C Programming Language, you have to get all of them correct, and so far only one programmer has :P
I have an eidetic memory and have read that book multiple times, so there's a fair chance I'd pass. That said, if I were given that interview as you described, I'd get up and walk out. Interviews like that prove absolutely nothing useful for evaluating a candidate and are thus a waste of time.
Because eidetic memory only lasts for a few minutes. The commenter might have meant to use the phrase "photographic memory", but that has thus far been shown to be a myth. But, really, the post was about "I'm smart, and I'd just march right out of that interview", ignoring that it was just a cute anecdote, so I guess it's irrelevant. Nice catch, though. :-)
>"Seismic data processing involves the manipulation of acoustic data in relation to a linear time varying, unknown system (the Earth model) for the purpose of determining and clarifying the influences involved to enhance geologic interpretation. Coincident (similar) technologies include correlation (statics determination), linear predictive coding (deconvolution), synthesis (forward modeling), formant analysis (spectral enhancement), and processing integrity to minimize artifacts. All of these technologies are shared amongst music and geophysical applications."
That is dramatic hand-waving, but it does help convey to non-programmers how big an improvement the right algorithm can deliver.
The article is right that pitch tracking was considered a difficult "holy grail" in 1995. I attended a talk about pitch tracking at Interval in 1996 by somebody whose name I don't remember any more. The speaker made the point that pitch is a perceptual -- not a mathematical -- concept, and that it was hard but possible to do in real time on a typical PC at the time (e.g. a 90 MHz IBM ThinkPad 760C). But he had done it, and everyone seemed impressed by his demo! ;)
The article mentioned formant analysis. Maybe that's related to cepstral analysis, which is another way of tracking the pitch and formants of a voice, and has its own cool nomenclature of secret code words. "Cepstral liftering" is basically two FFTs followed by an inverse FFT.
If you take the complex FFT of a voice signal, the formants show up as two or three large "hills" in the spectrum, but the pitch manifests as higher-frequency repeating furrows in the hills. (Which you want to filter out so you can analyze just the formants, or synthesize a different pitch into them (i.e. auto-tune), so you need to know how rapidly the spectrum magnitude ripples across frequency: another FFT!)
So you take a second FFT of the log of the first FFT, to get a "reverse spectrum" or "cepstrum" in the "quefrency" domain. The fundamental pitch shows up as one big spike in the cepstrum (with smaller spikes for its harmonics). Just "lifter" out the pitch spikes, then perform an inverse FFT to get back to the smooth low frequency formant hills with the high frequency pitch furrows removed.
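To make that pipeline concrete, here's a bare-bones sketch of cepstral pitch detection in Python/numpy. I'm using the common inverse-FFT-of-the-log-magnitude form of the real cepstrum (equivalent up to scaling, since the log magnitude is real and even), and the window choice and pitch-range bounds are my own guesses, not anything from Auto-Tune or the patent.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate the fundamental from the dominant peak of the real cepstrum."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # formant "hills" plus pitch "furrows"
    cepstrum = np.fft.irfft(log_mag)            # into the quefrency domain
    # The pitch shows up as a spike at quefrency = fs/f0 samples; search
    # only the quefrencies corresponding to a plausible vocal range.
    qmin, qmax = int(fs / fmax), int(fs / fmin)
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return fs / peak
```

The frame needs to contain a few pitch periods (say 2048 samples at 44.1 kHz for a 60 Hz floor), and skipping the low quefrencies below qmin is a crude form of the "liftering" that separates the formant hills from the pitch spikes.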
I'm sure there's a lot of "special sauce" in getting the math tweaked and tuned right so it actually sounds good and runs fast.
The patent seems to be about autocorrelation, which is something different from cepstral analysis, but maybe they could be used together to get even better results.
I don't know, but would love to learn, what trade-offs and limitations the two techniques have.
>The name "cepstrum" was derived by reversing the first four letters of "spectrum". Operations on cepstra are labelled quefrency analysis (aka quefrency alanysis), liftering, or cepstral analysis.
"The original application was to the detection of echoes in seismic signals, where it was shown to
be greatly superior to the autocorrelation function, because it was insensitive to the colour of the signal."
Are there examples of Auto-Tune used with more subtle settings? The famous Cher song used it at setting 0, the robotic-sounding one, but there are 10 more settings, and they get increasingly subtle.
Also related to this, at least I think, is the problem of turning a recording into MIDI or sheet music. Now that would be a killer app... There is some good software out there, such as Melodyne, but it requires a lot of manual work and tweaking.
It's been years since I've done music recording as a hobby, so someone correct me if I'm wrong, but you can be all but guaranteed that almost all popular music is pitch-corrected in some way. There's Auto-Tune, and then there are other approaches to pitch correction as well: Melodyne allows you to manually adjust pitch at a microscopic level. I was able to use Melodyne to perform manual pitch correction that sounded far more "natural" and non-exact than Auto-Tune.
Of course a lot of folks cannot tell the difference, but to me it's night and day. I couldn't stand Glee for instance because of how auto-tuned the voices were.
> I couldn't stand Glee for instance because of how auto-tuned the voices were.
This was pretty obnoxious once I noticed it; I don't mind if they use Auto-Tune for a one-off musical episode of a show, but for a show where every episode is a musical to use it (and not very subtly) was very off-putting.
I have a little experience with the recording process, having been involved in the production of a contemporary a cappella album, and I think most people wouldn't believe how much stuff is auto-tuned. To my ear, if you listen to something like the Pitch Perfect soundtrack, a lot of it sounds quite obviously pitch corrected, but I will play it to musical friends and they are often surprised when I tell them there's some serious auto-tune going on.
Outside of the a cappella domain, I believe it's pretty prominent in most pop music, but my understanding is that the technology's gotten to the point where 95% of the time you can't tell machine correction from a vocalist who's simply singing well in tune.
True. Inserting anything that doesn't exist in the first place, like a plane, train, animal, or monster, is incredibly jarring for even the most casual movie watcher. It's a bit sad that we can't create convincing things out of thin air using CGI in 2017.
Just about all children's shows with singing use it, I think. Some on harsher, obnoxious settings (Daniel Tiger's Neighborhood), some still very noticeable but toned down a bit (MLP). Someone could probably put together a spectrum of Auto-Tune settings sampled from such shows.
I hate it because it's throwing off kids' sense of what a natural human singing voice sounds like. It's Photoshop for the voice, and similarly damaging to one's image of self and of others.
Steven Universe isn't strictly a kids' show, but it has some noticeable imperfections in many songs, which I think adds a sense of honesty without hurting the performances. I hate the perfection in Daniel Tiger songs; they will contribute to a generation of kids who cannot stand the sound of their own voices.
Now, while it's possible that he simply became a better singer between 2007 and 2015, subtler auto-tuning of the type conventionally used in pop production generally just makes the voice sound cleaner, thicker, more polished. Note that his style has also changed to favor staccato phrasing so as to limit stairstepping artifacts.
> makes the voice sound cleaner, thicker, more polished
I think here you may be confusing Auto-Tune with "modern" voice processing, which, while they arrived together, are quite different. Modern pop singers are overdubbed ridiculously, with added distortion, resulting in a "massive" (and slightly "robotic") lead voice.
I'll try to walk you through what I'm actually hearing that makes me think there's Auto-Tune there:
- 2:50: Subtle, but as the first "oh" starts, it sounds like it quickly jumps from a slightly lower note to the correct one.
- 2:58: The word "part" has a classic Auto-Tune sound. The real recording probably went slightly off pitch as the note was held and Auto-Tune has made it perfect and a little robotic.
- 3:07: "Where" has a similar sort of robotic pitch slide sound as it starts as the "oh" did earlier.
- 3:13: "Discover" has a sort of a glitch in the middle as Auto-Tune tries to track across the 'k' sound in the middle of the word.
As someone else already said, songs in the TV show Glee used it all the time in a fairly obvious way (though still subtle compared to the intentionally robotic Auto-Tune sound).
The lead singer's voice just sounds so thin, and unnaturally on-key (no vibrato).
I think you'd be hard-pressed to find verifiable examples because those higher settings are probably mostly used to cover up poor singing, and so nobody involved is going to volunteer that information.
I didn't listen to much of this, but in the two seconds I heard ("I guess it's all the..") there's definitely clear artifacting from either Auto-Tune or compression on the word "all". And I'm listening on an iPhone.
Yeah, I've used it on "live" recordings with saxophone* to cut down on the number of overdubs necessary. On subtle settings, it's impossible to hear the effect of it.
*Sax was the only non-fretted/pitched instrument in that recording.
Melodyne is way more advanced: you can isolate individual notes and choose the pitch yourself, whereas with Auto-Tune you choose a scale and it automatically corrects to the nearest note. See this link posted in another comment: https://www.youtube.com/watch?v=9FScFKuXXM0
The interface is different, with some different parameters. I don't know how the actual mathematics involved compares, but the sounds you'll get out of it are a little different. It's still a good way to hear what pitch correction on vocals sounds like in general.
Lol, obviously didn't read the article, which had an entire section dedicated to how Auto-Tune has such a successful brand that it's used in speech the same way as Kleenex, Google, etc.
I figured whether it was meant in a genericized way or not, it's still useful to point out the specific product used to someone who wants to hear what it sounds like, since they each have their own sound to some extent.
The ubiquity of auto-tune is particularly interesting because most people don't seem to have a lot of pitch sensitivity. Tons of very famous recordings are slightly out of tune and no one seems to complain, ever.
I'd imagine its popularity is due more to convenience than to demand.
The one genre I can think of where perfect tuning is essential is barbershop music. The defining feature of the genre is chords sung with just intonation and no vibrato, which makes it easy to hear if any of the singers are out of tune. See:
A lot of barbershop albums, probably most recent ones, are pretty heavily pitch-corrected. It doesn’t bother me that much, but I almost always prefer live barbershop recordings.
Depends what you mean by "out of tune". How good a chord sounds is determined by the ratio of frequencies, not if all the notes fit 'perfectly' in some scale. In fact you can't even create a scale where all chords are perfectly in tune.
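To put numbers on that (standard music-theory arithmetic, nothing specific to this thread): measuring intervals in cents shows why no fixed scale can make every chord pure.

```python
import math

def cents(ratio):
    # 1200 cents per octave: cents = 1200 * log2(frequency ratio)
    return 1200 * math.log2(ratio)

print(cents(5 / 4))          # just major third: ~386.3 cents
print(cents(2 ** (4 / 12)))  # equal-tempered major third: exactly 400 cents
# Stacking four pure fifths (3:2) and dropping two octaves "should" give a
# major third too, but it lands on 81/64 (~407.8 cents). The ~21.5-cent gap
# is the syntonic comma, and it's why the ratios can't all be pure at once.
print(cents((3 / 2) ** 4 / 4))
```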
agreed, but even in classical music it doesn't really bother me that much sometimes, some of my very favorite string players have occasional intonation issues (who doesn't?). And think of all the jazz records with an out of tune piano!
yea, sometimes i think the lack of perfection helps you focus on the bigger ideas. like how making 2d plots with the xkcd style kind of tells your brain to ignore little noise in the line. same with tuning and tempo and whatnot: not every note is perfect, so your ear ignores the little imperfections. of course, if there is too much imperfection, the signal-to-noise ratio gets out of whack. but i think this is part of the reason lots of music has drones, melodically or texturally (think indian classical, jazz cymbal or brushes, etc)... it kind of thresholds out a lot of non-musical noise.
I had always assumed that Auto-Tune had evolved from, or was an advanced form of, the vocoder (likely because of the similar "robot" effect that extreme applications of Auto-Tune give).
I've been working on a relatively simple real time music/audio processing project on an Arduino (identifying tempo and using it to create interesting lighting effects for a Halloween costume) and it's an interesting challenge. Extracting any kind of useful information about the underlying musical structure from polyphonic audio is an incredibly hard problem. Add to that limited hardware and the kind of sampling rate you need to capture music (upwards of 40kHz if you want to capture everything you can hear) and you have to get creative.
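In case it's useful to anyone trying something similar: the usual first stab at beat tracking is an energy-flux onset detector. Here's a toy sketch of that idea in Python/numpy (on an Arduino you'd do the same thing in fixed point over small sample blocks; the block size here is an arbitrary choice of mine):

```python
import numpy as np

def onset_strength(samples, fs, block=512):
    """Per-block energy flux: sudden rises in energy suggest beats/onsets."""
    nblocks = len(samples) // block
    energy = np.array([np.sum(samples[i * block:(i + 1) * block] ** 2)
                       for i in range(nblocks)])
    flux = np.maximum(np.diff(energy), 0.0)  # keep only increases in energy
    # Peaks in `flux` spaced T blocks apart imply a tempo of
    # 60 * fs / (T * block) beats per minute.
    return flux
```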
It's definitely a challenge with something like a microcontroller. My coding skills are borderline nonexistent (but then again, that's part of why I'm experimenting with Arduino in the first place) but I've messed around with using some FFT code published by others to get audio to affect some LED strips. So far the results have been mixed.
What he (Andy, or was it someone else at Antares? I forget..) told me is that raspy heavy metal vocals are the one type of vocal that doesn't really work with Auto-Tune, yet the one that people constantly request support for.
Wondering if anyone has listened to Guns N' Roses live this past year or so. Axl Rose's voice sounds amazing, considering his age, style of singing, and his well-documented hard living. Having listened live at Coachella and watched online, I'm wondering if it's possible they're doing it live in real time now.
I think this is the article's way of dramatizing the standard way of calculating the autocorrelation using the convolution theorem: https://en.wikipedia.org/wiki/Autocorrelation#Efficient_comp...
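For reference, a minimal sketch of that standard approach (the Wiener-Khinchin theorem: the autocorrelation is the inverse transform of the power spectrum). This is just the textbook method from the Wikipedia link, not anything Antares-specific.

```python
import numpy as np

def autocorr_fft(x):
    """All autocorrelation lags at once, in O(n log n), via the convolution theorem."""
    n = len(x)
    f = np.fft.rfft(x, 2 * n)         # zero-pad so circular convolution acts linear
    r = np.fft.irfft(f * np.conj(f))  # inverse FFT of the power spectrum
    return r[:n]                      # keep the non-negative lags

# Compared with the naive double loop sketched upthread, this computes every
# lag in one shot instead of one multiply-add loop per lag.
```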