Acoustic Keyboard Eavesdropping (github.com/ggerganov)
218 points by taubek on May 16, 2022 | 61 comments


As each surveillance technique falls in price, it can be layered with other signal sources.

Today, gr-tempest can recreate monitor images from HDMI RF emissions, using a low-cost SDR, https://twitter.com/markopolophys/status/1459911025642414086. GNU Radio conference video: https://www.youtube.com/watch?v=QRycX0M0H4s. TEMPEST techniques have been known for decades, but only recently became accessible to pen-testers via OSS code and low-cost hardware, https://en.wikipedia.org/wiki/Tempest_%28codename%29 & https://en.wikipedia.org/wiki/Van_Eck_phreaking.

In 2024, 802.11bf Wi-Fi 7 Sensing with mmWave radar could see through home/business walls to monitor keyboard typing, human heartbeats and other activity. While attackers can use custom radio firmware today, Wi-Fi 7 will bring "X-Ray Vision" to consumers, ready or not, unless nation-state spectrum regulators intervene. https://news.ycombinator.com/item?id=30172647. If a neighbor buys a Wi-Fi 7 Sensing device, they could monitor human activity through your walls.

Intel (2020) presentation on Wi-Fi 7, slides #19 and #20 describe Wi-Fi Sensing, https://www.intel.com/content/dam/www/public/us/en/documents...


Incidentally, it sure would be a shame if using plaster for home interiors came back into fashion, since the metal lath acts as an effective Faraday cage.


Thanks for the reminder :) Would also need grounding and ideally, continuity across corners and joins. Some photos of metal lath: https://inspectapedia.com/interiors/Plaster_on_Expanded_Meta...


Wouldn't the wire mesh need to be spaced near the size of the wavelength (mm-wave, so the wire mesh is probably way too spaced out)?


Wi-Fi 7 Sensing works at each of the different frequencies (2.4 GHz, 5 GHz, 6 GHz, 60 GHz). The lower frequencies have lower imaging resolution and longer range/penetration, and can be blocked with less dense mesh.

EMF shielding materials readily available at hardware stores include insect screening metal mesh and aluminum vapor or radiant barrier. There's also drywall which includes RF shielding, but it's expensive. DIY pointers: https://mpkb.org/home/special/emf/whitezones/faradaycage
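To put rough numbers on the mesh-spacing question, here is a small sketch that computes the free-space wavelength for each band mentioned above. The one-tenth-of-a-wavelength limit on mesh openings is a rule of thumb assumed here for illustration, not something taken from the thread or from the Wi-Fi Sensing material.

```python
# Free-space wavelength for each Wi-Fi band named in the comment above.
C = 299_792_458.0  # speed of light in m/s

# Assumption: rule of thumb that mesh openings should be no larger than
# about one tenth of the wavelength for the mesh to shield well.
APERTURE_FRACTION = 0.1


def wavelength_mm(freq_ghz: float) -> float:
    """Return the free-space wavelength in millimetres."""
    return C / (freq_ghz * 1e9) * 1000.0


def max_aperture_mm(freq_ghz: float) -> float:
    """Largest mesh opening (mm) under the assumed rule of thumb."""
    return wavelength_mm(freq_ghz) * APERTURE_FRACTION


if __name__ == "__main__":
    for f in (2.4, 5.0, 6.0, 60.0):
        print(f"{f:5.1f} GHz: wavelength {wavelength_mm(f):6.1f} mm, "
              f"mesh opening <= {max_aperture_mm(f):5.2f} mm")
```

Under that assumption, openings around a centimetre are plausible for 2.4 GHz, while 60 GHz calls for sub-millimetre openings, which matches the point that lower frequencies can be blocked with less dense mesh.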


I was CTO/head of R&D at a company where we were sensing where someone touched an object based on sounds. I was able to semi-accurately position a finger tap on a glass phone surface. That was in 2016.

Caveats:

We used contact microphones. Taking the data out of the air from an acoustic signal would be much harder, but not impossible in a quiet room with some fancy DSP.

There is a symmetry such that tapping the same distance from corner A is indistinguishable from tapping the equivalent position in corner B. Still, for a 4 x 4 PIN entry that's useful extra info.

It depends on the physiology, finger length and nails of the operator, and how they hold the phone. The fact that this may also be a unique identifier of the operator should not be a surprise.


Any recommendations on contact microphones, e.g. would a guitar pickup mike work?


The sensors are piezo micro-discs. They come in all shapes and sizes these days, including adhesive piezo ribbons and much more. To get a good signal you really need an op-amp or JFET stage too.


I have an extremely loud keyboard[1] and a decent quality Blue Snowball[2] microphone (right next to each other, at that), and it can't predict anything reliably. This is quite surprising, as the spacebar I could probably guess by ear myself.

[1] https://www.amazon.com/SteelSeries-Apex-Mechanical-Gaming-Ke...

[2] https://www.amazon.com/Blue-Snowball-Condenser-Microphone-Ca...


I just gave it a try myself. My mic isn't as nice (Logitech C920) but my keyboard is also pretty loud (Das Keyboard Prime 13). I gave it two tries and it didn't even get within an order of magnitude on the number of keystrokes, let alone what the keystrokes are.

I'm not dunking on them. This is a really really hard problem. My comment is more that I'm not worried about this as a viable threat just yet.


Hey, author here - great to see this posted on HN again. Let me know if you have any questions.

I notice comments that the approach is not working for various setups. I recommend starting with the most simple test - using Keytap [0] and training it with just 2 keys (for example 'q' and 'p' on QWERTY keyboard). You should get nearly 100% recall rate with "Average CC" above 0.80 for each of the 2 keys. If this is not the case, then Keytap will most certainly not work for your setup.

[0] https://keytap.ggerganov.com
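For anyone wondering what a two-key sanity check like this looks like in principle, here is a minimal sketch. It assumes "CC" means a normalized cross-correlation score between a recorded key press and a per-key template. That reading is an assumption, and this is not Keytap's actual code. The `synthetic_click` helper is a hypothetical stand-in for real recordings.

```python
import math
import random


def normalized_cc(a, b):
    """Normalized (Pearson-style) correlation of two equal-length signals."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def classify(sample, templates):
    """Return (best_key, best_cc) for a sample against per-key templates."""
    scores = {key: normalized_cc(sample, tpl) for key, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def synthetic_click(freq, n=200, noise=0.0, rng=None):
    """Hypothetical stand-in for a recorded key press: a decaying sinusoid."""
    rng = rng or random.Random(0)
    return [math.exp(-i / 40.0) * math.sin(2 * math.pi * freq * i / n)
            + rng.gauss(0.0, noise) for i in range(n)]


if __name__ == "__main__":
    # Two "keys" whose clicks differ in pitch, as far-apart keys might.
    templates = {"q": synthetic_click(5), "p": synthetic_click(9)}
    sample = synthetic_click(9, noise=0.05, rng=random.Random(1))
    print(classify(sample, templates))
```

If two far-apart keys cannot be separated with a high score even in a toy setup like this, adding more keys is unlikely to help, which is the spirit of the two-key test.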


Prior research in 2004: https://www.semanticscholar.org/paper/Keyboard-acoustic-eman...

"We show that PC keyboards, notebook keyboards, telephone and ATM pads are vulnerable to attacks based on differentiating the sound emanated by different keys. Our attack employs a neural network to recognize the key being pressed. We also investigate why different keys produce different sounds and provide hints for the design of homophonic keyboards that would be resistant to this type of attack."


ATM keypads are an extremely expensive component that you replace as a complete module, made out of a couple of chunks of very thick stainless steel and plastic with the PCB in the middle.

They are specifically designed so that all the keys sound exactly the same, and individually tested at the manufacturer. This is something they've already thought about, back in the early 1990s.


A week ago here on HN, someone posted that they had a program that automatically muted the microphone on keyboard input. That should be effective against this being used during teleconferencing.

(I'm sorry I couldn't find the post again)


There are these from Jan 2021, which is a little bit more than a week ago.

https://news.ycombinator.com/item?id=25686201 https://news.ycombinator.com/item?id=25644828


This is very cool. Back in college, we had keyboards that were overused and of not very high quality. The wear and tear of several students pounding on them made the sounds of many keys distinguishable. The administrators had a simple password for root. It had two spaces and two number keys in it and was 7 characters long in total. I made some educated guesses by listening to the sounds while they typed it in and then used those to narrow down my brute-forcing program (a C program, I might add). That got me in within a few hours.
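A back-of-the-envelope sketch of how much those audible hints shrink the search. It assumes the three remaining characters are lowercase letters and that the positions of the spaces and digits are not known. Neither detail is stated in the comment.

```python
from math import comb


def constrained_candidates(length=7, spaces=2, digits=2, other_alphabet=26):
    """Count passwords with exactly `spaces` spaces and `digits` digits.

    Assumptions: the remaining characters are lowercase letters (26 options)
    and the positions of the spaces and digits are unknown.
    """
    rest = length - spaces - digits
    positions = comb(length, spaces) * comb(length - spaces, digits)
    return positions * (10 ** digits) * (other_alphabet ** rest)


def unconstrained_candidates(length=7, alphabet=95):
    """All strings of `length` over the 95 printable ASCII characters."""
    return alphabet ** length


if __name__ == "__main__":
    narrowed = constrained_candidates()
    full = unconstrained_candidates()
    print(f"with hints:    {narrowed:,}")
    print(f"without hints: {full:,}")
    print(f"reduction:     about {full // narrowed:,}x")
```

Knowing which positions the spaces and digits occupy, which the sounds may also reveal, would cut the count further.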


Does this mean we can now make passive "wireless" keyboards without batteries by simply using a microphone connected to the computer?


Yes. This reminds me of how you can turn a MacBook/laptop into a touchscreen [0].

[0] https://www.anishathalye.com/2018/04/03/macbook-touchscreen/


Okay, now imagine that you're remotely pair-programming with someone (over Zoom or Skype or Teams or whatever). You get to see and hear a lot of the keystrokes and the characters they emit.

Then, at some point, you or your peer are prompted to enter a password. The password field shows up as all bullets. But... can you still identify the password based on the audio feed?


Yes - but you don't even need their output. I did this a dozen or so years ago at a coworking space, where I trained it on my officemate's keyboard assuming they were typing Ruby code, and then was able to guess their passphrase pretty quickly using the trained model.

I manually fudged spacebars and enters because they're acoustically obvious, and played around with punctuation keys. Generally the timing for fingers to move from one key to the other was where I was finding the strongest signal.


> I did this a dozen or so years ago at a coworking space, where I trained it on my officemate's keyboard

That must be a fun way to type in someone's password to their computer when they lock it and walk away to get some coffee...


I doubt the audio fidelity over a video call would be high enough to do this same thing, though. Esp. since all of those services have noise reduction stuff in place by default.


Doesn't work on either of the Cherry MX keyboards I have, and I haven't heard of anyone else having success with this.


I’m guessing it doesn’t need to be all that accurate for certain use cases.

Even just knowing the length of the password, estimating which keys in the sequence are capitalized (if Shift behavior is fairly easy to pin down) and being able to pin each key down to 5 possibilities would make a 20 character password trivial to crack. Right?
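To check the "trivial to crack" intuition, here is the arithmetic as a short sketch. The guess rate of one billion per second is an assumption standing in for a fast offline attack. A rate-limited online login would be many orders of magnitude slower.

```python
def search_space(options_per_key: int, length: int) -> int:
    """Number of candidate passwords when each position has a few options."""
    return options_per_key ** length


def seconds_to_exhaust(space: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate at the given rate."""
    return space / guesses_per_second


if __name__ == "__main__":
    narrowed = search_space(5, 20)   # 5 possibilities per key, 20 keys
    full = search_space(95, 20)      # any printable ASCII character per key
    rate = 1e9                       # assumed offline guess rate (hypothetical)
    print(f"narrowed space: {narrowed:,}")
    print(f"full space:     {full:.3e}")
    print(f"hours at 1e9/s: {seconds_to_exhaust(narrowed, rate) / 3600:.1f}")
```

So five candidates per key still leaves roughly 9.5e13 combinations: about a day of work at the assumed offline rate, but far out of reach against a login that throttles attempts.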


Same for me, it was laughably bad.


Some movie writer is bookmarking this for a future script.


Some startup is bookmarking this for a future Zoom/Teams plugin for your boss to buy to ensure your attention at all times…


Fork it and change this to -

> algorithm improvements and better n-gram statistics

GPT-3

And you got a startup going.


Solution: get a keyboard loud enough that you make whatever mic is listening in clip because it's so loud.

If you can't get key switches loud enough, I made a little Emacs Lisp snippet that plays a tone on every keypress. Example is for macOS, but adaptation to *nix should be trivial. https://gist.github.com/ashton314/4ca20e6e040f07aef58a05f42d...


You'd still be able to detect timing from a clipped audio signal, same with the tone.

Better to just blast the airhorn while you type.
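The timing point can be illustrated with a toy signal. The sketch below builds synthetic key "bursts" (a hypothetical stand-in for real audio), clips them hard, and still recovers the onset positions with a plain amplitude threshold. The threshold and refractory values are arbitrary choices for this example.

```python
import math


def make_signal(onsets, length, burst_len=50, amplitude=5.0):
    """Synthetic recording: one decaying burst per key press (hypothetical)."""
    sig = [0.0] * length
    for start in onsets:
        for i in range(burst_len):
            if start + i < length:
                sig[start + i] += (amplitude * math.exp(-i / 15.0)
                                   * math.sin(0.9 * i + 0.5))
    return sig


def clip(sig, limit=1.0):
    """Simulate a microphone that saturates at +/- limit."""
    return [max(-limit, min(limit, x)) for x in sig]


def detect_onsets(sig, threshold=0.5, refractory=100):
    """Return sample indices where the level first crosses the threshold,
    ignoring further crossings for `refractory` samples afterwards."""
    onsets = []
    last = -refractory
    for i, x in enumerate(sig):
        if abs(x) >= threshold and i - last >= refractory:
            onsets.append(i)
            last = i
    return onsets


if __name__ == "__main__":
    true_onsets = [100, 420, 610]
    clipped = clip(make_signal(true_onsets, 800))
    found = detect_onsets(clipped)
    intervals = [b - a for a, b in zip(found, found[1:])]
    print(found, intervals)
```

Clipping destroys the waveform shape but not when each burst starts, so the inter-key intervals survive.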


Injecting noise is a good idea: a solenoid clicking an identical key at a jittery interval would work. More work than an airhorn though.


I suppose it is time to add white noise generation to my keyboard...


White noise wouldn't be nearly as effective as cherry red noise.


Perhaps the keyboard could have an internal speaker that provides carefully crafted additional noise to make the keys sound more uniform (like each other), or to sound more random (different each time).

However, if the counter-noise were only triggered by the key press itself, it could not mask the initial part of the key noise, when the finger strikes the key surface before pressing it down. Detecting the strike with a microphone would risk false positives, so maybe a new key mechanism that also includes capacitive touch/proximity sensors would be needed.

And it would be ineffective against attacks that model key stroke patterns temporally.


from the readme section:

"This is what mechanical keyboard users deserve" -- super guy

Ouch


This should still work on quiet keyboards--it just requires a bit more work in setting up a listening device. For example, you could hide a microphone under the keyboard or attach it to the table it's sitting on (and maybe adjust the model accordingly).

I need to break out my relay board to see if the sound of the relays clicking mitigates attacks like this: https://youtu.be/6hMOGKTudcg (see it in all its clicky glory!)

As long as the click of the relay happens fast enough--and I add some sound dampening to the keyboard (which was the opposite of what I did for that test video haha)--I bet it would render this kind of attack useless.


If you're going to go through the trouble of placing a microphone under someone's table or keyboard, you might as well just install a keylogger. It would be much more likely to succeed given the risk.


Zoom


I've had the suspicion that an attack like this has already been used in the past on a few Twitch/YouTube streamers.

They possess a large corpus of training data (heck, some of them play typeracer on-screen.), and would no doubt have a few with quite audible mechanical keyboards near decent recording setups.

Then again, good security hygiene still mitigates this. (Avoiding password re-use, using 2FA where available etc.)

You wouldn't be emitting the sounds of typing your password while using a password manager either... (Well, except for the unlock password. Also there's something to be said about clipboard implementation across different operating systems.)


Does this assume that the keyboard is qwerty, or is it able to identify typing patterns regardless of the keyboard layout? I use Dvorak and I couldn't get any of the demos to work for me, but that may just be my fault.


I skimmed the readme, but a chunk of it is admittedly over my head. However, I would expect something based on letter frequency and n-grams to work regardless of layout, while something that relies on the acoustics of the individual keys would be layout-dependent.


Based on the readme it seems to need training data, so assuming you train it with the same layout you later want to capture from, I don't see why it wouldn't work.


from the web-page:

    It does not require training data - instead it uses statistical information about the frequencies of the letters and n-grams in the English language.

and from this it should also be noted that it apparently won't be able to extract passwords, at least those which aren't "n-grams in the English language".


This would be a good argument against the xkcd philosophy towards passwords.


Unless when typing your password you deliberately use different fingers than you would when typing normally


This could, and should, be mitigated through the use of real-time keyboard remappings. An image of a randomized keyboard layout is shown to the user so they can type their password using different keys. Enjin Wallet on iOS does this (obviously for reasons other than acoustics), but I have not seen it used anywhere else.

A preliminary look into remapping keyboard inputs in real time suggests it would be annoying to make portable, but doable.
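A rough sketch of the remapping idea, leaving out the OS-level keyboard hooks that make it hard to port. All function names are made up for illustration. A real implementation would draw the layout from a cryptographically secure source such as `secrets.SystemRandom()` rather than a seeded `random.Random`.

```python
import random
import string


def make_layout(rng):
    """Random one-to-one mapping: physical key -> character shown on screen."""
    keys = list(string.ascii_lowercase + string.digits)
    shuffled = keys[:]
    rng.shuffle(shuffled)
    return dict(zip(keys, shuffled))


def keys_to_press(secret, layout):
    """Physical keys the user presses to enter `secret` under this layout."""
    inverse = {shown: key for key, shown in layout.items()}
    return "".join(inverse[ch] for ch in secret)


def decode(pressed, layout):
    """What the application recovers from the pressed physical keys."""
    return "".join(layout[key] for key in pressed)


if __name__ == "__main__":
    # Seeded only so the demo is repeatable; do not do this in practice.
    layout = make_layout(random.Random(42))
    pressed = keys_to_press("hunter2", layout)
    print("physical keys pressed:", pressed)
    print("recovered password:   ", decode(pressed, layout))
```

An eavesdropper who recovers the physical keys from audio learns only `pressed`, which is useless without the layout shown on screen for that one entry.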


It would be a usability nightmare on a physical keyboard.


It makes typing passwords slower, and you would have to be proficient at touch typing. Surely a worthy trade off in many situations.


Didn’t read into it too much, but wouldn’t it work way better with some sort of timing-attack post-processing?


Waiting for this to become the next TV trope.


It's somewhat similar to https://en.wikipedia.org/wiki/Van_Eck_phreaking that's featured in Cryptonomicon.


I think audiences would reject it as seeming too improbable. Even by modern TV show standards...


“I wrote a GUI in Visual Basic to back-trace the killer’s IP”


This technique can work with radio too.


Has anybody tried this? Does this actually work? Does it depend on what language is being typed?


Not this specific code, but I have done acoustic keyboard analysis in the past with success. The thing I wasn't able to get working was multiple keyboards in a single room - in theory it should be doable, but the signal processing was (and still is) far beyond my rudimentary skills.


It should be possible to isolate each separate keyboard with a microphone array (but yes, signal processing...)

Which makes me wonder: With a sensitive enough microphone array, might it be possible to separate out the locations of each individual key? At the very least, it seems like it might be possible if it's coming from the right or the left side of the keyboard.
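The left/right question comes down to time difference of arrival between two microphones. Here is a minimal sketch using brute-force cross-correlation on synthetic clicks. The 48 kHz sample rate, the 343 m/s speed of sound, and the helper names are assumptions for the example, and real recordings would need far more care with noise and echoes.

```python
import math


def click(n=64):
    """Hypothetical key click: a short decaying sinusoid."""
    return [math.exp(-i / 10.0) * math.sin(0.8 * i) for i in range(n)]


def place(pulse, delay, length):
    """Put `pulse` into a silent buffer starting at sample `delay`."""
    sig = [0.0] * length
    for i, v in enumerate(pulse):
        if 0 <= delay + i < length:
            sig[delay + i] = v
    return sig


def best_lag(left, right, max_lag):
    """Lag (samples) of `right` relative to `left` maximizing correlation.
    Positive means the sound reached the left microphone first."""
    best, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best, best_score = lag, score
    return best


def path_difference_m(lag, sample_rate=48_000, speed_of_sound=343.0):
    """Convert a lag in samples to a path-length difference in metres."""
    return lag / sample_rate * speed_of_sound


if __name__ == "__main__":
    left = place(click(), 100, 400)    # click arrives at the left mic first
    right = place(click(), 107, 400)   # and 7 samples later at the right mic
    lag = best_lag(left, right, 20)
    print(lag, f"{path_difference_m(lag) * 100:.1f} cm")
```

At the assumed sample rate one sample corresponds to roughly 7 mm of path difference, so telling the left half of a keyboard from the right looks plausible, while resolving neighbouring keys would need a higher sample rate, sub-sample interpolation, or more microphones.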


Yeah, that was the rough plan, start with a pair of mics and triangulate out - I was hoping to iterate all the way up to a pair of laser microphones on different window panes, and on paper I think it should be all doable. I'd assume all of the organizations that have budgets for things like this already have it, but it was a fun project to hack on between gigs.


I tried it on a laptop and separately on Cherry MX Reds with no real success (but maybe I wasn't typing for long enough).


From: https://www.newyorker.com/magazine/2015/11/23/doomsday-inven...

> “I think political systems will use it to terrorize people,” Hinton said. Already, he believed, agencies like the N.S.A. were attempting to abuse similar technology.

> “Then why are you doing the research?” Bostrom asked.

> “I could give you the usual arguments,” Hinton said. “But the truth is that the prospect of discovery is too sweet.” He smiled awkwardly, the word hanging in the air—an echo of Oppenheimer, who famously said of the bomb, “When you see something that is technically sweet, you go ahead and do it, and you argue about what to do about it only after you have had your technical success.”


Hopefully this sentiment doesn’t kill us all.



