this post was submitted on 02 Dec 2023

436 points (95.6% liked)


Boffins convert typing sounds into text with 95% accuracy (www.theregister.com)

submitted 1 year ago by kpw@kbin.social to c/technology@lemmy.world

147 comments

Researchers in the UK claim to have translated the sound of laptop keystrokes into their corresponding letters with 95 percent accuracy in some cases.

That 95 percent figure was achieved with nothing but a nearby iPhone. Remote methods are just as dangerous: over Zoom, the accuracy of recorded keystrokes only dropped to 93 percent, while Skype calls were still 91.7 percent accurate.

In other words, this is a side channel attack with considerable accuracy, minimal technical requirements, and a ubiquitous data exfiltration point: Microphones, which are everywhere from our laptops, to our wrists, to the very rooms we work in.


[–] helenslunch@feddit.nl 2 points 1 year ago (4 children)

Someone explain how this works? Doesn't make much sense to me how that's even possible.

[–] Pons_Aelius@kbin.social 24 points 1 year ago (3 children)

Because of different placement on the keyboard and different finger pressure, each key press has a slightly different sound.

The telling thing in this story is this:

with 95 percent accuracy in some cases.

For some people (those with a very consistent typing style on a known keyboard) they were right 95% of the time.

In the real world, this type of thing is basically useless, as you would need a decent sample of the person typing on a known keyboard for it to work.

To go from keystroke sounds to actual letters, the eggheads recorded a person typing on a 16-inch 2021 MacBook Pro using a phone placed 17cm away and processed the sounds to get signatures of the keystrokes.

So to do this, you need physical access to the person (to place a microphone nearby), you need to know what type of device they are typing on, and it has to be a device whose sound profile you have already analysed.

[–] agent_flounder@lemmy.world 7 points 1 year ago (2 children)

The article says:

The researchers note that skilled users able to rely on touch typing are harder to detect accurately, with single-key recognition dropping from 64 to 40 percent at the higher speeds enabled by the technique.

Hm. Sounds like "some cases" are hunt and peck typists or very slow touch typists.

I don't know if training on each victim's typing is really needed. I get the impression they were identifying unique sounds and converting them to the correct letters. I only skimmed, and I didn't quite understand the description of the mechanisms. Something about deep learning and convolution, or...? I think they also said they didn't use a language model, but I could be wrong.

[–] Pons_Aelius@kbin.social 6 points 1 year ago* (last edited 1 year ago) (4 children)

The problem is that even with 95% accuracy per character, a 10-character password still has roughly a 40% chance (1 - 0.95^10 ≈ 0.40) of having at least one character wrong.

A password with one character wrong is just as useless as randomly typing.

Which character is wrong, and what should it be? You only have 2 or 3 more guesses before most systems lock the account.

This is an interesting academic exercise but there are much better and easier ways to gain access to passwords and systems.

The world is not a Bond movie.

Deploying social engineering is much easier than this sort of attack.
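A quick way to check the per-character arithmetic in the comment above, assuming each keystroke is classified independently with the same accuracy (a simplification; real errors probably cluster on similar-sounding keys). The function name `p_all_correct` is just for illustration.

```python
def p_all_correct(per_char_accuracy: float, length: int) -> float:
    """Chance that every character is recognized correctly, assuming independent errors."""
    return per_char_accuracy ** length


for n in (8, 10, 12, 16):
    ok = p_all_correct(0.95, n)
    print(f"{n:2d} chars: {ok:.1%} fully correct, {1 - ok:.1%} with at least one error")
```

For 10 characters this gives about 59.9% fully correct, i.e. roughly a 40% chance of at least one wrong character; under these assumptions the odds only drop below 50/50 at 14 characters.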

[–] warrenson@lemmy.nz 5 points 1 year ago

"Hearing" the same password twice drastically increases the accuracy, however, social engineering is indeed the most effective and efficient attack method.

[–] 0xD@infosec.pub 4 points 1 year ago

If the password is not random, as they seldom are, you can just guess the last character, or even the last few characters, if they are not correct.
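A sketch of how that might look in practice: instead of submitting only the single most likely reading, rank alternative strings by joint probability and try dictionary words first, so a small guess budget goes further. The per-keystroke probabilities, the `ranked_guesses` name and the one-word wordlist are all hypothetical.

```python
from itertools import product
from math import prod


def ranked_guesses(positions, budget, wordlist=frozenset()):
    """Return up to `budget` candidate strings.

    positions: one {character: probability} dict per keystroke.
    Candidates found in `wordlist` are tried first; within each group they are
    ordered by joint probability (product of per-keystroke probabilities).
    Enumerates every combination, so only suitable for short inputs.
    """
    candidates = []
    for combo in product(*(p.items() for p in positions)):
        text = "".join(ch for ch, _ in combo)
        score = prod(pr for _, pr in combo)
        candidates.append((text in wordlist, score, text))
    candidates.sort(reverse=True)
    return [text for _, _, text in candidates[:budget]]


# Hypothetical classifier output: the last keystroke is the least certain.
positions = [
    {"d": 0.9, "s": 0.1},
    {"r": 0.95, "e": 0.05},
    {"a": 0.9, "s": 0.1},
    {"g": 0.8, "h": 0.2},
    {"o": 0.9, "i": 0.1},
    {"m": 0.55, "n": 0.45},
]

print(ranked_guesses(positions, 3))                       # acoustic ranking only
print(ranked_guesses(positions, 3, wordlist={"dragon"}))  # dictionary words first
```

On acoustics alone the top guess is "dragom"; with the wordlist, "dragon" moves to the front.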

[–] prole@sh.itjust.works 3 points 1 year ago* (last edited 1 year ago)

The world is not a Bond movie.

Deploying social engineering is much easier than this sort of attack.

Have you never seen a Bond movie? Yeah, they always have a gadget or two, but the rest is basically him social engineering his way through the film. And shooting. Usually lots of shooting too.

[–] agent_flounder@lemmy.world 2 points 1 year ago

I was thinking of this attack in terms of grabbing emails, documents, stuff like that. Or snippets thereof.

[–] prole@sh.itjust.works 2 points 1 year ago

I imagine it also uses an algorithm to attempt to "guess" the next letter (or the full word itself, like your phone keyboard does) based on existing words. Then maybe an LLM could determine which of the candidate words is most likely being typed, based on the context.

I dunno if that makes any sense, but that's how I pictured it working in my brain movies.
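The "guess the next letter" idea can be sketched as a noisy-channel rerank: multiply the acoustic classifier's probability for each key by a character-bigram probability learned from ordinary text. This only illustrates the speculation in this comment; another commenter recalls the researchers saying they did not use a language model. The training corpus, the acoustic probabilities and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count which character follows which, with '^' marking the start of a word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for word in corpus.lower().split():
        padded = "^" + word
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def rerank(acoustic: dict[str, float], counts, prev: str, floor: float = 1e-3) -> dict[str, float]:
    """Combine acoustic probabilities with an estimate of P(next char | previous char)."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    scores = {}
    for ch, p in acoustic.items():
        prior = following[ch] / total if total and following[ch] else floor
        scores[ch] = p * prior
    norm = sum(scores.values())
    return {ch: s / norm for ch, s in scores.items()}


counts = train_bigrams("quick queen quiet quote equal quilt")
# Hypothetical acoustic output for the keystroke after a 'q': it slightly prefers 'y'.
acoustic = {"y": 0.50, "u": 0.45, "i": 0.05}
reranked = rerank(acoustic, counts, prev="q")
print(max(reranked, key=reranked.get))  # u
```

Because "q" is almost always followed by "u" in the training text, the language prior overrides the slightly wrong acoustic guess.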

[–] ILikeBoobies@lemmy.ca 3 points 1 year ago (1 children)

You don’t need physical access, just some malware that has access to the microphone.

We would hope researchers “discovering” this wouldn’t have a production-ready product as their proof of concept. So there is room for improvement, but military contractors would love to invest in this.

[–] Pons_Aelius@kbin.social 7 points 1 year ago (1 children)

You don’t need physical access, just some malware

Which you still need to have previously installed...

If the person has allowed malware to be installed, just install a keylogger (which gives you 100% accuracy every time) rather than jumping through more hoops with this.

[–] ILikeBoobies@lemmy.ca 4 points 1 year ago (1 children)

Different devices

I would have an easier time infecting someone’s personal phone than a company machine

[–] Pons_Aelius@kbin.social -1 points 1 year ago (1 children)

You would, would you?

Well, I must be talking to a leet hacker then.

Ok, install malware on my phone.

[–] ILikeBoobies@lemmy.ca 3 points 1 year ago (1 children)

How did you get that from what I said?

[–] Pons_Aelius@kbin.social -1 points 1 year ago* (last edited 1 year ago) (1 children)

I would have an easier time infecting someone’s personal phone than a company machine

What did you mean by this, then, other than that you personally are skilled at such things and have system penetration experience?

[–] ILikeBoobies@lemmy.ca 3 points 1 year ago* (last edited 1 year ago)

Easier doesn’t mean easy, but I can send you an email or give you a link.

The company email server should block it, and the firewall should block the website.

For example:

Check out this game! https://play.google.com/store/apps/details?id=com.robtopx.geometrydashsubzero

But the page is actually:

https://play.giggle.com/store/apps/details?id=com.robtopx.geometrydashsubzero

Knowing this doesn’t make me 1337.
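The trick in this example is that the visible link text and the real destination differ only in the hostname. Below is a minimal sketch of the kind of check a mail filter or a careful reader could apply, assuming a hypothetical allowlist of trusted hosts; real filters rely on much more than this (reputation lists, sandboxing and so on).

```python
from urllib.parse import urlparse

TRUSTED_HOSTS = {"play.google.com"}  # hypothetical allowlist


def host(url: str) -> str:
    """Lower-cased hostname of a URL ('' if there isn't one)."""
    return (urlparse(url).hostname or "").lower()


def is_suspicious(displayed: str, href: str) -> bool:
    """Flag links whose visible URL and real target disagree, or whose target isn't trusted."""
    return host(displayed) != host(href) or host(href) not in TRUSTED_HOSTS


shown = "https://play.google.com/store/apps/details?id=com.robtopx.geometrydashsubzero"
real = "https://play.giggle.com/store/apps/details?id=com.robtopx.geometrydashsubzero"

print(is_suspicious(shown, real))   # True: the lookalike target
print(is_suspicious(shown, shown))  # False: the genuine link
```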

[–] helenslunch@feddit.nl 1 points 1 year ago

So basically if they know what type of hardware you're using, and have training on that type of hardware, then it works. It can't just be literally any keyboard, right?

That makes more sense.

[–] catch22@startrek.website 5 points 1 year ago

They'll have modelled the acoustic signals to differentiate between keys. Features are extracted from the individual acoustic waves emanating from each keypress to identify it. Optimal features are then chosen to maximise accuracy, such as features that still work when the signal is captured at different distances or angles. With all these types of signal-processing inference models, you never get 100 percent. The claim of 95 percent is actually very high.
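A toy version of the feature-extraction step described here, assuming each keystroke has already been cut out into a fixed-length window of samples. The code computes a magnitude spectrum, sums it into frequency bands, and normalizes by total energy so that a uniformly quieter recording yields the same features. Keys are then classified by the nearest per-key average (a nearest-centroid classifier). This is a deliberately simple stand-in, not the researchers' pipeline, which the thread describes as deep learning; the synthetic "keystrokes" are pure tones invented for the demo.

```python
import cmath
import math


def band_energies(samples: list[float], n_bands: int = 8) -> list[float]:
    """Naive DFT magnitude spectrum, summed into equal-width bands and normalized."""
    n = len(samples)
    half = n // 2
    mags = [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
        for k in range(half)
    ]
    width = half // n_bands
    energies = [sum(mags[b * width:(b + 1) * width]) for b in range(n_bands)]
    total = sum(energies) or 1.0
    return [e / total for e in energies]  # overall loudness cancels out


def train_centroids(labelled: list[tuple[str, list[float]]]) -> dict[str, list[float]]:
    """Average the feature vectors of all training examples for each key."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for key, samples in labelled:
        feats = band_energies(samples)
        acc = sums.setdefault(key, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[key] = counts.get(key, 0) + 1
    return {key: [v / counts[key] for v in acc] for key, acc in sums.items()}


def classify(samples: list[float], centroids: dict[str, list[float]]) -> str:
    """Pick the key whose average features are closest (Euclidean distance)."""
    feats = band_energies(samples)
    return min(centroids, key=lambda key: math.dist(feats, centroids[key]))


def tone(freq_bin: int, amplitude: float, n: int = 64) -> list[float]:
    """Synthetic stand-in for a keystroke recording: a pure tone."""
    return [amplitude * math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]


centroids = train_centroids([("a", tone(3, 1.0)), ("a", tone(4, 1.0)),
                             ("l", tone(25, 1.0)), ("l", tone(26, 1.0))])
print(classify(tone(3, 0.2), centroids))   # a quieter "a"-like sound
print(classify(tone(24, 0.5), centroids))  # an "l"-like sound
```

The energy normalization is a crude example of a feature chosen for robustness: a uniformly quieter copy of the same sound produces identical features. Real recordings also change shape with distance and angle, which is where learned features earn their keep.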

[–] 9point6@lemmy.world 4 points 1 year ago

Every key is unique and at a different distance from the microphone, and therefore makes a slightly different noise.

Knowing this, and knowing the frequency distribution of letters in the language (e.g. we know "e" is the most common letter in English), we can use some clever analysis over a large enough sample of typing to figure out what each key sounds like with a statistically high level of probability. Once that's done, it's just like any other speech recognition software, except it's the language of your keyboard.
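The frequency-analysis step can be sketched like solving a substitution cipher: group keystroke sounds into unlabeled clusters (the clustering itself is not shown), then map the most frequent cluster to "e", the next to "t", and so on. This naive rank matching only gives a rough first guess on short samples. The cluster IDs and sample text are hypothetical, and the letter order is an approximation that varies by corpus.

```python
from collections import Counter

# Approximate English letter frequency order; the exact order varies by corpus.
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def guess_mapping(cluster_ids: list[int]) -> dict[int, str]:
    """Assign letters to unlabeled sound clusters by frequency rank."""
    ranked = [cid for cid, _ in Counter(cluster_ids).most_common()]
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))


# Simulate a recording: each letter's sound falls into an arbitrary (hidden) cluster ID.
text = "the green tree sees ten bees"
letters = [c for c in text if c.isalpha()]
hidden = {ch: 100 + i for i, ch in enumerate(sorted(set(letters)))}
cluster_ids = [hidden[c] for c in letters]

mapping = guess_mapping(cluster_ids)
print("".join(mapping[cid] for cid in cluster_ids))  # a rough first guess, not the original text
print(mapping[hidden["e"]])  # e: the most frequent sound is mapped to 'e'
```

On a sample this short only the most frequent letter is reliably right; a real attack would need far more typing and would refine the guesses with further statistics.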

[–] TootSweet@lemmy.world 3 points 1 year ago

This is just me kind of guessing off the top of my head, but:

Depending where the mic is in relation to the keyboard, it can tell to some extent the relative distance from the key to the mic by volume of the keypress.
The casing of the keyboard has a particular shape with particular acoustic properties which would make certain keys sound different than others. (Maybe the ones toward the middle have a more bass sound to them as opposed to more treble in the keys closer to the edges of the keyboard.)
The surface on which the keyboard sits may also resonate differently with different keys.
There may be some extent to which the objects in the room (including the typist, monitor, etc.) reflect or absorb sound waves differently depending on the angle at which the sound waves hit them, which is affected by the location of the key.
Some keys, like the spacebar and left shift, almost always have a stabilizer bar, which significantly affects their sound.
For human typists, there are patterns in the timing of key presses. It's quicker to type two keys in succession if they are pressed by different fingers or different hands, for instance. Imagine typing the word "jungle": "j", "u", and "n" are all pressed with the right index finger (for touch typists), so the first three letters would be slower to type than the rest.
I'd imagine this method also allowed the program to take into account various aspects of human language. (Probably English in this case, but it could just as well have been another language.) Certain strings of consonants just never appear consecutively. Certain letters are less frequently used. Things like that. Probably the accuracy would have been lower if the subjects were asked to type specific strings of random letters.
It may also be that this particular experiment involved fairly controlled circumstances. They always placed the mic 17 cm from the keyboard, for instance. Maybe they also used the exact same keyboard on the exact same desk with the exact same typist for all tests and training. And it sounds like they trained it on known text for a good while before testing it by asking the AI to discern what was actually typed. Those are pretty perfect conditions that probably wouldn't be realistic for an actual attack. Not to minimize the potential privacy impacts of this, though. I'd fully expect methods like this to become more accurate across a more general set of cases.

Now, the researchers didn't sit down, list out all of these (or any other) ways in which software could determine what was typed from audio, and compose an algorithm that accounted for all/most/some of them. They just kind of threw a bunch of audio with accompanying "right answers" at a machine learning algorithm and let the algorithm figure out whatever clues it could discern and combine those in whatever way it found most beneficial to come up with an (increasingly-more-accurate-with-every-training-set) answer. It's likely the algorithm came up with different things than I did that helped it determine which key(s) were being pressed.
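The timing point in the "jungle" example can be made concrete with a finger map: consecutive letters typed by the same finger tend to take longer, so the gaps between keystroke sounds carry some information about which keys were pressed. The finger assignment below is a simplified, hypothetical touch-typing layout for the QWERTY letters; real typists vary.

```python
# Simplified touch-typing finger assignment for the 26 QWERTY letters (typists vary).
FINGER_KEYS = {
    "L-pinky": "qaz", "L-ring": "wsx", "L-middle": "edc", "L-index": "rfvtgb",
    "R-index": "yhnujm", "R-middle": "ik", "R-ring": "ol", "R-pinky": "p",
}
FINGER = {key: finger for finger, keys in FINGER_KEYS.items() for key in keys}


def same_finger_pairs(word: str) -> list[tuple[str, str]]:
    """Consecutive letter pairs typed with the same finger (usually the slower transitions)."""
    w = word.lower()
    return [(a, b) for a, b in zip(w, w[1:])
            if a in FINGER and FINGER[a] == FINGER.get(b)]


print(same_finger_pairs("jungle"))  # [('j', 'u'), ('u', 'n')]
```

The two same-finger transitions are exactly the first three letters the comment points out; a model fed inter-keystroke timings could, in principle, pick up on patterns like this without being told about fingers at all.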