Podcast Sep 14, 2023

Root Causes 332: Acoustic AI-based Key Logging Attack

Researchers have built an AI model that can interpret keystrokes based on the sound of keyboard use over a phone or video call. Among other things, this technique can be used to steal passwords when the sound of logging in can be overheard. Join us as we learn about this new breed of credential harvesting.

Original Broadcast Date: September 14, 2023

Episode Transcript

Lightly edited for flow and brevity.

Tim Callan

We have a guest today. We always love guests, and today our guest is Marc Williams. Marc is Director of Product Management at Sectigo. How are you doing today, Marc?
Marc Williams

Hi, Tim. Thanks for having me. Hi, Jason.
Tim Callan

So, Marc, you are here to tell us about a very interesting new AI-based side channel attack. Is that right?
Marc Williams

That's right. So there's been a buzz around the security community over this paper that came out earlier this month and I think it's as fascinating as it is alarming. Researchers have been training AI to decipher passwords from audio. This audio has been recorded on phones and Zoom calls, and they've been testing it. So picture this. Every time your finger hits one of the keys on the keyboard, right, it carries a unique audio signature that's discernible from the other keys that you hit on your keyboard.
Tim Callan

So the AI can tell the difference between hitting G and hitting H?
Marc Williams

That's right. It’s impacted by things like the speed of the key press, the angle, even the previous key you pressed, or whether or not the shift key is pressed. All of these things impact that audio signature, right? So from that phone recording model, these researchers were able to get the AI to decipher passwords with 95% accuracy, and that's without even the use of a language model.
Tim Callan

So let me just play this back and make sure I'm getting it right. I'm sitting on a Zoom call with you and I type a password in for something and there could be an AI on the other end that's hearing my audio that then knows based on the sound, what password I just typed into my keyboard and now you have my password. Is that correct?
Marc Williams

That is right.
Jason Soroko

I think what is interesting here, Tim, is that we've talked about credential harvesting of shared secrets innumerable times on this podcast, and I think we have to add a new category of credential harvesting. We have all the social engineering models. We have key logging, right, which listens to memory on your Mac or Windows PC. And we've had audio before. We've had things that would listen to audio versions of older forms of second-factor authentication. Those have come up. Those aren't new. But this is a whole new thing. You brought up the case of somebody recording audio passing between two people who presumably are known to each other, but to me - Marc, you can help me here, you are the one presenting this - the way that this was presented was that this is another form of malware, of credential harvesting, where the malware itself is listening through the microphone, either of the smartphone or of the PC, and it will train itself based on what it is hearing. That's what the researchers did. And then with that training, it is able to actually decipher. So, Marc, help me out. Does the attack require the training, or does the training happen before the attack occurs, with the malware that is actually executing the listening phase of this?
Marc Williams

Some training, at least a fair amount of content, I think is needed for a good degree of accuracy. It’s not going to take a 10- or 12-character password typed on a keyboard out of any context whatsoever and then know what it is, but with enough exposure to someone typing, it can learn those audio signatures. And with a key logger, it's very precise. You know exactly which key is being pressed. But with something like this, you don't need that kind of precision anymore, and it doesn't have to be snuck onto the system to pull from the mic channel. You can just drop a smartphone on someone's desk, leave it there for a little while and train it that way.
Tim Callan

Or like Jason said, imagine if I get my malware on your computer, I could be using your local mic to listen to what you're typing, and you have no idea that's even going on. And I'm just sitting there passively recording all your keystrokes just like a keylogger does.
Marc Williams

And even without perfect recognition of the password, hackers can simply apply an algorithm to guess the rest.
Tim Callan

Right. If there's a little bit of inaccuracy there, you can see where that's ok or if there's one digit that we miss, there's not that many combinations. If I have your entire password, except I don't know what one of the digits is, how many keys are there on your keyboard? That's the number of combos I have to try. It's just not very many. Right?
Marc Williams

That’s right.
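
Editor's note: the arithmetic Tim describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything taken from the paper: it assumes the attacker already knows which positions were misheard, and that every missed key is one of the 94 printable ASCII characters. The names `candidate_count` and `candidates` are hypothetical.

```python
import itertools
import string

# Assumption: a missed key is one of the 94 printable ASCII characters
# (52 letters, 10 digits, 32 punctuation marks).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def candidate_count(num_unknown, alphabet_size=len(ALPHABET)):
    """Number of passwords to try when num_unknown characters are missing."""
    return alphabet_size ** num_unknown


def candidates(recovered, unknown_positions):
    """Yield every password that matches the recovered characters.

    recovered         -- the string the attacker reconstructed; characters at
                         the unknown positions are placeholders and ignored
    unknown_positions -- indexes of the characters that were not recognized
    """
    chars = list(recovered)
    for combo in itertools.product(ALPHABET, repeat=len(unknown_positions)):
        for pos, ch in zip(unknown_positions, combo):
            chars[pos] = ch
        yield "".join(chars)


if __name__ == "__main__":
    print(candidate_count(1))  # one missed character
    print(candidate_count(2))  # two missed characters
    guesses = list(candidates("hunter_", [6]))
    print(len(guesses), "hunter2" in guesses)
```

With one missed character the attacker has 94 guesses to try, and with two there are 94 x 94 = 8,836, which is still a trivial search for a machine.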
Jason Soroko

Tim, it's something else here. You and I talk about this so often, and that 95% accuracy mark is what gets me. I'm just blown away that we have to add a new category of credential harvesting on top of key loggers, because this is essentially a new form of that using a different channel. We've talked about AI on this podcast before. We’ve talked about how limiting it can be and also how incredible it can be. In this case, Marc, I think this is the perfect place for AI and, unfortunately, that 95% accuracy sounds very accurate.
Marc Williams

And it's early days. So it's just going to get more accurate. I mean, this is just starting out.
Tim Callan

Let's just take a second to tell people, if you want to find this paper, the title of it is A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards. The authors are Joshua Harrison from Durham University, Ehsan Toreini from the University of Surrey and Maryam Mehrnezhad from Royal Holloway, University of London. So those are the authors that put this out. It's easy to find, and you can read it. It's quite detailed, and they go into a lot of it.

One of the things I found interesting was that they talked about how one of the things the algorithm had a tougher time with, not saying that it couldn't get it, but it had a tougher time with, was detecting the release of the shift key. So if I go shift I and release key, that's different than if I go shift I release N, right? Two capital letters in a row, or three capital letters in a row, are different from one capital letter, and it had a harder time getting that. So one of their recommendations was to mix in more capital letters. The more caps you have, the more likely you're going to foil the algorithm. That seems fine as a recommendation, but it's not solving the problem by any stretch of the imagination.
Marc Williams

No. Some have even suggested that the OS add random noises to keystrokes to throw these things off but that's just maybe a temporary Band-Aid. There are far better ways to mitigate this.
Tim Callan

So we obviously have talked about the idea that shared secrets are just kind of fundamentally vulnerable architecturally in their core. And I think we're seeing this again, right, Jay?
Jason Soroko

You got it. In fact, look, guys, in part two of the paper, I gotta read this off, just because it's so much fun. Academics being academics never want to take credit for something that they didn't invent, and they take great pains in their paper to say, look, we didn't invent acoustic decryption techniques or being able to decipher something that was being emanated in terms of a message. That's been going on for a long time, and what they're really doing here is presenting a more effective and efficient deep learning model. That's really what the paper was about. I gotta tell you, they were saying that in the 1950s British spies had been doing exactly this, of course, in an analog method, from Hagelin encryption devices, which were a very similar design to Enigma machines. So can you imagine, this has been going on since the 1950s, at the very least. So cool.
Tim Callan

Very cool. I think you and I did an episode about the Enigma machine, if I recall.
Jason Soroko

We sure did. We sure did.
Tim Callan

All right, well, wow. Wow. All I can say is wow. Last thing in the world I expected to read until I read it. So anything to add, gentlemen?
Marc Williams

That’s it. Just be on the lookout.
Jason Soroko

If you didn't think your passwords were safe now. I guess, Marc, this is where we get to plug stronger forms of authentication.
Marc Williams

Absolutely.
Jason Soroko

Go ahead and check out stronger forms of authentication such as leaf certificates, key pairs for B2C type of applications with FIDO. Listen to Tim's and my podcast. We talk about these stronger forms of authentication. Guys, I never thought we'd even come up with just a whole new category of attack against typing in a password, but here we are.
Tim Callan

But here we are. Alright. Well, thank you very much. Thank you, Marc, for bringing this interesting news to us.

Contributors

Tim Callan

Chief Compliance Officer

Jason Soroko

Fellow

Marc Williams

Director of Product Management

About Root Causes

Tim Callan and Jason Soroko explore the issues surrounding digital identity, PKI, and cryptographic connections in today's dynamic and evolving computing world.
