Here’s how deepfake vishing attacks work, and why they can be hard to detect

https://arstechnica.com/security/2025/08/heres-how-deepfake-vishing-attacks-work-and-why-they-can-be-hard-to-detect/

Dan Goodin Aug 07, 2025 · 4 mins read

By now, you’ve likely heard of fraudulent calls that use AI to clone the voices of people the call recipient knows. Often, the result sounds like a grandchild, CEO, or longtime work colleague reporting an urgent matter and demanding immediate action: wire money, divulge login credentials, or visit a malicious website.

Researchers and government officials have been warning of the threat for years, with the Cybersecurity and Infrastructure Security Agency saying in 2023 that threats from deepfakes and other forms of synthetic media have increased “exponentially.” Last year, Google’s Mandiant security division reported that such attacks are being executed with “uncanny precision,” creating more realistic phishing schemes.

Anatomy of a deepfake scam call

On Wednesday, security firm Group-IB outlined the basic steps involved in executing these sorts of attacks. The takeaway is that they’re easy to reproduce at scale and can be challenging to detect or repel.

The basic steps are:

Collecting voice samples of the person who will be impersonated. Samples as short as three seconds are sometimes adequate. They can come from videos, online meetings, or previous voice calls.

Feeding the samples into AI-based speech synthesis engines, such as Google’s Tacotron 2, Microsoft’s Vall-E, or services from ElevenLabs and Resemble AI. These engines give the attacker a text-to-speech interface that renders words of the attacker’s choosing in the vocal tone and conversational tics of the person being impersonated. Most services bar the creation of such deepfakes, but as Consumer Reports found in March, the safeguards these companies have in place to curb the practice could be bypassed with minimal effort.

An optional step is to spoof the number belonging to the person or organization being impersonated. These sorts of techniques have been in use for decades.

Next, attackers initiate the scam call. In some cases, the cloned voice will follow a script. In other, more sophisticated attacks, the faked speech is generated in real time, using voice masking or transformation software. The real-time attacks can be more convincing because they allow the attacker to respond to questions a skeptical recipient may ask.

“Although real-time impersonation has been demonstrated by open-source projects and commercial APIs, real-time deepfake vishing in-the-wild remains limited,” Group-IB said. “However, given ongoing advancements in processing speed and model efficiency, real-time usage is expected to become more common in the near future.”

In either case, the attacker uses the fake voice to generate a pretense for needing the recipient to take immediate action. The narrative might simulate a granddaughter in jail urgently seeking bail money, a CEO directing someone in an accounts payable department to wire money to cover an expense that’s past due, or an IT person instructing an employee to reset a password following a purported breach.

Collecting the cash, stolen credentials, or other assets. Often, once the action is taken, it’s irreversible.

Shields down

The Mandiant post showed the relative ease with which members of its security team executed such a scam in a simulated red-team exercise designed to test defenses and train personnel. The red teamers collected publicly available voice samples of someone inside the targeted organization who had employees reporting to them. They then used publicly available information to identify the employees most likely to work under the person being impersonated and called them. To make the call more convincing, the team used a real outage of a VPN service as a pretext requiring the employee to take immediate action.

“Due to the trust in the voice on the phone, the victim bypassed security prompts from both Microsoft Edge and Windows Defender SmartScreen, unknowingly downloading and executing a pre-prepared malicious payload onto their workstation,” Mandiant said. “The successful detonation of the payload marked the completion of the exercise, showcasing the alarming ease with which AI voice spoofing can facilitate the breach of an organization.”

Precautions for preventing such scams from succeeding can be as simple as the parties agreeing in advance on a randomly chosen word or phrase that the caller must provide before the recipient complies with a request. Recipients can also end the call and call the person back at a number known to belong to the caller. But it’s best to follow both steps.
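For organizations that want to formalize that advice, the logic is simple enough to sketch in code. What follows is a minimal, hypothetical Python illustration of the two-step check just described; the data structures and helper names (AGREED_PASSPHRASES, KNOWN_NUMBERS, confirm_via_callback) are assumptions made for this example, not anything drawn from Group-IB, Mandiant, or a real product.

# Hypothetical sketch of the two-step verification described above:
# (1) the caller must supply a pre-agreed passphrase, and (2) the request
# is confirmed by calling the person back on a number already on file.
# All names and data here are illustrative assumptions.

import hmac

# Passphrases and callback numbers agreed on out of band, in advance.
AGREED_PASSPHRASES = {"cfo@example.com": "tangerine-harbor-42"}
KNOWN_NUMBERS = {"cfo@example.com": "+1-555-0100"}


def confirm_via_callback(known_number: str, request: str) -> bool:
    """Placeholder: a human hangs up and re-dials the known number to confirm."""
    print(f"Call {known_number} back and confirm: {request!r}")
    return input("Did the real person confirm? (y/n) ").strip().lower() == "y"


def should_act(caller_id: str, spoken_passphrase: str, request: str) -> bool:
    expected = AGREED_PASSPHRASES.get(caller_id)
    known_number = KNOWN_NUMBERS.get(caller_id)
    if expected is None or known_number is None:
        return False  # no pre-agreed checks on file: treat as unverified
    # compare_digest compares secrets without leaking timing information;
    # overkill for a spoken phrase, but a harmless habit
    passphrase_ok = hmac.compare_digest(spoken_passphrase, expected)
    # Require both checks, not just one.
    return passphrase_ok and confirm_via_callback(known_number, request)


if __name__ == "__main__":
    if should_act("cfo@example.com", "tangerine-harbor-42", "wire $40,000"):
        print("Proceed with the request.")
    else:
        print("Do not act; report the call.")

The point of the sketch is the last line of should_act: the request proceeds only when the passphrase and the callback both check out, mirroring the advice to follow both steps rather than one.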

Both of these precautions require the recipient to remain calm and alert, despite the legitimate sense of urgency that would arise if the feigned scenario were real. This can be even harder when the recipient is tired, overextended, or otherwise not at their best. And for that reason, so-called vishing attacks—whether AI-enabled or not—aren’t likely to go away anytime soon.