Ridding Email Phish at Massive Scale


April 17, 2019
Written by
Len Shneyder
Contributor
Opinions expressed by Twilio contributors are their own

How Twilio SendGrid Ensures 99.97% of 50B Monthly Emails are Phish-Free

Twilio SendGrid processes over 50 billion emails every month, meaning we touch over half of the world’s unique email users on a rolling 90-day basis. With such massive scale and reach, it’s imperative that we protect recipients’ guarded information and credentials from dangerous phish.

A platform’s security and its ongoing battle against bad actors only become an issue when defenses fail. But open platforms like Twilio SendGrid and other public cloud providers are under attack every day of the year.

In fact, 83% of InfoSec professionals said they experienced a phishing attack in 2018, an increase from 76% in 2017. And with the average cost of a phishing attack for a mid-size company in the neighborhood of $1.6 million, a single attack can break a business that doesn’t have the necessary security protocols in place.

Twilio SendGrid’s Inbox Protection Rate measures the success of its compliance efforts to prevent malicious email from reaching SendGrid’s approximately 2 billion email recipients.
As of March 31, 2019, SendGrid achieved a 99.97% legitimate email rate across all of its outbound mail flow.
By measuring how successful our compliance efforts are, we keep track not only of our success in terms of deliverability but, more critically, of the potential risks we face and the impact they have on the entire digital messaging ecosystem in the course of operation.

Understanding both the good and the bad, and measuring our efficacy, provides a level of transparency to our customers and, more importantly, to their customers.

The anatomy of a phishing email

It’s important to understand the difference between phish and spam. Spam describes unwanted mail—this could be something you signed up for or a poorly targeted campaign. In some cases, spam could be legitimate email but the opt-in practices were lacking. In most cases, spam isn’t sent with the intention of defrauding the recipient or compromising their personally identifiable information (PII).

Phish, on the other hand, has but one purpose and that’s to gain access to sensitive information such as passwords or social security numbers, deliver malware, redirect unsuspecting victims to ransomware sites, and any other manner of compromise. The idea is to play off the fear and curiosity of the individual and drive them to unwittingly disclose information for the sole purpose of exploitation.

Phishing attacks take many forms: from poorly constructed emails with attachments, to highly sophisticated messages that leverage links and images from legitimate content hosted by the spoofed company online, with a single call to action that could be the compromised link.

Phishers exploit hosting companies as part of a complex game of shadows—registering cousin domains such as @yah00.com, @payypal.com, @applle or @go0gle.com to give the appearance of legitimacy. This further complicates the efforts of security personnel to stop these costly attacks.
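
As a rough illustration of how cousin domains like these can be caught, here is a minimal Python sketch. It normalizes common digit-for-letter swaps and then measures edit distance against a short list of protected domains. The list, the substitution map, the threshold, and the function names are all assumptions made for this example, not a description of Twilio SendGrid’s defenses.

```python
# Hypothetical sketch: flag "cousin" domains that imitate a protected brand.
# The brand list, substitution map, and distance threshold are illustrative
# assumptions, not Twilio SendGrid's actual rules.

PROTECTED = ["yahoo.com", "paypal.com", "apple.com", "google.com"]

# Common digit-for-letter swaps used in lookalike domains.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cousin_of(domain: str, max_distance: int = 2):
    """Return the protected domain this one imitates, or None."""
    domain = domain.lower().lstrip("@")
    if domain in PROTECTED:
        return None  # the genuine domain is not a cousin
    normalized = domain.translate(LOOKALIKES)
    for brand in PROTECTED:
        if normalized == brand or edit_distance(normalized, brand) <= max_distance:
            return brand
    return None


for candidate in ["@yah00.com", "@payypal.com", "@go0gle.com", "example.com"]:
    print(candidate, "->", cousin_of(candidate))
```

A fixed threshold like this will produce false positives on short, legitimately similar domains, so a check of this kind is better treated as one signal among many than as a verdict.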

Misspellings, poor use of English that appears to be written by a non-native speaker, and strange "from addresses" are all signs that an email may not be from who it claims to be, but it can take a trained eye to tell a phish from a legitimate message. Our job is to secure our platform from abuse, and by doing so we are helping preserve the trust and authenticity of the entire mailbox ecosystem.

How Twilio SendGrid uses machine learning to identify and stop phishy behavior

Eliminating phish and improving email quality is far beyond a manual process for an email provider of our scale. Maintaining a phish-free mail flow requires a technical understanding of both how to properly architect internet-scale delivery systems and the attack vectors employed by bad actors attempting to exploit our scale.

Twilio SendGrid developed a machine learning system called Phisherman, designed from our vast knowledge of abusive email content, to catch phish in our mail pipeline. Phisherman uses a trained TensorFlow neural network to determine the probability that any given piece of email is phish. It relies on genericized word-to-vector comparisons to identify patterns in large data sets, which are then compared against a carefully crafted model designed to isolate phish from good mail.
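
To make the idea of scoring mail from word vectors more concrete, here is a toy Python sketch. It is not Phisherman and does not use TensorFlow. To stay self-contained it swaps the neural network for a single logistic unit, and it reads "genericized" as replacing URLs and numbers with placeholder tokens, which is an assumption on our part. The tiny embedding table, weights, and function names are invented for illustration; a real system would learn them from labeled mail.

```python
import math
import re

# Toy illustration only: the vectors, weights, and bias below are invented
# for this sketch. A real system would learn them from labeled mail.
EMBEDDINGS = {
    "verify":   [0.9, 0.1],
    "password": [0.8, 0.2],
    "account":  [0.6, 0.3],
    "<url>":    [0.7, 0.1],
    "<num>":    [0.2, 0.2],
    "meeting":  [0.1, 0.9],
    "invoice":  [0.3, 0.6],
    "thanks":   [0.0, 0.8],
}
WEIGHTS = [4.0, -4.0]
BIAS = 0.0


def genericize(text):
    """Lowercase, then replace URLs and numbers with generic placeholder tokens."""
    text = re.sub(r"https?://\S+", " <url> ", text.lower())
    text = re.sub(r"\d+", " <num> ", text)
    return re.findall(r"<url>|<num>|[a-z']+", text)


def embed(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * len(WEIGHTS)
    return [sum(dim) / len(known) for dim in zip(*known)]


def phish_probability(text):
    """Logistic score over the averaged message vector."""
    vector = embed(genericize(text))
    z = sum(w * x for w, x in zip(WEIGHTS, vector)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


print(phish_probability("Verify your password at http://bad.example/login"))
print(phish_probability("Thanks for the meeting"))
```

With these made-up numbers the first message scores well above 0.5 and the second well below it. The point is the shape of the pipeline (genericize, vectorize, score), not the specific values.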

Sending over 50 billion emails per month, we process enough good and bad email to build a rich training set suitable for machine learning. Assembling such a set can be incredibly difficult for smaller companies that don’t have enough data to train their models. With larger training sets, more sophisticated machine learning becomes possible, and we’ve been able to train (and retrain as phish changes over time) our neural networks to more accurately flag and shut down phishing attempts.

But machine learning systems are only as good as the humans who train them. Our Compliance Agents review all caught phish in order to identify any false positives caught by the system, thereby refining Phisherman with continued intelligence and taking the utmost care of our good senders that may have inadvertently been flagged.

Our scale gives us the ability to sample a vast array of mail, but it also means our systems have to be engineered in a manner that won’t bog down or negatively affect the legitimate email flowing through our system.

Increasing trust and transparency in the inbox

Companies don’t normally want to discuss their shortfalls; it’s not in their best interest. But it’s important that we provide greater transparency around not only how data is used, but how systems are built to protect recipients. Our hope is that other senders will also share their rates, similar to the way that SaaS providers note and share uptime and availability as a measure of a cloud platform's stability and efficacy.

SaaS has enabled creative and clever thinkers to build powerful technologies, but unchecked, it has also enabled criminals to leverage massive scale to achieve global fraud. By setting relevant thresholds on the success of these systems, we can begin to have more honest conversations as an industry on problems that continue to grow in sophistication and scale rather than dwindling into obscurity.

We must come together as an industry to fight abuse and that begins with greater transparency. It’s the job of every company to police their technology.

To learn more about what Twilio is doing to further trusted customer communications across all of our channels, check out Twilio CEO Jeff Lawson’s recent robocalling blog post.




Inbox Protection Rate Methodology

The Inbox Protection Rate measures the share of email transiting Twilio SendGrid’s servers that is deemed to be legitimate, non-phishing email sent by legitimate businesses. The Inbox Protection Rate is not a measure of spam or of how that email is received, since spam is subjective. In addition to analyzing outbound messages, Twilio SendGrid analyzes email bounces indicative of phishing and other forms of delivery issues.

Twilio SendGrid manually reviews suspended accounts to determine whether a sender has been phishing. Each account found to contain phishing content is terminated and tagged as phish. Twilio SendGrid then counts the sum of messages delivered via tagged accounts as phish, and incorporates the phish into its automated defenses to improve their efficiency, robustness and detection rate.
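
The counting step described above can be sketched in a few lines of Python. The exact formula is not published here, so treat this as one plausible reading: the rate is one minus the share of delivered messages that came from accounts tagged as phish. The `Account` record, its field names, and the message counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical record shape; field names are assumptions for this sketch.
    account_id: str
    messages_delivered: int
    tagged_phish: bool = False  # set after manual review of a suspended account


def inbox_protection_rate(accounts):
    """Share of delivered messages that did not come from phish-tagged accounts."""
    total = sum(a.messages_delivered for a in accounts)
    if total == 0:
        raise ValueError("no delivered messages to measure")
    phish = sum(a.messages_delivered for a in accounts if a.tagged_phish)
    return 1 - phish / total


# Made-up counts, not real traffic figures.
accounts = [
    Account("legit-sender", 9_997),
    Account("terminated-sender", 3, tagged_phish=True),
]
print(f"{inbox_protection_rate(accounts):.2%}")
```

Because every message from a tagged account counts as phish, a rate computed this way is conservative with respect to those accounts: any legitimate mail they sent before termination is counted against the rate too.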

