Scale by Design: 3T Emails and Growing

June 10, 2020
Written by
Devon Jones
Opinions expressed by Twilio contributors are their own

Twilio SendGrid has hit a new milestone in our mission to provide customers with the most trusted communication platform. In May 2020, we surpassed 3 trillion processed emails.

We want to take a moment to celebrate this accomplishment with you, and to share how we’ve designed for this scale and how we ensure senders big and small get their email delivered with Twilio SendGrid.

But first, let’s marvel that these milestones come faster and faster:

2,956: days it took Twilio SendGrid to process our first trillion emails
682: more days to process 2 trillion emails
464: more days to process 3 trillion emails
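As a rough illustration of that acceleration, the day counts above can be turned into an average daily volume for each trillion. This is simple arithmetic on the published figures, not an internal metric, and the variable names are ours.

```python
# Average emails per day implied by the days taken for each trillion.
# Day counts come from the figures above; everything else is illustrative.
TRILLION = 1_000_000_000_000

days_per_trillion = {
    "first trillion": 2956,
    "second trillion": 682,
    "third trillion": 464,
}

# Divide one trillion emails by the number of days each milestone took.
average_per_day = {
    label: TRILLION / days for label, days in days_per_trillion.items()
}

for label, rate in average_per_day.items():
    print(f"{label}: about {rate / 1e9:.2f} billion emails per day")
```

The averages work out to roughly 0.34, 1.47, and 2.16 billion emails per day, in that order.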

Building and operating a high-transaction system like this at scale takes an immense amount of design, planning, validation, attention to detail, load testing, and infrastructure coordination. Our engineers have been hard at work to ensure our systems can hit this scale while simultaneously offering four nines (99.99%) of uptime and maintaining a low median delivery time of 1.9 seconds from request to mailbox provider.
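To put that uptime figure in perspective, here is a small sketch of the downtime budget a four-nines target allows. It assumes a 365-day year and reads "4 9s" as 99.99% availability; the function name is ours.

```python
# Downtime budget implied by an availability target.
# Assumes a 365-day year; "four nines" is taken to mean 99.99% availability.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Return the maximum minutes of downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


four_nines_budget = downtime_minutes_per_year(0.9999)
print(f"{four_nines_budget:.2f} minutes per year")
```

Under those assumptions the budget is a little under 53 minutes of downtime across an entire year.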

Our scale is by design

As recently as 2016, our peak sending days of Black Friday and Cyber Monday were all-hands-on-deck affairs: the engineering staff needed to be actively engaged over the whole of Thanksgiving weekend to ensure these record-breaking send days were successful. It was grueling work for the team, done to ensure a good experience for our customers.

Since 2016, we’ve invested to transform our approach to scale, and the payoff is clear. Over Thanksgiving 2019, we had near-zero alerts because the system was working perfectly as it delivered a peak throughput of 315 million emails per hour.
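For a sense of what that hourly peak means per second, the conversion is a one-liner. The 315 million figure is from the text above; the constant names are ours.

```python
# Convert the quoted hourly peak into a per-second rate.
# The 315 million figure is from the text; the names are illustrative.
PEAK_EMAILS_PER_HOUR = 315_000_000
SECONDS_PER_HOUR = 60 * 60

peak_emails_per_second = PEAK_EMAILS_PER_HOUR / SECONDS_PER_HOUR
print(f"{peak_emails_per_second:,.0f} emails per second")
```

That is 87,500 emails every second, sustained for an hour.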

Today, we process over 2 billion emails every weekday, and regularly surpass 3 billion. The last time we processed fewer than 2 billion emails on a weekday (except Christmas and New Year’s Day) was July 5, 2019. We’ve been processing over a billion a day as far back as January 6, 2018.

Twilio SendGrid email volume has increased 70% year over year since Black Friday 2016. Due to this uncharted growth, we decided to be intentional about building the most reliable, scalable email infrastructure available. Rather than rely on small tweaks and manual human intervention to manage the massive volume of email we process, we decided to build for the future.

Scale doesn’t just benefit big senders

Twilio SendGrid supports senders of every size, and because of this, fair queuing has always been of great importance to us. Supporting enormous senders has always carried the risk of starving our smaller senders while our infrastructure pushes out a 100M-email send, but our infrastructure doesn’t allow that to happen. Fair queuing is one of the critical principles of our mail pipeline. At the core of our pull-based architecture is the SendGrid Scheduler, which was designed from day one to minimize the end-to-end time for all customers, but also to provide fair treatment of each email. This in turn helps maximize our throughput.
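To make the idea of fair queuing concrete, here is a minimal round-robin sketch. It is not the SendGrid Scheduler, whose internals are not described here; it only assumes one in-memory queue per sender and takes one message per sender per pass. The function and sender names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Tuple


def fair_drain(queues: Dict[str, Deque[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (sender, message) pairs, taking one message per sender per pass.

    Illustrative round-robin only; the real scheduler is not shown here.
    Note that this consumes the queues it is given.
    """
    # Only senders that currently have mail waiting take part in the rotation.
    active = deque(sender for sender, queue in queues.items() if queue)
    while active:
        sender = active.popleft()
        queue = queues[sender]
        yield sender, queue.popleft()
        # A sender with more mail goes to the back of the line.
        if queue:
            active.append(sender)


# Hypothetical senders: one with a large backlog, one with a single email.
queues = {
    "big_sender": deque(f"big-{i}" for i in range(5)),
    "small_sender": deque(["small-0"]),
}
order = list(fair_drain(queues))
print(order)
```

In this toy run the small sender's single email goes out second, rather than waiting behind all five of the large sender's emails.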

Planning for failure

Another major strength for us is our data-center strategy. Twilio SendGrid processes 100% of the email we deliver in our three data centers in the US: East Coast, Midwest, and West Coast. Having three data centers is what enables us to have such high availability. We always keep one of the three data centers quiet, so in the event of any kind of disruption in the other two, our third DC is ready to immediately take traffic. Fiber lines get cut, and servers and even whole racks fail, but with three DCs ready to take traffic at any time, we can handle major disruptions without our customers seeing even a minor blip.
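The two-active, one-quiet arrangement can be sketched as a small routing decision. This is an illustration under our own assumptions, not the production failover logic: the health flags, the function name, and the choice of which data center is the quiet one are all hypothetical.

```python
from typing import Dict, List


def serving_data_centers(healthy: Dict[str, bool], standby: str) -> List[str]:
    """Return the data centers that should take traffic.

    Active data centers serve while healthy; the standby is brought in only
    when at least one active data center is unhealthy. Illustrative only.
    """
    active = [dc for dc in healthy if dc != standby]
    serving = [dc for dc in active if healthy[dc]]
    # Promote the quiet data center if any active one has dropped out.
    if len(serving) < len(active) and healthy.get(standby, False):
        serving.append(standby)
    return serving


# Hypothetical labels; the text does not say which data center is kept quiet.
normal = serving_data_centers(
    {"east": True, "midwest": True, "west": True}, standby="midwest"
)
degraded = serving_data_centers(
    {"east": False, "midwest": True, "west": True}, standby="midwest"
)
print(normal, degraded)
```

With everything healthy, only the two active data centers serve; when one of them is marked unhealthy, the quiet one joins the remaining active data center.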

Connecting directly

Being in our own data centers provides our customers with yet another benefit that most other ESPs can’t provide: direct network peering relationships with both customers and inboxes. Because we deliver so much email, and at such high burst rates, this gives us access to networking relationships that generally only ISPs have access to.

This year we have established direct peering relationships with AWS and Yahoo. For AWS, this will give our shared customers better reliability and lower-latency access to our service. For Yahoo, this gives us unparalleled access to their inboxes. We can access them over our direct peering relationship, and in the event of a line failure, we can still fail back to the public internet. When our infrastructure has failures, we fail *back* to the networking option that for other ESPs is their only option. This gives us not only unparalleled reliability, but also lower latency and higher throughput for all customers.
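The fail-back behavior described above amounts to trying routes in order of preference. The sketch below illustrates that pattern with stand-in functions; it does not reflect how routing is actually implemented at the network layer, and every name in it is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Route = Tuple[str, Callable[[str], None]]


def deliver_with_fallback(message: str, routes: Sequence[Route]) -> str:
    """Try each route in order and return the name of the first that succeeds."""
    last_error = None
    for name, send in routes:
        try:
            send(message)
            return name
        except ConnectionError as exc:
            # Remember the failure and move on to the next preferred route.
            last_error = exc
    raise ConnectionError("all routes failed") from last_error


# Stand-ins for real network paths (hypothetical).
def peering_down(message: str) -> None:
    raise ConnectionError("peering line cut")


delivered: List[str] = []


def public_internet(message: str) -> None:
    delivered.append(message)


used = deliver_with_fallback(
    "hello",
    [("direct_peering", peering_down), ("public_internet", public_internet)],
)
print(used)
```

Here the simulated peering link fails, so the message goes out over the stand-in public internet route instead.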

What’s coming next?

We continue to expect massive ongoing growth. The path to 4 trillion emails and our first 5-billion-email day is both clear and even shorter than the runway from 2 trillion to 3. As we approach those milestones, we are executing on strategies that should continue to reduce customer latency, provide greater insight into customer sends, and keep us scaling to meet these challenges so that every customer gets the best experience we can provide.

More is on the way. We are working closely with our cross-channel Twilio teammates to provide even more value to our customers while we continue to scale. Learn more about SendGrid’s email sending infrastructure, authentication, and delivery reputation by heading over to our delivery page.
