Make Email Vocal with Free APIs



The following is a guest post from Chris Ismael, Developer Evangelist at Mashape.

If you’re like me, you read a lot of email. I thought I might change that up and have email read to me. In this tutorial, we will use SendGrid’s awesome Inbound Parse Webhook to get the subject of an incoming email, which will then be read out on a webpage. To convert text to voice, we will use a Mashape text-to-speech API. We will also use Firebase to activate the audio element on the webpage whenever we get a new mp3 from the text-to-speech API.

The diagram below shows the flow of events between SendGrid, Mashape, and Firebase. It does not necessarily define the exact boundaries of each service, but it’s a good sequence representation of events.

Vocalize email diagram

DEMO: To try a demo, head over to this page and send an email to inbound [at] ismaelc.bymail.in. When your email is received, the page will read the subject aloud.

CODE: You can check out the code here on GitHub. You might want to open it in a separate tab so you can refer to it when we get into the code below.

Incoming Parse Setup

The first thing we need to do is set up SendGrid to parse incoming email and POST the details to our node.js app. There are two main steps: (a) point the MX record of our own domain/hostname to mx.sendgrid.net, and (b) associate that domain/hostname with the URL of the app receiving the POST in the Parse API Settings page.

Since the MX record change in (a) can take about 24-48 hours to propagate, SendGrid provides a free and fast way to test the Parse API by giving us a preset email/domain associated with our SendGrid username (e.g. inbound@<your SendGrid username>.bymail.in). Check out the “5-minute approach” section of Scott Motte’s excellent post, which explains this in a bit more detail.

For (b), my Parse API settings look like this:
SendGrid parse settings

Call the Text-to-Speech API

Now that incoming email is being parsed, we need to call the Mashape text-to-speech API using the Unirest node.js library. Mashape is an API Marketplace where you can find lots of cool APIs for your app. The particular API we’ll use for this tutorial is this one from my colleague Montana Flynn. To use it, you need a Mashape account, which gives you the key used to call the API.

If you check the API’s documentation page in Mashape, it has Unirest code snippets that you can use to call the API’s endpoint. Unirest is a set of lightweight client libraries for making HTTP calls, available in a variety of programming languages. (Props to Nijiko Yonskai for creating the fantastic node.js Unirest port.) The one we’ll be using for this app is node.js:

Unirest in Node.js

After installing the Unirest node.js library (“npm install unirest”), you can start using it in your node.js app as in the code below:

The code above is from the node.js app of the demo you’ve tried earlier. Let’s go through the Unirest bits first:

Line 1: Import the Unirest library
Line 17: Specify the Mashape base URL of the API as shown in the documentation screenshot above
Line 18: Set your unique Mashape key to the X-Mashape-Authorization header
Line 20: Execute the call and get the response. In this case the response we’re getting is a binary mp3 file

* Note that you are free to use other REST libraries to call the API from Mashape.
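As a rough sketch of the Unirest steps listed above (not the demo's actual listing, which lives in the GitHub repository), the call might look something like the following. The endpoint URL, the `text` query parameter, and the function name `requestSpeech` are placeholders assumed for illustration; use the values shown on the API's documentation page. The `unirest` module is passed in as an argument so the function can be tried without touching the network.

```javascript
// Sketch only: TTS_ENDPOINT and the `text` parameter are hypothetical,
// not the API's documented values.
var TTS_ENDPOINT = 'https://tts.example.com/speak';

// `unirest` is the module returned by require('unirest').
function requestSpeech(unirest, mashapeKey, text, callback) {
  var url = TTS_ENDPOINT + '?text=' + encodeURIComponent(text);
  unirest.get(url)
    // Your Mashape key goes in the X-Mashape-Authorization header
    .headers({ 'X-Mashape-Authorization': mashapeKey })
    .end(function (response) {
      // For this API the response body is expected to be the mp3 data
      callback(response.body);
    });
}

// Possible usage:
// var unirest = require('unirest');
// requestSpeech(unirest, 'YOUR_MASHAPE_KEY', 'Hello', function (mp3) { /* ... */ });
```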

The other important part we’ve been ignoring so far is how this node.js app is set up to accept POST calls from SendGrid. Without going into too much detail, we have essentially created an API endpoint in node.js that SendGrid can POST to. In the code above, we are using the Express node.js application framework to handle most of the API utility methods for us. You can check out this list of 40+ resources on how to create an API in different languages (node.js, PHP, Python, Rails, ASP.NET Web API, Java). The key takeaway here is that this project could have been implemented just as easily using other programming languages/frameworks, not just node.js.
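To make the endpoint idea concrete, here is a minimal sketch of an Express route that SendGrid could POST to. It is not the demo's code: the `/inbound` path and the names `registerInboundRoute` and `onEmail` are made up for illustration, and it assumes that some multipart/form-data middleware has already filled `req.body` with the fields SendGrid parsed from the email (such as `from`, `subject`, and `text`).

```javascript
// Sketch only: the '/inbound' path and function names are illustrative.
// Assumes `app` is an Express app and req.body was populated by
// multipart/form-data middleware before this handler runs.
function registerInboundRoute(app, onEmail) {
  app.post('/inbound', function (req, res) {
    var body = req.body || {};
    onEmail({
      from: body.from || '',
      subject: body.subject || '(no subject)',
      text: body.text || ''
    });
    // A 2xx response tells SendGrid the POST was received
    res.status(200).end();
  });
}

// Possible usage:
// var app = require('express')();
// registerInboundRoute(app, function (email) { console.log(email.subject); });
```

Keeping the handler this small means the text-to-speech call and the Firebase write can be plugged in through `onEmail` without changing the route.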

Connect the Dots to Firebase

The last step to get this working is to save the mp3 binary file in Firebase and get an automatic callback on the webpage to display the email and play the mp3. Firebase is a realtime backend service built on the idea that your data should drive the realtime behavior of your apps. To put it simply, with Firebase you get callbacks whenever the data stored in your “Firebase” changes. This simple yet powerful approach lets us focus on our application logic and lets Firebase do most of the realtime callback work.

To demonstrate this, look back at the code above and refer to the following lines:
Line 2: Import the Firebase library
Line 5: Initialize your “Firebase”. You can create your own Firebase and get the hostname from here.
Line 21: Send data to Firebase. In this case we are sending the binary data we got from the text-to-speech API response above.
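The write to Firebase could be sketched roughly as below. This is an illustration under assumptions, not the demo's listing: the field names `subject` and `audio` and the function name `saveSpeech` are invented here, and because Firebase stores JSON values, the sketch encodes the mp3 bytes as a base64 data URI, which may differ from how the demo represents them.

```javascript
// Sketch only: field names (subject, audio) are illustrative.
// Assumes `ref` is a Firebase reference exposing push(), for example one
// created with `new Firebase('https://<your-firebase>.firebaseio.com/')`.
function saveSpeech(ref, subject, mp3Buffer) {
  var record = {
    subject: subject,
    // Base64 keeps the binary mp3 intact inside a JSON string value
    audio: 'data:audio/mpeg;base64,' + mp3Buffer.toString('base64')
  };
  // push() appends the record as a new child of the reference
  ref.push(record);
  return record;
}
```

Storing a data URI has the side benefit that the webpage can assign the value straight to an audio element's `src`.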

The magic happens on the webpage, where we’re hooked up to any changes to our Firebase.

In the code above, we have simply subscribed to any changes/additions to our Firebase. This code gets called whenever data is appended to our Firebase, and the new data is passed back to the webpage as a “snapshot”. The other lines of code play the audio (using HTML5’s audio element) and update the subject and text fields on the webpage.
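A minimal sketch of that subscription might look like the following. It is an approximation rather than the page's actual script: the function name `playIncoming` and the record fields `subject` and `audio` are assumptions for illustration, and the Firebase reference and page elements are passed in as arguments.

```javascript
// Sketch only: record fields (subject, audio) are illustrative.
// Assumes `ref` exposes Firebase's on('child_added', ...) listener,
// `audioEl` is an HTML5 <audio> element, and `subjectEl` is a text element.
function playIncoming(ref, audioEl, subjectEl) {
  ref.on('child_added', function (snapshot) {
    // snapshot.val() returns the data stored for the new child
    var record = snapshot.val();
    subjectEl.textContent = record.subject;
    audioEl.src = record.audio; // data URI or URL of the mp3
    audioEl.play();
  });
}
```

Note that `child_added` also fires once for each child that already exists when the listener is attached, so a page wired this way would replay stored items on load unless older entries are filtered out or removed.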

Heroku screenshot of the app in action

How to test and deploy this code

  • To test this code on your local machine, you can use a tunneling service like ngrok. This exposes a public endpoint that SendGrid can POST to, without your having to deploy the app to a cloud service. Just make sure that the endpoint you set up in ngrok (or any other tunneling service) matches the URL field in your Parse API Settings page.
  • To deploy this to the web, you can use Heroku, Nodejitsu, or any cloud platform service that supports node.js.

With this simple demo of SendGrid’s Inbound Parse Webhook, I hope you’ll start thinking about how you could extend it further using other interesting Mashape APIs like image recognition (to parse attachments), summarization, and others. Also, with Firebase we have given ourselves a bit more leeway to expand into more structured data. This was also built WITHOUT security in mind, so I’d appreciate any GitHub pull requests out there ;) Send me some suggestions and comments! chris [at] mashape.com

