SendGrid officially supports 23 open source projects, so how should we prioritize which projects we support? This blog post demonstrates how you can automate a portion of that calculation using one of those open source projects, the Open Source Library Data Collector.

To determine how we can best use our resources towards supporting the communities that depend on our open source projects, we need to understand the impact of these various projects. Yes, the infamous return on investment (ROI) calculation.

We will explore two types of external data sources that are useful in prioritization calculations: GitHub and various language-specific package managers.

Why Open Source?

It would be easier for us to keep this tool internal, so why did we take the time to open source it? At SendGrid, we are extremely grateful to the various open source communities whose technologies help us to serve our customers. This is a way for us to give back by paying it forward.

After completing this project, we found that it saved us a lot of time while providing valuable data that helps us prioritize and calculate ROI metrics. We thought it would be great to give that gift to others.

As a quick aside, we did something similar with the 7 HTTP clients.

What Does this Project Do Specifically?

This project allows you to automatically collect data from GitHub and various package managers and store it in a database. Check out the data schema for details.

Let’s Get Technical

You can find the repo for this project here.

This project was developed in Python and runs on Heroku once per day, posting to a ClearDB MySQL database.

Following is a description of the various modules that comprise this project.

app.py

This is the entry point of the application, run once per day on Heroku. Here is where you would adjust the code for your specific workflow. It relies on the modules described below.

config.py

Here we process the environment variables and the application variables for use throughout the application.

The environment variables (.env) are used for authentication to GitHub, SendGrid, and your MySQL database.

The application variables (config.yml) allow you to set what GitHub repos and package managers you want to monitor. Your email settings are configured here as well.
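As a rough illustration of that split between secrets and application settings, here is a minimal sketch. The environment variable names and the `load_settings` helper are hypothetical, not the project's actual keys or API, and `app_config` stands in for whatever dict you get from parsing config.yml.

```python
import os

# Hypothetical variable names; the project's actual .env keys may differ.
REQUIRED_ENV_VARS = ("GITHUB_TOKEN", "SENDGRID_API_KEY", "MYSQL_DB_URL")


def load_settings(app_config, environ=None):
    """Merge secrets from the environment with non-secret application settings."""
    env = os.environ if environ is None else environ
    # Fail early, and name every missing variable at once.
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = dict(app_config)  # e.g. the dict parsed from config.yml
    settings.update({name.lower(): env[name] for name in REQUIRED_ENV_VARS})
    return settings
```

Keeping credentials in the environment and everything else in a file means the file can be committed safely while the secrets stay on Heroku or in your local .env.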

For local testing, we use Tox so that we can test various versions of Python before uploading to GitHub. This allows us to catch compatibility issues before Travis CI has a go at it.

db_connector.py

This module provides an easy interface to your database and allows you to maintain both local and Heroku-hosted (ClearDB) databases. The heavy lifting is done through SQLAlchemy.

The database adapter auto-generates its model based on this database schema.
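One way to support both a local and a Heroku-hosted database is to pick the connection string from the environment and fall back to a local default. The sketch below uses only the standard library and is not the module's actual code. It assumes the ClearDB add-on exposes its connection string as `CLEARDB_DATABASE_URL`, while the local URL and both function names are hypothetical.

```python
import os
from urllib.parse import urlsplit

# Hypothetical local default used when no hosted database is configured.
LOCAL_DB_URL = "mysql://root@localhost/open_source_data"


def database_url(environ=None):
    """Prefer the Heroku-provided URL, otherwise use the local database."""
    env = os.environ if environ is None else environ
    return env.get("CLEARDB_DATABASE_URL", LOCAL_DB_URL)


def parse_database_url(url):
    """Split a mysql:// URL into the pieces a database driver needs."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 3306,  # MySQL's default port
        "user": parts.username,
        "password": parts.password,
        "database": parts.path.lstrip("/"),
    }
```

The parsed pieces could then be handed to SQLAlchemy or any other MySQL client. Query parameters on the URL, such as `?reconnect=true`, are ignored here.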

github.py

In this module we gather the data from the GitHub repos specified in the configuration, add that data to our DB, and then return the data. It utilizes the excellent github3.py library.
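To give a feel for the kind of data involved, here is a sketch that calls the GitHub REST API directly with the standard library instead of going through github3.py. The function names and the choice of fields are illustrative assumptions. The fields the project actually stores are defined by its data schema.

```python
import json
import urllib.request


def summarize_repo(payload):
    """Pick a few counters out of a GitHub 'get a repository' response."""
    return {
        "name": payload["full_name"],
        "stars": payload["stargazers_count"],
        "forks": payload["forks_count"],
        "open_issues": payload["open_issues_count"],  # includes pull requests
        "watchers": payload["subscribers_count"],
    }


def fetch_repo(full_name, token=None):
    """Fetch and summarize one repository, e.g. 'owner/repo'."""
    request = urllib.request.Request(
        "https://api.github.com/repos/" + full_name,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "metrics-sketch",  # GitHub requires a User-Agent
        },
    )
    if token:
        request.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(request) as response:
        return summarize_repo(json.load(response))
```

Separating the network call from the summarizing step makes the latter easy to test against a saved response.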

package_managers.py

Here is where things can get ugly. Most of the package managers either don’t have an API, or I deemed it faster to simply scrape the data from the web page. Screen scraping can be tedious, but BeautifulSoup makes it almost fun.

In this module, where possible, we scrape the download data from the various package managers. As of this writing, we successfully scrape nuget.org, npmjs.com, packagist.org and rubygems.org. PyPI has stopped displaying download data in the time since we originally launched this project.
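The scraping idea can be sketched as follows. To stay dependency-free, this example swaps BeautifulSoup for the standard library's `html.parser`. The class name, the function name, and the markup in the usage note are all hypothetical, since each package manager's page is structured differently and changes over time.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element carrying a given CSS class.

    Assumes well-nested markup with no unclosed void tags (such as <br>)
    inside the target element.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # greater than 0 while inside the target element
        self.chunks = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_download_count(html, css_class):
    """Return the digits found in the target element as an int, or None."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    digits = "".join(ch for ch in "".join(parser.chunks) if ch.isdigit())
    return int(digits) if digits else None
```

For a page fragment like `<span class="stat downloads">1,234,567</span>`, calling `extract_download_count(html, "downloads")` strips the thousands separators and returns an integer.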

This module needs refactoring to make the processing of the package manager URLs configurable. Currently, you need to modify this module manually.
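One possible direction for that refactoring, offered purely as a sketch and not as the project's plan, is to drive the module from a table of URL templates. The dictionary keys, the function name, and the package names in the usage note are hypothetical. The templates follow the public package page patterns of each site.

```python
# Package page URL patterns for the package managers mentioned above.
PACKAGE_URL_TEMPLATES = {
    "nuget": "https://www.nuget.org/packages/{name}",
    "npm": "https://www.npmjs.com/package/{name}",
    "packagist": "https://packagist.org/packages/{name}",  # name is vendor/package
    "rubygems": "https://rubygems.org/gems/{name}",
}


def package_urls(packages):
    """Turn (manager, package name) pairs from configuration into page URLs."""
    urls = []
    for manager, name in packages:
        if manager not in PACKAGE_URL_TEMPLATES:
            raise ValueError("Unsupported package manager: " + manager)
        urls.append(PACKAGE_URL_TEMPLATES[manager].format(name=name))
    return urls
```

With something like this in place, the list of `(manager, name)` pairs could live in config.yml, for example `[("npm", "example-package"), ("packagist", "example/package")]`, and adding a package would no longer require editing the module.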

sendgrid_email.py

This module sends an email through SendGrid to alert you on a successful DB update.
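For readers curious what such a notification involves, here is a sketch that posts to SendGrid's v3 Mail Send endpoint using only the standard library. The module itself may well use SendGrid's Python helper library instead, and the function names and addresses here are hypothetical.

```python
import json
import urllib.request


def build_message(from_email, to_email, subject, body):
    """Build the JSON payload expected by the v3 Mail Send endpoint."""
    return {
        "personalizations": [{"to": [{"email": to_email}]}],
        "from": {"email": from_email},
        "subject": subject,
        "content": [{"type": "text/plain", "value": body}],
    }


def send_message(api_key, message):
    """POST the message to SendGrid and return the HTTP status code."""
    request = urllib.request.Request(
        "https://api.sendgrid.com/v3/mail/send",
        data=json.dumps(message).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status  # 202 means the message was accepted
```

A daily job could call `build_message(...)` with a short summary of the rows written and pass the result to `send_message` once the database update succeeds.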

Following is a sample of what the GitHub data looks like:

Interpreting the Data

Check out this blog post for some ideas on how you can utilize the data collected by this software.

Future/Roadmap of This Project

If you would like to contribute to this project, please take a look at the open issues and our contribution guidelines.

Types of contributions we need:

  • Feature requests
  • Bug reports
  • Code improvements
  • Feature implementations

Future/Roadmap of SendGrid & Open Source

We are committed to serving the open source community and the ecosystem that supports it. Keep an eye on our GitHub page to see what we are up to, contribute on GitHub with issues and pull requests, and subscribe to this blog for the latest news.

Elmer Thomas is SendGrid's Developer Experience Engineer. His mission is to help SendGrid live up to its slogan: "Email Delivery. Simplified" by improving the lives of developers, both internally and externally. Via all sorts of hackery, of course. Follow his exploits on Twitter and GitHub.