If you support and maintain public-facing code, gathering data about how that code is used can help you prioritize future improvements and report the return on investment (ROI) of your projects. The software described in this post is especially useful if your code is hosted on GitHub and distributed through package managers: it gathers valuable, actionable data and saves you time.
Using the Data
At SendGrid, we maintain open source libraries across 7 programming languages. Combining the data collected by this tool with internal metrics (such as the number of API calls made through a given library) allows us to make business decisions based on real data. For example, you might allocate sprint points based on the usage patterns of your libraries.
From GitHub, we retrieve and store the number of pull requests, issues, commits, branches, releases, contributors, watchers, stargazers, and forks. We also gather the number of library downloads from NuGet, npm, Packagist, PyPI, and RubyGems. At SendGrid, we also include user agents in our API calls for further understanding of our API usage.
Here is a quick breakdown of how these various data points help us make decisions:
- pull requests – tell us how active the community is and how deeply its members are participating
- issues – tell us how healthy the code is and are a measure of community engagement
- commits – tell us how active the code base has been and provide some measure of stability
- branches – give us a quick gauge of all current, active development on the code base
- releases – show us how often we are iterating and provide a data point for stability
- contributors – tell us how active our community is and who is contributing
- watchers – tell us who is monitoring all our changes
- stargazers – are mostly a vanity metric, but still a measure of community engagement
- forks – measure community engagement
- library downloads – help us understand usage in combination with internal metrics
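To make the GitHub side of this concrete, here is a minimal sketch of pulling a few of the repository-level metrics above from GitHub's public REST API. The function names are hypothetical, not taken from the tool itself, and the sketch covers only the counts that GitHub returns in a single `GET /repos/{owner}/{repo}` call; metrics like pull requests and commits require additional endpoints.

```python
import json
import urllib.request

def extract_repo_stats(repo_json):
    """Pick out the metrics discussed above from a GitHub repo JSON payload."""
    return {
        "stargazers": repo_json.get("stargazers_count", 0),
        "watchers": repo_json.get("subscribers_count", 0),
        "forks": repo_json.get("forks_count", 0),
        "open_issues": repo_json.get("open_issues_count", 0),
    }

def fetch_repo_stats(owner, repo, token=None):
    """Fetch one repository's public stats from the GitHub REST API."""
    url = "https://api.github.com/repos/{0}/{1}".format(owner, repo)
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    if token:  # unauthenticated requests are rate-limited to 60 per hour
        req.add_header("Authorization", "token {0}".format(token))
    with urllib.request.urlopen(req) as resp:
        return extract_repo_stats(json.loads(resp.read().decode("utf-8")))
```

Separating the parsing from the HTTP call keeps the metric extraction easy to test without touching the network.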
The code is written in Python and supports versions 3.2 through 3.5. The data is time-stamped and stored in a MySQL database. We provide instructions for installing the data collector locally or on Heroku, and suggest that you run the software at least once a month.
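The storage pattern is simple: each run inserts a new time-stamped row per repository, so trends can be queried later. The sketch below illustrates that pattern with an in-memory SQLite database so it runs without a server; the actual tool targets MySQL, and the table and column names here are invented for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# SQLite stands in for MySQL here so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE repo_stats (
        id INTEGER PRIMARY KEY,
        repo_name TEXT NOT NULL,
        stargazers INTEGER,
        forks INTEGER,
        collected_at TEXT NOT NULL  -- ISO 8601 UTC timestamp
    )
""")

def store_stats(conn, repo_name, stats):
    """Insert one time-stamped snapshot of a repository's metrics."""
    conn.execute(
        "INSERT INTO repo_stats (repo_name, stargazers, forks, collected_at) "
        "VALUES (?, ?, ?, ?)",
        (repo_name, stats["stargazers"], stats["forks"],
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

store_stats(conn, "sendgrid/sendgrid-python", {"stargazers": 1000, "forks": 500})
```

Because every snapshot is a separate row, a monthly cron run builds up a history you can chart or diff over time.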
We have also included the ability to automatically send an email through SendGrid at the end of the program's execution, alerting your team that the data collection succeeded.
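A notification like that can be sent with a single request to SendGrid's v3 Mail Send endpoint. The sketch below is not the tool's own implementation; the function names and message wording are assumptions, and the email addresses are placeholders.

```python
import json
import urllib.request

SENDGRID_URL = "https://api.sendgrid.com/v3/mail/send"

def build_notification(to_email, from_email, repo_count):
    """Build a SendGrid v3 Mail Send payload announcing a successful run."""
    return {
        "personalizations": [{"to": [{"email": to_email}]}],
        "from": {"email": from_email},
        "subject": "Data collection complete",
        "content": [{
            "type": "text/plain",
            "value": "Collected stats for {0} repositories.".format(repo_count),
        }],
    }

def send_notification(api_key, payload):
    """POST the payload to SendGrid; raises on a non-2xx response."""
    req = urllib.request.Request(
        SENDGRID_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer {0}".format(api_key),
            "Content-Type": "application/json",
        },
    )
    urllib.request.urlopen(req)
```

Keeping the payload construction separate from the HTTP call makes the message easy to test and to customize per team.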
For most use cases, you will only need to modify the config.yml and .env files. However, it is not difficult to modify the code base, and we have documented several items on our wish list that we would love to see you implement. For example, we need unit tests to complement our integration tests, and an integration with Keen.io would be useful. Please see our contributing guide for details. If you have any items to add, please feel free to open an issue.