Lessons Learned Overhauling Client Libraries Across 7 Languages
by Justin Hammond
When integrating with an API, developers typically gravitate toward two key resources to get started: documentation, so they know how to use the API, and client libraries (or SDKs), so they can take advantage of pre-written code instead of reinventing the wheel. At EasyPost, a staggering 81% of API requests are piped through one of our open-source client libraries. We support seven programming languages (Python, Ruby, PHP, Node, C#, Java, and Golang), allowing developers across various tech stacks to jumpstart their integrations quickly. With so much of our user base building these projects into their own applications, any large-scale changes or additions must be handled with care. I'd like to share some of the victories, pain points, lessons, and processes we found while overhauling our seven client libraries over the last six months. The effort included one or two major releases per library, dozens of minor releases, and tens of thousands of changed lines of code, all by just three engineers.
Why an overhaul was needed
All seven of our client libraries needed some TLC. Many had been around since EasyPost started roughly ten years ago. A couple hadn't seen an update in several years and, as a result, were missing several newly released features. As new language versions came out, they brought new syntax, styles, and best practices that we wanted to take advantage of. In addition, many of the dependencies used in these projects were outdated or unnecessary and needed to be thoroughly combed through. With very little test coverage and no consistent styling or linting, various bugs and errors were strewn throughout the libraries. We wanted to clean these up while also modernizing and standardizing the libraries so that, no matter your tech stack, you'd have a consistent and incredible EasyPost experience.
We decided there were a few key areas we wanted to focus on while we overhauled these libraries:
- Feature parity
- Modernization (dependencies and syntax)
- Linting & styling
- Comprehensive test coverage
- Enable faster feedback loops via CI
- Streamline releasing
- Data-driven decisions
Feature parity
We started with feature parity, as we felt it was the most impactful change we could make without necessitating a major release. It would let us get as many features as possible into our users' hands without requiring them to migrate their current integrations to accommodate breaking changes. We took inventory of the features our API supported and tracked parity across all of our libraries. We found that many of these projects supported only ~70% of the available features, with the mix of supported features differing from library to library. We had also just released API features, such as SmartRate and Tax Identifiers, that every library needed.
Beyond that, some libraries, like C#, had been missing entire objects for years, such as the Insurance object (now present). We even found features that were implemented in every library but behaved slightly differently in each. We therefore took this opportunity to add every API feature to every client library and to make all functionality behave the same, regardless of language. This creates an incredible experience not only for our users but also for the internal support team that assists them. They no longer need to wonder what is and isn't possible in a given language, nor worry about edge cases or caveats to a feature in a particular language.
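At its core, the parity audit described above compares each library's feature set against the API's. Below is a minimal Python sketch of that comparison. The feature names and per-library sets are hypothetical placeholders rather than our real inventory, and `parity_report` is an illustrative helper, not part of any EasyPost library.

```python
"""Hypothetical sketch: tracking API feature parity across client libraries."""

# Assumed inventory: these feature names and per-library sets are
# illustrative placeholders, not actual parity data.
API_FEATURES = {"Address", "Shipment", "Insurance", "SmartRate", "TaxIdentifier"}

LIBRARY_FEATURES = {
    "python": {"Address", "Shipment", "Insurance", "SmartRate"},
    "csharp": {"Address", "Shipment"},
    "ruby": {"Address", "Shipment", "Insurance", "SmartRate", "TaxIdentifier"},
}


def parity_report(api_features, library_features):
    """Return {library: (coverage_percent, sorted_missing_features)}."""
    report = {}
    for library, supported in library_features.items():
        missing = sorted(api_features - supported)
        # Only count features the API actually offers.
        covered = len(api_features & supported)
        coverage = round(100 * covered / len(api_features), 1)
        report[library] = (coverage, missing)
    return report


if __name__ == "__main__":
    for library, (coverage, missing) in parity_report(API_FEATURES, LIBRARY_FEATURES).items():
        print(f"{library:<8} {coverage:5.1f}%  missing: {', '.join(missing) or 'none'}")
```

Keeping the inventory as plain sets makes the "what is missing where" question a set difference, which is easy to regenerate whenever a new API feature ships.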
Modernization
Once we had accomplished feature parity across libraries, it was time to move on to larger breaking changes that would lead to major releases. We decided to take the "path of least destruction": we wanted to help users adopt newer releases of our client libraries, not hinder their progress. We often hear about open-source projects making too many sweeping changes in a single release, turning upgrades into a difficult chore for large codebases. When that happens, many users find it easier to stay on their current version than to go through a painful migration to get everything up to date. Over time, this leads to technical debt, security issues, missing features, and lower performance. To make major version releases easy to adopt, we knew we had to keep the barrier to entry low even as we introduced breaking changes.
This meant that we had to:
- Keep breaking changes to a minimum
- Save breaking changes for a time when we could release them under a major version bump
- Plan multiple, smaller, spread-out breaking changes (if necessary) so we wouldn't overwhelm users with too many changes in a single release
- Provide a comprehensive upgrade guide for each library detailing what steps would need to be taken to migrate from one major version to the next
One of the largest breaking changes we wanted to make was modernizing the matrix of language versions we supported for each library. For instance, our PHP library was still built and tested against PHP 5.3, even though that version of the language had reached end of life eight years earlier. Many of our libraries could not even be built against newer versions of their respective languages; our Node library, for instance, could only be built against Node 14 and older and would throw dozens of errors on Node 16 or newer before failing.
We needed to strike a balance between supporting potential legacy integrations and enabling users to run the latest language version in their own stack. We determined that the best option was to drop support for every language version older than the oldest officially supported version minus one (where possible). For example, Ruby 2.6 was the oldest version still officially supported at the time, so we kept Ruby 2.5 and dropped support for everything before it. This allowed us to modernize our version support quickly while giving users and integrations on an older version of the language a generous grace period of one extra version. With a minimum version now set, we could finally upgrade some long-overdue dependencies to support newer language versions. This opened the door to supporting the latest releases, such as Node 16, PHP 8, .NET 6, and more.
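To make the version policy concrete, here is a small Python sketch of the "oldest supported version, minus one" rule, using the Ruby example above. The version lists and the helper names (`minimum_supported`, `supported_versions`) are assumptions for illustration; in practice, the list of officially supported versions would come from each language's own maintenance schedule.

```python
"""Hypothetical sketch of the "oldest supported version, minus one" policy."""


def parse_version(version):
    """Turn a dotted version string like "2.6" into a comparable tuple (2, 6)."""
    return tuple(int(part) for part in version.split("."))


def minimum_supported(released, upstream_supported):
    """Return the oldest version the library keeps supporting.

    released: every release line of the language, in any order.
    upstream_supported: lines the language maintainers still support
    (assumed to be a subset of `released`).
    The result is one release older than the oldest upstream-supported
    version, giving end-of-life users a one-version grace period.
    """
    ordered = sorted(released, key=parse_version)
    oldest_supported = min(upstream_supported, key=parse_version)
    index = ordered.index(oldest_supported)
    # "Where possible": if nothing older exists, fall back to the oldest supported.
    return ordered[max(index - 1, 0)]


def supported_versions(released, upstream_supported):
    """Return every released version at or above the minimum, oldest first."""
    floor = parse_version(minimum_supported(released, upstream_supported))
    return [v for v in sorted(released, key=parse_version) if parse_version(v) >= floor]


if __name__ == "__main__":
    ruby_released = ["2.3", "2.4", "2.5", "2.6", "2.7", "3.0", "3.1"]
    ruby_upstream = ["2.6", "2.7", "3.0", "3.1"]
    print(minimum_supported(ruby_released, ruby_upstream))  # 2.5
    print(supported_versions(ruby_released, ruby_upstream))
```

Comparing versions as integer tuples rather than strings avoids the classic bug where "2.10" sorts before "2.9".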
Linting and styling
With feature-complete, modernized libraries, we turned our sights to making them beautiful. As many of these projects had been around for nearly ten years, each with dozens of contributors and no style guide to follow, you can imagine how many style and syntax differences had cropped up over time. We consulted our internal stakeholders about the style guides, formatters, and syntax choices we already used for each language internally. For Python, we turned to Black for formatting and flake8 for linting. For Ruby, we introduced RuboCop and incorporated our internal style guide, EasyCop, to lint the project. For Node, we used ESLint with the internal style guide and linting rules from our UI team. For the remaining languages, we integrated an industry-standard linter or formatter, creating our own custom style guide for C# and adapting the Sun style guide for Java. Linting all of the libraries surfaced over 10,000 style and syntax issues, which we fixed. With a consistent style, we were spending less time processing and debating code style and more time actually reading and writing code. Opinions were no longer the dominant topic of reviews; instead, we got to focus on the usage and performance of the code in question.
Comprehensive test coverage
For our initial pass on each library, we set a goal of bringing test coverage to 70% or higher, because many of these projects' test suites either didn't exist or tested only a handful of our ~100 supported features. By testing every user-accessible function, we found hundreds of lines of unused or unreachable code, along with dozens of bugs across all our libraries, which have since been cleaned up and patched. Next, we introduced industry-standard "VCR" solutions to each test suite (e.g., vcr for Ruby, vcrpy for Python, Polly.JS for Node) to record HTTP requests and replay them, so subsequent test runs don't need to make live API calls. Where no viable VCR solution existed (such as for C# and Java), we built and open-sourced our own (easyvcr-csharp and easyvcr-java). This allowed us to run reproducible test suites across different machines and contributors. We also added an exhaustive guide to testing each library to the project's README and built a set of reusable fixtures that our tests could use as dummy data. Finally, we ensured that sensitive data was scrubbed from saved cassette files so that anyone can safely contribute to and test our code without fear of exposing personal information.
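Tools like vcrpy and easyvcr handle far more than this, but the core record/replay idea fits in a few lines. The Python sketch below is hypothetical and deliberately library-agnostic: `Cassette` and the injected `transport` function are invented names, requests are matched only on method and URL, and only the headers listed in `SENSITIVE_HEADERS` are scrubbed (a real setup would also scrub sensitive fields in request and response bodies).

```python
"""Minimal record/replay ("VCR") sketch with sensitive-data scrubbing.

Hypothetical illustration of the idea only; this is not the API of vcrpy,
easyvcr-csharp, or easyvcr-java. `transport` stands in for whatever
function actually performs the HTTP call.
"""
import json
from pathlib import Path

SENSITIVE_HEADERS = {"authorization"}  # assumed list of headers to scrub
SCRUBBED = "<REDACTED>"


class Cassette:
    def __init__(self, path, transport):
        self.path = Path(path)
        self.transport = transport
        # Load previously recorded interactions, if the cassette file exists.
        self.interactions = (
            json.loads(self.path.read_text()) if self.path.exists() else []
        )

    def request(self, method, url, headers):
        # Replay: match on method and URL only, so secrets never affect lookup.
        for interaction in self.interactions:
            if interaction["method"] == method and interaction["url"] == url:
                return interaction["response"]
        # Record: make the live call once, then persist a scrubbed copy.
        response = self.transport(method, url, headers)
        safe_headers = {
            name: SCRUBBED if name.lower() in SENSITIVE_HEADERS else value
            for name, value in headers.items()
        }
        self.interactions.append(
            {"method": method, "url": url, "headers": safe_headers, "response": response}
        )
        self.path.write_text(json.dumps(self.interactions, indent=2))
        return response
```

The first test run goes through `transport` and writes the cassette; every later run (on any machine that has the cassette file) is served from disk, and the saved file never contains the real API key.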
Continuous integration
With all of these changes in flight across seven different projects, we needed a way to shorten the feedback loop. With linters and test suites in place, we could finally enforce proper styling and passing unit tests on every commit. As our code already lived on GitHub, GitHub Actions was the natural choice for continuous integration. We ran our test suites, linted, and built our projects in the cloud on every commit, which let us catch bugs faster, correct styling errors before review, and hold every contribution to the same high standards.
Streamline releasing
Unlike our core API, where the changes we push (often dozens of times a day) reach users immediately, a client library change only helps users once we cut a new release and they download the update. We wondered how quickly we could get these long-overdue changes into our users' hands. Each programming language uses a different package manager to distribute and download packages (e.g., Maven for Java, NuGet for C#, RubyGems for Ruby), so we had to slowly work through our dated internal release documentation, correcting it as we made our first big release of each library. Once we had a few manual releases under our belt, we started automating the process with helpers such as Makefiles or the language's built-in equivalent (e.g., scripts defined in composer.json or package.json) to cut down on repetitive release commands. Many of these processes had required running a half-dozen or more commands by hand to release a new version of a library; now, most require only one or two (e.g., `make clean install build publish`). In the process, we also found that some of these projects had their built or compiled assets unnecessarily checked into their Git repositories. We removed these and added .gitignore entries so that every build is clean and packages only the latest assets.
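As an illustration of collapsing a multi-step release into one command, here is a hypothetical Python equivalent of a `make clean install build publish` target for a Python package. The step names mirror the Makefile example, but the commands behind them (including treating `install` as installing the PyPA `build` and `twine` tools) are assumptions, not our actual release scripts.

```python
"""Hypothetical one-command release helper for a Python package.

A script-based stand-in for `make clean install build publish`. The commands
assume the PyPA `build` and `twine` tools; other ecosystems would swap in
their own package-manager commands.
"""
import subprocess
import sys

# Ordered release steps (illustrative; adjust per project).
STEPS = {
    "clean": "rm -rf build dist *.egg-info",
    "install": "python -m pip install --upgrade build twine",
    "build": "python -m build",
    "publish": "python -m twine upload dist/*",
}


def release(targets, dry_run=False, run=subprocess.run):
    """Run the requested steps in order; return the commands run (or planned).

    Unknown targets are rejected before anything runs, and check=True makes a
    failing step raise CalledProcessError, so a broken build never publishes.
    """
    unknown = [t for t in targets if t not in STEPS]
    if unknown:
        raise ValueError(f"unknown step(s): {', '.join(unknown)}")
    executed = []
    for target in targets:
        command = STEPS[target]
        print(f"[{target}] {command}")
        if not dry_run:
            # shell=True so the shell expands globs such as dist/*.
            run(command, shell=True, check=True)
        executed.append(command)
    return executed


if __name__ == "__main__":
    dry_run = "--dry-run" in sys.argv
    targets = [arg for arg in sys.argv[1:] if arg != "--dry-run"]
    release(targets or ["clean", "install", "build", "publish"], dry_run=dry_run)
```

Running the script with `--dry-run` prints each step without executing anything, which is a cheap way to sanity-check a release. Rejecting unknown step names up front keeps a typo from running half a release.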
Data-Driven decisions
Prior to starting these overhauls, we lacked the data to make educated decisions. For example, some of our libraries did not include the programming language version number in the User-Agent header of API calls. We've since added this data point (and plan to backport it to older versions) so we can better determine language version usage and decide when to deprecate support for a particular version. We are also building internal reporting and analytics to give our team better insight into usage across libraries, endpoints, and features. We also plan to incorporate timing metrics to guide performance improvements to the libraries, along with additional customer outreach to learn how we can best improve the developer experience.
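Adding the language version to the User-Agent header is a small client-side change, and the server can aggregate it afterward. The Python sketch below shows both halves. The token format (`EasyPost/v2 PythonClient/... Python/... OS/...`) and the helper names are hypothetical, not the exact header any EasyPost library sends; only `platform.python_version()` and `platform.system()` come from the standard library.

```python
"""Hypothetical sketch: reporting the language version in the User-Agent header.

The product tokens ("EasyPost/v2", "PythonClient/...") are illustrative
assumptions, not the exact format any EasyPost library sends.
"""
import platform
import re
from collections import Counter

CLIENT_VERSION = "0.0.0"  # placeholder library version


def build_user_agent(client_version=CLIENT_VERSION):
    # platform.python_version() returns e.g. "3.11.4"; platform.system() e.g. "Linux".
    return (
        f"EasyPost/v2 PythonClient/{client_version} "
        f"Python/{platform.python_version()} OS/{platform.system()}"
    )


# Matches a language token such as "Ruby/2.7.6" and captures major and minor.
LANG_TOKEN = re.compile(r"\b(Python|Ruby|PHP|Node|Java|DotNet|Go)/(\d+)\.(\d+)")


def language_version_counts(user_agents):
    """Server-side view: count requests per language major.minor version."""
    counts = Counter()
    for agent in user_agents:
        match = LANG_TOKEN.search(agent)
        # Agents from older releases may lack the token; track them separately.
        key = f"{match[1]} {match[2]}.{match[3]}" if match else "unknown"
        counts[key] += 1
    return counts
```

Bucketing by major.minor rather than the full patch version keeps the report focused on the release lines that a deprecation decision actually targets.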
Lessons learned
We learned a lot over the last few months and have made incredible progress towards providing world-class client libraries for shipping developers to take advantage of. To recap some of our findings:
- Feature parity ensures a consistent experience regardless of tech stack. This also assists internal stakeholders who may need to support customers
- Modernization lets you support the most recent versions of your stack while also combating security issues
- Linting and formatting make your code beautiful, which in turn keeps it maintainable. You will spend less time debating code style opinions and more time actually writing code
- Comprehensive test coverage protects against regressions when future changes are made, boosts confidence in the product, and quickly exposes bugs and inconsistencies
- Enforcing all of the above via Continuous Integration shortens the feedback loop which helps teams move faster and get more done
- Streamlining your releasing process gets your changes to your users faster and reduces human error
- Data leads to educated decisions. If you don't have the data you need to make a decision, you are going in blind
We are excited to see what you might build with one of our client libraries. We hope the various improvements we've made in the last year will assist you on your journey and look forward to the community's continued involvement. We welcome your feedback via GitHub issues or pull requests on our GitHub page and can't wait to see you shipping. If you'd like to build with us, we are always hiring! Come see what changing the shipping world is all about.