Feature toggles: just roll your own!

When you’re dealing with a particularly large service that has a slow deployment pipeline (15-30 minutes) and a rollback delay of up to 10 minutes, you’re going to need feature toggles (also known as feature flags) to turn those nerve-wracking, half-hour major incidents into a small whoopsie-daisy that you can fix in a few seconds.

Make a change, gate it behind a feature toggle, release, enable the feature toggle and monitor the impact. If there is an issue, you can immediately roll it back with one HTTP request (or a database query¹). If everything looks good, you can remove the feature toggle from your code and move on to other work.
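Here is a minimal sketch of what gating a change behind a toggle can look like. The `is_enabled` lookup, the toggle name and the checkout functions are all hypothetical, and a plain dict stands in for the toggle table:

```python
# Hypothetical toggle store: in a real service this lookup would hit the
# feature toggle table (or an in-memory copy of it).
TOGGLES: dict[str, bool] = {"new-checkout-flow": False}


def is_enabled(name: str) -> bool:
    # Unknown toggles count as disabled, so a missing or mistyped key
    # falls back to the old, known-good code path.
    return TOGGLES.get(name, False)


def old_checkout(total: float) -> str:
    return f"old flow charged {total:.2f}"


def new_checkout(total: float) -> str:
    return f"new flow charged {total:.2f}"


def checkout(total: float) -> str:
    # The new path only runs while the toggle is on; rolling back is just
    # switching the toggle off again, with no redeploy.
    if is_enabled("new-checkout-flow"):
        return new_checkout(total)
    return old_checkout(total)


print(checkout(10))  # old flow charged 10.00
TOGGLES["new-checkout-flow"] = True
print(checkout(10))  # new flow charged 10.00
```

Once the new path has proven itself, the `if` and `old_checkout` are deleted, which is the "remove the feature toggle from your code" step.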

Need to roll out the new feature gradually? Implement the feature toggle as a percentage and increase it as you go.
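One common way to do a percentage rollout, shown here as a sketch rather than the only option, is to hash the user ID into one of 100 stable buckets and enable the feature for buckets below the current percentage. The function names are hypothetical, and the percentage would be stored as the toggle’s value:

```python
import hashlib


def rollout_bucket(toggle_name: str, user_id: str) -> int:
    """Map a (toggle, user) pair to a stable bucket between 0 and 99."""
    key = f"{toggle_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer, reduce to 100 buckets.
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled_for(toggle_name: str, user_id: str, percentage: int) -> bool:
    """True if the user falls inside the current rollout percentage (0-100)."""
    return rollout_bucket(toggle_name, user_id) < percentage


users = [f"user-{i}" for i in range(10_000)]
enabled = sum(is_enabled_for("new-search", u, 25) for u in users)
print(enabled)  # roughly a quarter of the users; the exact count depends on the hash
```

Hashing instead of rolling a random number keeps each user on the same side of the toggle across requests, and raising the percentage only ever adds users. Mixing the toggle name into the hash means different toggles don’t always experiment on the same unlucky users.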

It’s really that simple, and you don’t have to pay 500 USD a month to get similar functionality from a service provider and make critical paths in your application depend on them.² As my teammate once said, our service is perfectly capable of breaking down on its own.

All you really need is one database table containing the keys and values of your feature toggles, and two HTTP endpoints: one to GET the current value of a feature toggle, and one to POST a new value for an existing one. New feature toggles are introduced through database migrations with tools like Flyway or Liquibase, and the same method can be used to delete them later on. You can also add convenience columns with timestamps, such as created and modified, to track when each toggle was introduced and when it was last changed.
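A minimal sketch of that table and the two endpoints, using only Python’s standard library: SQLite stands in for your real database and a bare WSGI callable stands in for whatever web framework you use. The table name `feature_toggle`, the `/toggles/<name>` path and the `{"value": ...}` JSON body are made up for illustration:

```python
import io
import json
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# SQLite keeps the sketch self-contained. In a real service the table and every
# new toggle would be created by a Flyway or Liquibase migration instead.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feature_toggle (
    name     TEXT PRIMARY KEY,
    value    TEXT NOT NULL,
    created  TEXT NOT NULL,
    modified TEXT NOT NULL
)
"""


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


def get_toggle(db: sqlite3.Connection, name: str) -> Optional[str]:
    row = db.execute("SELECT value FROM feature_toggle WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


def set_toggle(db: sqlite3.Connection, name: str, value: str) -> bool:
    # Only updates existing rows; returns False for unknown toggles.
    cur = db.execute(
        "UPDATE feature_toggle SET value = ?, modified = ? WHERE name = ?",
        (value, utc_now(), name),
    )
    db.commit()
    return cur.rowcount == 1


def make_app(db: sqlite3.Connection):
    """WSGI app with GET and POST on /toggles/<name>."""

    def app(environ, start_response):
        def respond(status: str, payload: dict):
            start_response(status, [("Content-Type", "application/json")])
            return [json.dumps(payload).encode("utf-8")]

        path = environ.get("PATH_INFO", "")
        if not path.startswith("/toggles/"):
            return respond("404 Not Found", {"error": "unknown path"})
        name = path[len("/toggles/"):]
        method = environ["REQUEST_METHOD"]

        if method == "GET":
            value = get_toggle(db, name)
            if value is None:
                return respond("404 Not Found", {"error": "unknown toggle"})
            return respond("200 OK", {"name": name, "value": value})
        if method == "POST":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = json.loads(environ["wsgi.input"].read(length) or b"{}")
            if "value" not in body:
                return respond("400 Bad Request", {"error": "missing value"})
            if not set_toggle(db, name, str(body["value"])):
                return respond("404 Not Found", {"error": "unknown toggle"})
            return respond("200 OK", {"name": name, "value": str(body["value"])})
        return respond("405 Method Not Allowed", {"error": "use GET or POST"})

    return app


def call(app, method: str, path: str, body: Optional[dict] = None):
    """Invoke the WSGI app directly, without a server (handy for tests)."""
    raw = json.dumps(body).encode("utf-8") if body is not None else b""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(raw)), "wsgi.input": io.BytesIO(raw)}
    status = []
    chunks = app(environ, lambda s, headers: status.append(s))
    return status[0], json.loads(b"".join(chunks))


# Usage: this INSERT plays the role of the migration that introduces a toggle.
db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
db.execute("INSERT INTO feature_toggle VALUES (?, ?, ?, ?)",
           ("new-checkout-flow", "false", utc_now(), utc_now()))
app = make_app(db)
print(call(app, "POST", "/toggles/new-checkout-flow", {"value": "true"}))
# ('200 OK', {'name': 'new-checkout-flow', 'value': 'true'})
print(call(app, "GET", "/toggles/new-checkout-flow"))
# ('200 OK', {'name': 'new-checkout-flow', 'value': 'true'})
```

Note that POST only updates existing rows, so a typo in a toggle name gets a 404 instead of quietly creating a new toggle. For a quick local try-out, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve the app; a real deployment would also put the POST endpoint behind authentication, which the sketch leaves out.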

However, there are a few considerations to take into account when setting up such a system.

Feature toggles implemented as database table rows can work fantastically, but you should also monitor how often they are read. If you put a feature toggle on a hot path in your service, you can easily generate thousands of queries per second. A properly set up feature toggles system can sustain that load without issues on any competent database engine, but you should still monitor the impact and remove unused feature toggles as soon as possible.

For hot code paths (1000+ requests per second), you might be better off implementing feature toggles as application properties. There’s no call to the database and reading a static property is darn fast, but you lose the ability to change it while the application is running.
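As a rough Python analogue of application properties, toggles can be parsed once at startup into a read-only mapping. The INI-style config text, the section name and the toggle names are assumptions for the example; in practice the text would come from a config file or environment variables:

```python
import configparser
from types import MappingProxyType

# Hypothetical config contents, read once when the application starts.
CONFIG_TEXT = """
[feature_toggles]
new_search_ranking = true
compact_json_responses = false
"""


def load_toggles(text: str):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["feature_toggles"]
    # getboolean accepts true/false, yes/no, on/off and 1/0.
    toggles = {key: section.getboolean(key) for key in section}
    # Read-only view: changing a toggle means changing config and restarting.
    return MappingProxyType(toggles)


FEATURE_TOGGLES = load_toggles(CONFIG_TEXT)


def rank_results(results: list[str]) -> list[str]:
    # Hot path: a plain in-memory lookup, no database round trip.
    if FEATURE_TOGGLES["new_search_ranking"]:
        return sorted(results)
    return results
```

The read-only mapping makes the trade-off explicit: nothing can flip these values at runtime, so a change means editing config and redeploying or restarting.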

Alternatively, you can rely on the same database-based feature toggles system and keep a cached copy in memory that is refreshed periodically. Toggling won’t be as responsive, since a change only takes effect once the cached copy expires, but the reduced load on the database is often worth it.
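A minimal sketch of such a cache, assuming a hypothetical `load_all` callable that returns every toggle at once (for example by running `SELECT name, value FROM feature_toggle`). The 30-second TTL is an arbitrary example, and the clock is injectable so the expiry logic can be tested without waiting:

```python
import threading
import time
from typing import Callable, Dict, Optional


class CachedToggles:
    """In-memory copy of all toggles, reloaded once it is older than the TTL."""

    def __init__(self, load_all: Callable[[], Dict[str, str]],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load_all = load_all  # hypothetical: one query for all toggles
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._values: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def get(self, name: str, default: Optional[str] = None) -> Optional[str]:
        # The lock is held during a reload, so concurrent requests wait for
        # one query instead of all hitting the database at once.
        with self._lock:
            now = self._clock()
            if self._loaded_at is None or now - self._loaded_at >= self._ttl:
                try:
                    self._values = self._load_all()
                except Exception:
                    if self._loaded_at is None:
                        raise  # nothing cached yet, so surface the error
                    # Otherwise keep serving the stale copy until the next TTL.
                self._loaded_at = now
            return self._values.get(name, default)
```

With a 30-second TTL, a toggle flipped through the HTTP endpoint can take up to 30 seconds to reach every instance, which is the responsiveness cost mentioned above, while the database sees at most one toggle query per instance per 30 seconds.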

If your service receives contributions from multiple teams, or you have very anxious product managers who fill your backlog faster than you can say “story points”, then it’s a good idea to also introduce expiration dates for your feature toggles, with ample warning time to remove them properly. This way, you can make sure old feature toggles actually get removed, as there is no better reason to prioritize a task than a looming major incident. You don’t want them to stick around for years on end; that’s just wasteful and clutters up your codebase.
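One way to enforce expiration dates, sketched under assumptions: each toggle row gets a hypothetical `expires_on` column, and a check warns during the warning window and fails once a toggle has expired. Where the check runs (a CI test, a startup check, a scheduled alert) and the 14-day window are choices left to you:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Toggle:
    name: str
    expires_on: date  # hypothetical extra column in the toggle table


def check_expiry(toggles: List[Toggle], today: date, warning_days: int = 14) -> List[str]:
    """Return warnings for toggles nearing expiry; raise if any have expired."""
    expired = [t.name for t in toggles if t.expires_on <= today]
    if expired:
        # Failing loudly is what gets the cleanup prioritized.
        raise RuntimeError(f"Expired feature toggles, remove them: {', '.join(expired)}")
    warn_from = today + timedelta(days=warning_days)
    return [
        f"{t.name} expires on {t.expires_on.isoformat()}"
        for t in toggles
        if t.expires_on <= warn_from
    ]


toggles = [Toggle("new-checkout-flow", date(2024, 1, 10)),
           Toggle("new-search", date(2024, 3, 1))]
print(check_expiry(toggles, today=date(2024, 1, 1)))
# ['new-checkout-flow expires on 2024-01-10']
```

Running it as a CI test is probably the gentlest option: the build starts failing before anything breaks in production, and the warnings give the owning team their heads-up.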

If your feature toggling needs are a bit more complicated, you may need to invest more time in your DIY solution, or you can use one of the SaaS options if you really want to; just account for the added expense and the reliance on yet another third-party service.

At work, I help manage a business-critical monolith that handles thousands of requests per second during peak hours, and the simple approach has served us very well. All it took was one motivated developer and about a day to implement, document and communicate the solution to our stakeholders.

Skip the latter two steps, and you can be done within two hours, tops.

¹ Letting inexperienced developers touch the production database is a fantastic way to take down your service, and a very expensive way to learn about database locks.
² I hate to refer to specific Hacker News comments like this, but there’s something about paying 6000 USD a year for such a service that I just can’t understand. Has the Silicon Valley mindset gone too far? Or are US-based developers just so expensive that these types of services sound reasonable? For that amount of money, you can hire a senior developer in Estonia for 2-3 weeks (including all taxes), and they can pop in and implement a feature toggles system in a few hours at most. The reply linking to a status page that highlights multiple LaunchDarkly outages is the cherry on top.