Testing in production used to have a terrible reputation. And some (or most?) of it was probably deserved. But everything changes, and the software industry is probably the fastest-changing “thing” ever. Nowadays, testing in production is not only tolerated but actively encouraged in many situations. However, a bad reputation isn’t an easy thing to shake off. Many people are still—understandably—skeptical about the whole thing. Are you one of those? Then today’s post is for you.
We’ll start by defining testing in production and explaining why most people thought so poorly of it. Then, we’ll follow that with a recipe for badly done testing in production. Finally, we’ll discuss how to do it the right way, covering the reasons why testing in production is an essential component of a modern quality strategy. Let’s go.
What Is Testing in Production?
Before you go any further in the article, it’s essential we’re on the same page in regard to what “testing in production” means. That’s why we’re going to start by defining the term. So, what is testing in production?
Well, let’s start by clearing up a common misconception. Testing in production does not mean deploying untested features in order to shorten time to market, hoping that everything will turn out OK when customers try to use them. As they say, hope isn’t a strategy. A proper QA strategy still has to move testing to as early in the process as possible and take full advantage of unit testing and other automated testing techniques. That is to say, testing in production isn’t supposed to replace proper testing done before production, but to complement it.
Testing in production, rather, refers to the continuous testing of the application in the production environment, after a deployment. By doing so, you can leverage unique benefits for the application’s quality (more on that later).
Poor Testing in Production: Here Comes Our Recipe
Let’s take a brief detour to cover the reasons why testing in production has carried such a stigma. Basically, it stems from differences in vocabulary. Many people, upon hearing “testing in production,” don’t think of something remotely close to what we’ve described in our definition. Instead, they think of the unprofessional process of deploying untested (or poorly tested) code and crossing their fingers, hoping for the best. They associate testing in production with a lack of software engineering best practices and the complete absence of automated testing of any kind.
And to be honest, some of that bad reputation is deserved. If you do it haphazardly, testing in production can get you into serious trouble. Loss of data, financial loss, and a tainted reputation are just some of the consequences you might bring on yourself. And of course, in the post-GDPR era we live in, you risk catastrophic legal consequences as well. Additional risks include:
- High error rates setting off alerts and waking up people on call.
- Incorrect revenue recognition caused by test-generated revenue events (e.g., canceled orders).
- Unintended consequences on other production systems.
- Noise in logs due to script and bot activity.
If you’re to avoid the bad testing in production, you’d better learn about it. So, what are the ingredients of poorly done testing in production? Here is a small list:
- Lack of testing before production.
- No easy way to roll back faulty deployments.
- Lack of a proper backup strategy (which includes practicing backup restoration).
- Performing production tests at inappropriate times.
Testing in Production Done Right: The Why and How
We started this post by saying that testing in production, when done right, can net you unique benefits. Now it’s time to talk about those benefits. Why should you do production testing?
Why Do It?
Sometimes you don’t have a choice. You might be in a scenario where maintaining a staging environment is impracticable or unaffordable. And when you need to gather real usage data, there’s no substitute for the real thing.
Some forms of testing simply yield better results when performed in the production environment. If you want to verify the scalability of your app, then load testing in production is what you need.
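To make that concrete, here’s a minimal Python sketch of the core idea behind a load test: firing many concurrent requests and summarizing latency. The `fake_request` function is a stand-in we made up for a real HTTP call to your endpoint; a real load test would use a dedicated tool rather than hand-rolled code like this.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_request() -> float:
    """Stand-in for one HTTP call; returns its latency in seconds."""
    start = time.monotonic()
    time.sleep(0.01)  # simulate server-side work
    return time.monotonic() - start

# Fire 50 concurrent "users" and collect latencies.
with ThreadPoolExecutor(max_workers=10) as pool:
    latencies = list(pool.map(lambda _: fake_request(), range(50)))

# Report the 95th-percentile latency, a common load-testing metric.
print(f"p95 latency: {sorted(latencies)[int(0.95 * len(latencies))]:.3f}s")
```

Run against production (carefully, at an appropriate time), numbers like the p95 tell you how the app degrades under real infrastructure constraints, which a test environment can rarely reproduce faithfully.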
At the end of the day, it all comes down to how complex a thing software is. No matter how good your QA strategy is, how well you employ the best practices and state-of-the-art tools, some bugs will inevitably end up in production. Testing/monitoring your application in production is your last line of defense against such bugs.
Testing in production offers further benefits:
- Since you’re testing with production data, you can detect problems in scenarios that are hard to replicate in test environments.
- It can help you create a disaster recovery process, making your application more resilient against expected and unexpected failures.
- It allows you to design beta programs that enable users to provide early, valuable feedback.
- Performed regularly and paired with real-time monitoring, it reduces the risk of each deployment.
How to Do It?
How do you perform testing in production the “right way”? We’ll now answer this question by covering some of the main techniques you can use to leverage the power of testing in production.
A/B testing is a type of statistical experiment. You split a user base into two groups, A and B. You then give group A the current version of your app, called the control. The second group gets a modified version of the app, which we call the treatment or variation.
You can then compare how users in both groups behave. Analyzing the data you gather, you can conclude whether the changes in the treatment are worth keeping or not, and make an informed decision on what to do next.
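The splitting itself is often done by hashing a stable user identifier, so each user lands in the same group on every visit. Here’s a minimal Python sketch; the experiment name and the fifty-fifty split are assumptions for illustration.

```python
import hashlib

def assign_group(user_id: str, experiment: str = "checkout-redesign") -> str:
    """Deterministically bucket a user into control (A) or treatment (B).

    Hashing the user id together with the experiment name keeps
    assignments stable across sessions without storing any state,
    and lets different experiments split users independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # a number in [0, 100)
    return "A" if bucket < 50 else "B"

print(assign_group("user-42"))  # same user always gets the same group
```

A real experimentation platform adds statistical significance checks on top, but the bucketing underneath usually looks a lot like this.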
Canary releases, at first sight, might look a lot like A/B testing. Here’s how Danilo Sato defines them:
Canary release is a technique to reduce the risk of introducing a new software version in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure and making it available to everybody.
So, basically, you release the new version to a subset of your users and closely monitor it, rolling it back if things go south. How does this differ from A/B Testing?
They differ in intent. A/B testing is supposed to gauge the interest of users in a potential new feature. Canary releases, on the other hand, are a risk mitigation tool.
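A bare-bones version of that “small subset of users” logic might look like the sketch below. The rollout percentage is just an illustrative knob; real canary tooling also watches error rates and automates the rollback.

```python
import hashlib

def in_canary(user_id: str, rollout_percent: int) -> bool:
    """Return True if this user should be routed to the canary build."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent

# Start with 5% of traffic; widen the rollout (or drop it back to 0
# to roll back) based on what monitoring shows.
canary_users = [u for u in (f"user-{i}" for i in range(1000)) if in_canary(u, 5)]
print(len(canary_users))
```

The key operational difference from A/B testing shows up here: the percentage is meant to grow to 100 (or shrink to 0) as you gain confidence, not to stay fixed while you gather behavioral data.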
Just for the sake of completeness, we’ve also included load testing, which we covered in passing earlier.
When organizations want to use techniques such as A/B testing, how do they switch a given feature on and off? The answer is feature flagging.
Feature flags (also called feature toggles) are conditional toggles you can use to determine if a given feature should be exposed to the user or not. At its most basic form, a feature flag is nothing more than an if statement. A proper feature flagging management strategy would have a way of controlling the value of the toggles from outside the application.
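To make the “nothing more than an if statement” point concrete, here’s a minimal Python sketch. The JSON string stands in for the external flag store a real setup would use, and the flag names are made up.

```python
import json

# In a real setup, these values would come from a flag service or
# config store outside the application; a JSON string stands in here.
FLAG_CONFIG = json.loads('{"new-search": true, "dark-mode": false}')

def is_enabled(flag_name: str) -> bool:
    """Unknown flags default to off, so a missing entry is safe."""
    return FLAG_CONFIG.get(flag_name, False)

# At its core, a feature flag really is just an if statement:
if is_enabled("new-search"):
    print("serving new search")
else:
    print("serving old search")
```

Because the config lives outside the application, flipping a flag doesn’t require a redeploy — which is exactly what makes the technique useful for testing in production.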
Application monitoring is yet another category of activities that might be considered testing in production, even if it doesn’t sound like that at first. This activity is exactly what its name suggests: the process of actively—and in an automated manner—monitoring applications in production. That way, you can ensure not only that the app continues to behave as expected after it goes live, but also that the key people are notified with the right set of alerts if something does go wrong. Application monitoring can be categorized into two main groups: real user monitoring (RUM) and synthetic monitoring—aka synthetic testing.
Real user monitoring, as the name suggests, is the process of monitoring, in real-time, actual humans interacting with the application. With RUM you can see how the application handles real requests as they come in. Synthetic monitoring/testing, on the other hand, refers to monitoring how the application reacts to requests coming from simulated—or synthetic, hence the name—users. Real user monitoring and synthetic monitoring are both valuable. Each approach has its share of strengths and weaknesses, and each is better suited for different scenarios; both of them are considered testing in production.
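Here’s a simplified Python sketch of a single synthetic probe. The `fetch` callable is a stand-in for a real HTTP request to a health or checkout endpoint, and in practice a scheduler would run a check like this every few minutes and page someone on failure.

```python
import time

def synthetic_check(fetch, max_latency_s: float = 2.0) -> dict:
    """Run one scripted probe and report pass/fail.

    `fetch` is any zero-argument callable returning an HTTP status
    code; the probe fails if the status isn't 200 or the call is
    slower than the latency budget.
    """
    start = time.monotonic()
    status = fetch()
    latency = time.monotonic() - start
    return {
        "ok": status == 200 and latency <= max_latency_s,
        "status": status,
        "latency_s": round(latency, 3),
    }

# Simulated probe standing in for a real HTTP call:
result = synthetic_check(lambda: 200)
print(result["ok"])
```

The synthetic nature is the point: the probe runs even at 3 a.m. when no real users are around, so you learn about an outage before they do.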
Another important form of monitoring is tracing. Tracing a single transaction as it goes through the different layers of an application might also be considered testing in production. Closely following a transaction as it travels through the application allows you to have a glimpse at the parts of the codebase that are being exercised. That way, it’s possible to identify defects, poor performance, or other issues.
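As a toy illustration of the idea, the sketch below records timed spans that share one trace id as a transaction moves through two layers. Real systems use dedicated tracing libraries rather than hand-rolled code like this; the layer names are made up.

```python
import contextlib
import time
import uuid

@contextlib.contextmanager
def span(trace_id: str, name: str, log: list):
    """Record how long one step of a transaction takes."""
    start = time.monotonic()
    try:
        yield
    finally:
        log.append({"trace_id": trace_id, "span": name,
                    "duration_ms": (time.monotonic() - start) * 1000})

trace_id = uuid.uuid4().hex
log: list = []
with span(trace_id, "handle_request", log):
    with span(trace_id, "query_db", log):
        time.sleep(0.01)  # stand-in for a database call

for entry in log:
    print(entry["span"])  # inner spans close (and log) first
```

Because every span carries the same trace id, you can later reassemble the full path of one transaction and spot the layer where the time (or the error) came from.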
Production User Acceptance Testing
Many organizations adopt, as part of their SDLC, a heavyweight round of user acceptance testing that has to happen before any major feature is signed off for release. In doing so, they delay the arrival of their code in production, where it can be used by real end users.
Thanks to the increased adoption of techniques such as feature flagging, organizations today can use a different tactic: they forgo the comprehensive and lengthy acceptance-testing process that happens before production, trading it for verifications that happen in production. By hiding a feature behind a flag, it becomes trivial to allow access to it only to authorized personnel. After the testers give their thumbs-up, the flag can be switched on for everyone, allowing all users to access the functionality.
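The “authorized personnel only” gating can be as simple as combining a flag check with an allowlist. Here’s a minimal Python sketch; the tester accounts are hypothetical.

```python
# Hypothetical tester accounts allowed to see the unreleased feature.
TESTERS = {"qa-alice@example.com", "qa-bob@example.com"}

def can_see_feature(user_email: str, flag_on_for_all: bool) -> bool:
    """Testers always see the feature; everyone else sees it only
    once the flag has been switched on for the whole user base."""
    return flag_on_for_all or user_email in TESTERS

print(can_see_feature("qa-alice@example.com", False))  # tester: True
print(can_see_feature("customer@example.com", False))  # public: False
```

Flipping `flag_on_for_all` to `True` is the moment the feature “ships,” even though the code itself has been in production the whole time.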
More Types of Testing in Production
Depending on your definition of testing, there’s a long list of activities and techniques related to software quality that can be considered “testing in production.” What follows is a non-exhaustive list of some of such activities:
- Chaos Engineering
- Automatic broken link checking
- Visual regression testing
- Disaster recovery testing
- Accessibility testing
Testing in Production: Aye or Nay?
Everything changes amazingly fast when it comes to the software industry. Sometimes what seemed unthinkable a few years ago becomes commonplace. Today’s villain might be tomorrow’s hero, and that’s exactly what happened with testing in production.
Due to the widespread use of software engineering and QA best practices, we can now afford to test in production in safe ways, which enables us to reap benefits we wouldn’t be able to get otherwise.
Also, new developments keep appearing. As we’ve said again and again, our industry is an ever-changing one. Recent developments in the testing space include the use of AI-powered tools that help teams overcome the toughest quality challenges they face.
So, testing in production: yes or no? For us, it’s a clear yes, and we hope we’ve succeeded in convincing you, too. Thanks for reading, and until next time.
This post was written by Carlos Schults. Carlos is a .NET software developer with experience in both desktop and web development, and he’s now trying his hand at mobile. He has a passion for writing clean and concise code, and he’s interested in practices that help you improve app health, such as code review, automated testing, and continuous build.