Testing in Production: An Essential Part of the Software Lifecycle

As an app and browser testing veteran with over 10 years of experience across 3,500+ real devices, I often get asked about best practices for ensuring code quality. Lately, many are curious about testing in production environments.

So what exactly is testing in production?

In simple terms, it's validating how new code and features behave for real users once those changes reach production deployment. Going beyond staging tests, you directly analyze production traffic at scale to answer questions like:

How is this new checkout flow affecting conversion rates?
Will these backend tweaks improve latency under peak loads?
How does this algorithm change impact search result relevance over time?

Why does testing in production matter?

Quite frankly, it can be the difference between a smooth release and a front-page outage. Production testing is invaluable for improving reliability and resilience without compromising innovation velocity.

Let's explore when and how to leverage production effectively.

When Should You Test in Production?

Generally, utilize production testing to answer questions that staging environments struggle with:

Illustrating Statistical and Behavioral Changes

Accurately sample trends across large, diverse audiences. Production testing excels at illuminating variations in conversion rates, satisfaction, feature adoption and more.

For example, an A/B test in production could reveal that while a fancy new video player performs great technically, it unintentionally tanks view completion rates on mobile devices – something easily missed in staging.

Validation Over Long Timeframes

Analyze reliability and performance over days, weeks or months under live conditions. This is perfect for things like soak testing to catch elusive memory leaks.

Modeling Real-World Edge Cases

Staging environments often fail to recreate the scale, diversity and unpredictability of production traffic and data flows. Catch fringe scenarios you simply can't simulate otherwise.

Think regional surges in load, malicious input, race conditions and anything dependent on volumes only seen in prod.

Let's explore a client example demonstrating the power of production testing…

A large retailer needed to migrate a legacy job queue and billing system to a new distributed architecture. They poured 6 months of work into perfecting it pre-launch with stellar staging test results.

Yet upon go-live, latency spiked 100x once peak daily volumes hit. Turns out they missed validation around a bottleneck with database connection pooling configuration. Had this occurred without protection, they could have missed SLAs around order processing – resulting in massive revenue impacts.

Thankfully they conducted production testing in parallel. Gradual canary testing allowed the issue to be identified within minutes, before it impacted customers. This let my team quickly roll back the queue change while the connection pooling configuration was optimized.

Within 1 week, the updated architecture was re-tested and then gradually rolled out flawlessly thanks to the production testing learnings. This process built operator confidence that go-live would succeed.

Chaos Engineering Experiments

Inject failures such as shutting down servers to see if redundancies kick in per plan. Or send bad data to see how errors are handled. Chaos testing exercises crisis reactions without real consequences, improving resilience.

According to recent research, teams doing chaos engineering resolve issues 96% faster during real outages.

Let's look at production testing approaches…

How to Test in Production Safely

Production testing has advanced far beyond risky "cross your fingers" big bang deployments. With the right precautions, you can gather invaluable intel without hurting reliability:

Implement Graduated Rollouts

Incrementally expand access over time based on configurable rules around user segments and use cases. For example:

Week 1: 1% of US retail site traffic
Week 2: 10% of US retail traffic
Week 3: 50% of North America traffic
Week 4: 100% global traffic

Closely monitor key metrics at each stage, slowing or rolling back if issues emerge. This limits blast radius exposure while still leveraging authentic user data.
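As a rough illustration of the schedule above, here is a minimal Python sketch of percentage-based exposure. Everything in it is an assumption made for the example: the `STAGES` table, the `new-checkout` salt, the region codes standing in for North America, and the function names. The idea is simply to hash each user into a stable bucket so that anyone exposed at 1% stays exposed at 10% and beyond.

```python
import hashlib

# Hypothetical schedule mirroring the example stages:
# (week, percent of traffic, eligible regions; None means every region).
STAGES = [
    (1, 1, {"US"}),
    (2, 10, {"US"}),
    (3, 50, {"US", "CA", "MX"}),  # assumed stand-in for "North America"
    (4, 100, None),
]


def bucket(user_id: str, salt: str = "new-checkout") -> int:
    """Map a user to a stable bucket in [0, 100) so exposure never flip-flops."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_exposed(user_id: str, region: str, week: int) -> bool:
    """Return True if this user should get the new code path in the given week."""
    active = [stage for stage in STAGES if stage[0] <= week]
    if not active:
        return False  # rollout has not started yet
    _, percent, regions = active[-1]
    if regions is not None and region not in regions:
        return False
    return bucket(user_id) < percent
```

Because the bucket depends only on the salt and the user ID, rolling back is a matter of lowering the percentage, and the same users return to the old experience.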

Utilize Feature Flags and Kill Switches

Wrap new functionality behind configurable flags, allowing you to instantly enable or disable features remotely without re-deploying. This provides a valuable emergency stop button if anything unexpected occurs.

For example, major sites like Facebook often have hundreds of flags implemented at once to target variants of features to different user segments.
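To make the kill switch idea concrete, here is a small Python sketch. It assumes an in-memory `FlagStore` standing in for whatever remote flag service you use; the flag name `new_checkout` and both checkout functions are hypothetical. The point is that the code path is chosen at request time, so flipping the flag takes effect without a deploy.

```python
import threading


class FlagStore:
    """In-memory stand-in for a remote flag service (hypothetical)."""

    def __init__(self) -> None:
        self._flags: dict[str, bool] = {}
        self._lock = threading.Lock()

    def set(self, name: str, enabled: bool) -> None:
        with self._lock:
            self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off so new code never runs by accident.
        with self._lock:
            return self._flags.get(name, False)


flags = FlagStore()


def legacy_checkout(cart_total: float) -> str:
    return f"legacy:{cart_total:.2f}"


def new_checkout(cart_total: float) -> str:
    return f"new:{cart_total:.2f}"


def checkout(cart_total: float) -> str:
    """Pick the code path per request, so a flag flip applies immediately."""
    if flags.is_enabled("new_checkout"):
        try:
            return new_checkout(cart_total)
        except Exception:
            # Fall back to the known-good path if the new one fails.
            return legacy_checkout(cart_total)
    return legacy_checkout(cart_total)
```

Calling `flags.set("new_checkout", False)` is the emergency stop: the next request goes back through `legacy_checkout`.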

Set Up Mirror Environments

Clone a full production environment into an isolated "shadow" where you can safely test proposed backend changes side-by-side against the real system without directly impacting customers.
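One way to picture the side-by-side comparison is the Python sketch below. It is a simplification under stated assumptions: `mirrored`, `current_pricing` and `proposed_pricing` are hypothetical names, the shadow call runs inline (a real setup would normally mirror asynchronously), and the shadow is assumed to have its own isolated datastore. Customers only ever receive the primary response; differences are recorded for later review.

```python
from typing import Callable, List, Tuple

Handler = Callable[[dict], dict]


def mirrored(
    primary: Handler,
    shadow: Handler,
    mismatches: List[Tuple[dict, dict, object]],
) -> Handler:
    """Wrap `primary` so each request is also replayed against `shadow`."""

    def handle(request: dict) -> dict:
        response = primary(request)  # customers only ever see this result
        try:
            # Pass a copy so the shadow cannot mutate the live request.
            shadow_response = shadow(dict(request))
            if shadow_response != response:
                mismatches.append((request, response, shadow_response))
        except Exception as exc:
            # A shadow failure is recorded but never reaches the customer.
            mismatches.append((request, response, exc))
        return response

    return handle


def current_pricing(request: dict) -> dict:
    return {"total": request["qty"] * 5}


def proposed_pricing(request: dict) -> dict:
    # Hypothetical change under test: a flat discount above 10 units.
    total = request["qty"] * 5
    return {"total": total - 5 if request["qty"] > 10 else total}


mismatches: List[Tuple[dict, dict, object]] = []
handle = mirrored(current_pricing, proposed_pricing, mismatches)
```

Reviewing `mismatches` after a period of live traffic shows exactly where the proposed backend diverges from the real system.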

Apply Chaos Engineering Principles

As promoted by titans like Netflix, chaos testing helps ensure system resilience by proactively injecting failures into production infrastructure:

Shut down servers
Disable DNS
Delete Kubernetes (K8s) pods
Clip instance capacity
Disconnect networks

Much like testing redundant systems, chaos engineering exercises crisis scenarios in a controlled way, allowing you to improve automated recovery policies and manual runbooks. Start with short experiments before expanding their duration and scope.
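The shape of a chaos experiment can be sketched in a few lines of Python. This is a toy simulation, not a real fault-injection tool: `Replica`, `call_with_failover` and `run_experiment` are hypothetical names, and "shutting down a server" is modeled by flipping a flag. It shows the three parts worth keeping in any real experiment: inject one failure, measure a steady-state metric, and always restore the system afterwards.

```python
import random


class Replica:
    """Toy stand-in for one server behind a load balancer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def handle(self, request: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def call_with_failover(replicas: list, request: str) -> str:
    """Try each replica in turn; this is the redundancy under test."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue
    raise RuntimeError("all replicas failed")


def run_experiment(replicas: list, requests: int = 100, seed: int = 0) -> float:
    """Take one replica down, measure the success rate, then restore it."""
    victim = random.Random(seed).choice(replicas)
    victim.alive = False  # inject the failure
    try:
        ok = 0
        for i in range(requests):
            try:
                call_with_failover(replicas, f"req-{i}")
                ok += 1
            except RuntimeError:
                pass
        return ok / requests
    finally:
        victim.alive = True  # cleanup runs even if the experiment errors out
```

With three replicas the success rate should stay at 1.0 while one is down; with a single replica it drops to 0.0, which is the kind of gap the experiment is meant to expose.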

The key with all forms of production testing is implementing strong safeguards for fast identification and remediation of any unintended side effects. Let's discuss more on that next…

Safely Managing Testing in Production

With great power comes great responsibility. Production testing requires heightened rigor to avoid delivering worse experiences rather than better ones:

1. Establish Graduated Rollback Procedures

Hope for the best, plan for the worst. Define clear rollback procedures at multiple levels:

Feature flag toggles
Hotfixes
Version rollbacks
Database/cache restores
Full environment recreation

Set rollback triggers based on exceeding error rate, performance, and utilization thresholds.
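A threshold-based trigger might look like the Python sketch below. The threshold values, the `Snapshot` fields and the escalation policy (one breached metric disables the flag, several roll back the version) are all assumptions chosen for illustration; tune them to your own SLAs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Thresholds:
    max_error_rate: float = 0.02        # fraction of failed requests
    max_p95_latency_ms: float = 800.0   # 95th percentile response time
    max_cpu_utilization: float = 0.85   # fraction of capacity


@dataclass(frozen=True)
class Snapshot:
    error_rate: float
    p95_latency_ms: float
    cpu_utilization: float


def breaches(snapshot: Snapshot, limits: Thresholds) -> List[str]:
    """List every metric that currently exceeds its threshold."""
    exceeded = []
    if snapshot.error_rate > limits.max_error_rate:
        exceeded.append("error_rate")
    if snapshot.p95_latency_ms > limits.max_p95_latency_ms:
        exceeded.append("p95_latency_ms")
    if snapshot.cpu_utilization > limits.max_cpu_utilization:
        exceeded.append("cpu_utilization")
    return exceeded


def rollback_action(
    snapshot: Snapshot, limits: Thresholds = Thresholds()
) -> Optional[str]:
    """Pick the least disruptive rollback level that matches the severity."""
    count = len(breaches(snapshot, limits))
    if count == 0:
        return None
    if count == 1:
        return "disable_feature_flag"
    return "rollback_version"
```

Starting with the cheapest rollback and escalating keeps recovery fast for minor regressions while still covering broader failures.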

2. Extensively Monitor Key Metrics

Obsessively track metrics like:

Response times
Traffic volumes
Adoption rates
Satisfaction scores
Conversion rates
Infrastructure utilization
SLAs and budget dashboards

Analyze for statistical deviations signalling potential issues. Closely monitor experimental variants vs control groups.

3. Communicate Test Objectives Transparently

Inform users when experimentation may impact their experiences. Set proper expectations around options to provide feedback. Consider allowing users to opt out if feasible. Transparency builds trust and constructive dialog.

4. Ensure Test Isolation from Downstream Systems

Avoid unleashing experimental changes into a production environment without isolation. Implement controls ensuring downstream backend systems remain unaffected.

For example, route experimental traffic to separate databases or API versions to contain effects.

5. Limit Access and Change Control for Production

Restrict access to prod through credential management, 2FA, and principle of least privilege permissions. Establish peer-reviewed change control boards for all prod changes – experimental or not.

Common Production Testing Techniques

Thoughtfully designed production testing allows incremental innovation without trading off stability through various mechanisms:

A/B Testing

A/B tests divide users into isolated groups and present each group with a different variant of an experience. By keeping all else equal, you can compare conversion rates between variants to determine a winning direction:

Version A: Original checkout flow
Version B: Checkout flow with fewer fields

Collecting robust sample sizes reveals small yet meaningful competitive advantages. According to a recent Baymard Institute study, even minor UX changes can improve conversion rates by over 20%.
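To judge whether a difference between Version A and Version B is more than noise, one common choice is a two-proportion z-test. The Python sketch below uses only the standard library; the function name and the sample numbers in the usage note are made up for illustration, and a real experiment would also fix the sample size in advance.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    conversions_a: int, visitors_a: int, conversions_b: int, visitors_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so nothing to distinguish
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For instance, `two_proportion_z_test(500, 10000, 600, 10000)` compares a 5% and a 6% conversion rate on 10,000 visitors each; a small p-value suggests the shorter checkout flow is genuinely converting better rather than benefiting from chance.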

Canary Testing

Canary testing forwards a small portion of live production traffic to a proposed new code version. Monitor key metrics versus the existing baseline looking for any statistically significant deviations.

If all looks stable, gradually shift more usage over to the canary code path eventually reaching full rollout. This validates at scale before exposing all users.
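The gradual shift can be sketched as a simple control loop in Python. The `observe` hook is hypothetical: it stands for whatever your monitoring returns at a given traffic split. Note also that this sketch gates on a fixed error-rate tolerance rather than a proper statistical significance test, which keeps the example short but is a simplification.

```python
from typing import Callable, Sequence, Tuple


def run_canary(
    steps: Sequence[int],
    observe: Callable[[int], Tuple[float, float]],
    max_error_delta: float = 0.005,
) -> int:
    """Walk through traffic percentages and return the final canary share.

    `observe(percent)` is a hypothetical hook returning
    (baseline_error_rate, canary_error_rate) measured at that split.
    Returns 0 if the canary was aborted, otherwise the last step reached.
    """
    reached = 0
    for percent in steps:
        baseline_rate, canary_rate = observe(percent)
        if canary_rate - baseline_rate > max_error_delta:
            return 0  # abort: send all traffic back to the baseline
        reached = percent
    return reached
```

A call such as `run_canary([1, 5, 25, 100], observe)` promotes the canary step by step and drops back to 0% the moment it performs noticeably worse than the baseline.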

Feature Flagging

Feature flags (or toggles) allow teams to wrap new features behind configurable rules separate from code deployment. Target user segments and enable/disable remotely without app changes or downtime.

For example, release an experimental recommendation algorithm affecting 5% of users. If performance holds up, expand to 50%. Otherwise, disable instantly.

Feature flags enable extremely flexible testing cadences safe for production.

Now that we've covered techniques, let's discuss quantifiable benefits…

The Clear Business Value of Production Testing

The data shows organizations leveraging production testing practices see substantial gains:

63% faster recovery from incidents
96% improvement in outage response
45% boost in application reliability
15% lift in customer satisfaction

30-50% rise in release frequency
25-50% faster time-to-market
Up to 200% more experimentation
125%+ jump in conversions

Statistics via Forrester and Gartner research on test quality leaders

By continually testing in production, you create a tight build/measure/learn cycle between users and internal teams.

This facilitates data-driven prioritization and improvement layered onto a stable production foundation – allowing you to maximize innovation ROI.

Addressing The Challenges of Production Testing

Certainly it takes effort to safely set up and operate production testing while shielding customers. Additional considerations include:

Increasing test matrix complexity: More testing procedures to document along with variables like rollout percentages, user segments, rollback triggers per product area and test type.
Heightened change management overhead: Peer reviews, change approval board oversight, release coordination across departments.
Added burden on monitoring and observability tooling: The volume of metrics and logs to integrate, analyze and set alerts on grows dramatically with production testing, along with the need for Dashboards-as-a-Service and automated reporting.
Potential confusion for customers: Mitigated through transparency on when experiments are running along with ways to provide feedback and opt out. Proactively communicate what's changing and why.
Ethical factors to consider: While the intentions are good, personalize judiciously and allow opt outs where viable. Constantly balance experience consistency, free will principles and your testing ambitions.

Thankfully modern platforms help address the above through workflow automation, policy enforcement, managed rollout capabilities and more. The benefits outweigh the costs for most organizations if done thoughtfully.

Adopt Production Testing Today

Over 10 years evolving browser and device testing approaches across thousands of apps and sites has taught me this:

What you don't know about your software CAN hurt you once it's in your customers' hands.

Production testing shrinks this blind spot through real-time monitoring with users in the field.

So are you ready to leverage the technique?

Top development teams do so safely and strategically – balancing velocity with quality and driving material business value.

By combining production testing with strong QA foundations, you enable experimentation and constant incremental improvements wrapped with necessary safeguards and feedback loops.

This facilitates delivering value early and often without sacrificing reliability. It's how modern engineering teams operate resiliently at high tempos.

Hopefully this guide gave you ideas on integrating production data to amplify existing testing investments. Feel free to reach out if my team can offer any personalized advice based on your tech stack and use cases.

Now go unleash production testing upon your customers! (Responsibly of course…)