4 March 2025
08 Min. Read
How can engineering teams identify and fix flaky tests?
We recently worked with a bunch of beta partners at Trunk to tackle this problem too. While we were building CI + Merge Queue tooling, nearly all of the CI instability and headaches we saw traced back to flaky tests in one way or another.
Basically, tests were flaky because:
1. The test code is buggy
2. The infrastructure code is buggy
3. The production code is buggy
➡️ Problem 1 is the easiest to fix, and most teams that beta our tool end up fixing the common culprits: bad await logic, improper cleanup between tests, etc.
➡️ But problem 2 makes it nearly impossible for most product engineers to fix flaky tests on their own, and problem 3 makes ignoring flaky tests a terrible idea.

That’s just one of many incidents shared on forums like Reddit and Quora. Flaky tests can be caused by a number of things, and you may not be able to reproduce the actual failure locally.
Because chasing every flaky failure is expensive.
It becomes really important that your team spends its time identifying the tests that actually flake frequently and fixing those, rather than trying to chase every flaky event that has ever occurred.
Before we move ahead, let’s get some fundamentals clear, and then discuss the solution we’ve built that can fix your flaky tests for real.
The Impact on Business
A flaky test is one that generates inconsistent results, failing or passing unpredictably, without any modification to the code under test. Unlike reliable tests, which yield the same results consistently, flaky tests create uncertainty.
Flaky tests cost the average engineering organization over $4.3M annually in lost productivity and delayed releases.
| Impact Area | Key Metrics | Industry Average | High-Performing Teams |
| --- | --- | --- | --- |
| Developer Productivity | Weekly hours spent investigating false failures | 6.5 hours/engineer | <2 hours/engineer |
| CI/CD Pipeline | Pipeline reliability percentage | 62% | >90% |
| Release Frequency | Deployment cadence | Every 2-3 weeks | Daily/on-demand |
| Engineering Morale | Team satisfaction with test process (survey) | 53% | >85% |
Causes of Flaky Tests, Especially in the Backend
Flaky tests are a nuisance because they fail intermittently and unpredictably, often under different circumstances or environments. The inability to rely on consistent test outcomes can mask real issues, leading to bugs slipping into production.

Concurrency Issues: These occur when tests are not thread-safe, which is common in environments where tests interact with shared resources like databases or when they modify shared state in memory.
Time Dependency: Tests fail because they assume a specific execution speed or rely on timing intervals (e.g., sleep calls) to coordinate between threads or network calls (see the sketch after this list).
External Dependencies: Relying on third-party services or systems with varying availability or differing responses can introduce unpredictability into test results.
Resource Leaks: Unreleased file handles or network connections from one test can affect subsequent tests.
Database State: Flakiness arises if tests do not reset the database state completely, leading to different outcomes depending on the order in which tests are run.
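To make the time-dependency case concrete, here is a minimal sketch of a flaky test and a deterministic rewrite. The job_queue fixture and job names are hypothetical, purely for illustration.

import time

def test_job_completes_flaky(job_queue):
    job_queue.submit("resize-image")
    time.sleep(2)  # assumes the worker always finishes within 2 seconds
    assert job_queue.status("resize-image") == "done"  # fails on a slow CI runner

def test_job_completes_deterministic(job_queue):
    job_queue.submit("resize-image")
    deadline = time.monotonic() + 30  # poll up to a generous upper bound instead
    while time.monotonic() < deadline:
        if job_queue.status("resize-image") == "done":
            break
        time.sleep(0.1)
    assert job_queue.status("resize-image") == "done"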
Strategies for Identifying Flaky Tests
1️⃣ Automated Test Quarantine: Implement an automated system to detect flaky tests. Any test that fails intermittently should automatically be moved to a quarantine suite and run independently from the main test suite.
# Example of a Python function to detect flaky tests.
# `run_tests` is assumed to return a {test: success_rate} mapping
# collected over repeated runs of the suite.
def quarantine_flaky_tests(test_suite, quarantine_suite, flaky_threshold=0.1):
    results = run_tests(test_suite)
    for test, success_rate in results.items():
        # Anything that does not pass consistently gets quarantined.
        if success_rate < (1 - flaky_threshold):
            quarantine_suite.add_test(test)
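Assuming run_tests repeats the suite enough times to produce a stable per-test success rate, a hypothetical invocation could look like this:

# Hypothetical usage: quarantine anything that passes less than 95% of the time.
quarantine_flaky_tests(main_suite, quarantine_suite, flaky_threshold=0.05)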
2️⃣ Logging and Monitoring: Enhance logging within tests to capture detailed information about the test environment and execution context. This data can be crucial for diagnosing flaky tests (a logging sketch follows the table below).
| Data | Description |
| --- | --- |
| Timestamp | When the test was run |
| Environment | Details about the test environment |
| Test Outcome | Pass/Fail |
| Error Logs | Stack trace and error messages |
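As a sketch of what capturing this data can look like in practice, assuming a pytest setup, a conftest.py hook can emit one structured record per test with exactly the fields above:

# conftest.py -- a minimal sketch of per-test context logging (assumes pytest)
import json
import logging
import platform
import time

import pytest

logger = logging.getLogger("flaky-audit")

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when != "call":
        return
    logger.info(json.dumps({
        "timestamp": time.time(),                    # when the test was run
        "environment": platform.platform(),          # details about the test environment
        "test": item.nodeid,
        "outcome": report.outcome,                   # passed / failed / skipped
        "error": report.longreprtext if report.failed else None,  # stack trace
    }))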
Debug complex flows without digging into logs: Get full context on every test run. See inputs, outputs, and every step in between. Track async flows, ORM queries, and external calls with deep visibility. With end-to-end traces, you debug issues with complete context before they reach production.

3️⃣ Consistent Environment: Use Docker or another container technology to standardize the testing environment. This consistency helps minimize the "works on my machine" syndrome.
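One way to get there, sketched below under the assumption that your tests hit a real database, is to spin the dependency up in a disposable container per test session (using the testcontainers-python package here; swap in whatever your stack needs):

# A minimal sketch: every run gets an identical, disposable Postgres instance.
import pytest
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def database_url():
    # The container is created fresh for the session and removed afterwards,
    # so no state survives between CI runs or developer machines.
    with PostgresContainer("postgres:16") as pg:
        yield pg.get_connection_url()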
Eliminating the Flakiness
Before attempting fixes, make sure the monitoring described above is in place; then work through the following:
✅ Isolate and Reproduce: Once identified, attempt to isolate and reproduce the flaky behavior in a controlled environment. This might involve running the test repeatedly or under varying conditions to understand what triggers the flakiness.
✅ Remove External Dependencies: Where possible, mock or stub out external services to reduce unpredictability (a minimal sketch follows this list).
Invest in mocks that actually work: our approach automatically mocks every dependency, builds mocks from real user flows, and even auto-updates them as dependencies change their behavior. More about the approach here
✅ Refactor Tests: Avoid tests that rely on real time or shared state. Ensure each test is self-contained and deterministic.
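As referenced above, here is a minimal mocking sketch, assuming the service under test calls an external payment gateway through a module-level client. All names here are hypothetical, not a real library API.

from unittest.mock import patch

from myapp.payments import charge_card  # hypothetical module under test

def test_charge_card_succeeds_without_network():
    # Stub the external gateway so the test never depends on its availability.
    with patch("myapp.payments.gateway.post") as fake_post:
        fake_post.return_value = {"status": "accepted", "id": "txn_123"}
        result = charge_card(amount_cents=500, token="tok_test")
        assert result["status"] == "accepted"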
The HyperTest Advantage for Backend Tests
This is where HyperTest transforms the equation. Unlike traditional approaches that merely identify flaky tests, HyperTest provides a comprehensive solution for backend test stability:
Real API Traffic Recording: Capturing real interactions to ensure test scenarios closely mimic actual use cases, thus reducing discrepancies that can cause flakiness.

Controlled Test Environments: By replaying and mocking external dependencies during testing, HyperTest ensures consistent environments, avoiding failures due to external variability.
Integrated System Testing: Flakiness is often exposed when systems integrate. HyperTest’s holistic approach tests these interactions, catching issues that may not appear in isolation.
Detailed Debugging Traces: Provides granular insights into each step of a test, allowing quicker identification and resolution of the root causes of flakiness.
Proactive Flakiness Prevention: HyperTest maps service dependencies and alerts teams about potential downstream impacts, preventing flaky tests before they occur.
Enhanced Coverage Insight: Offers metrics on tested code areas and highlights parts lacking coverage, encouraging targeted testing that reduces gaps where flakiness could hide.
Shopify's Journey to 99.7% Test Reliability

Key Strategies:
Introduced quarantine workflow
Built custom flakiness detector
Implemented "Fix Flaky Fridays"
Developed targeted libraries for common issues
Results:
Reduced flaky tests from 15% to 0.3%
Cut developer interruptions by 82%
Increased deployment frequency from 50/week to 200+/week
Conclusion: The Competitive Advantage of Test Reliability
Engineering teams that master test reliability gain a significant competitive advantage:
30-40% faster time-to-market for new features
15-20% higher engineer satisfaction scores
50-60% reduction in production incidents
Test flakiness isn't just a technical debt issue—it's a strategic imperative that impacts your entire business. By applying this framework, engineering leaders can transform test suites from liability to asset.
Want to discuss your team's specific flakiness challenges? Schedule a consultation →