4 March 2025
08 Min. Read

How can engineering teams identify and fix flaky tests?

Lihaoyi shares on Reddit:

We recently worked with a bunch of beta partners at Trunk to tackle this problem, too. When we were building some CI + Merge Queue tooling, I think CI instability/headaches that we saw all traced themselves back to flaky tests in one way or another.

Basically, tests were flaky because:


  1. The test code is buggy

  2. The infrastructure code is buggy

  3. The production code is buggy.


➡️ Problem 1 is trivial to fix, and most teams that end up beta-ing our tool end up fixing the common problems with bad await logic, improper cleanup between tests, etc.


➡️ But problem 2 makes it impossible for most product engineers to fix flaky tests alone, and problem 3 makes it a terrible idea to ignore flaky tests.





 

That’s one of many incidents shared on forums like Reddit and Quora. Flaky tests can be caused by a number of things, and you may not be able to reproduce the actual failure locally.


Chasing down every one of these failures is expensive. That's why it's important that your team spends its time identifying the tests that actually flake frequently and focuses on fixing those, rather than trying to investigate every flaky test event that has ever occurred.


Before we move ahead, let’s get some fundamentals clear and then discuss the unique solution we’ve built that can fix your flaky tests for real.


 

The Impact on Business


A flaky test is one that generates inconsistent results, failing or passing unpredictably without any modification to the code under test. Unlike reliable tests, which yield the same results consistently, flaky tests create uncertainty.
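To make the definition concrete, here is a deliberately contrived flaky test in Python: nothing about the code under test changes between runs, yet the outcome does.

# A contrived flaky test: the assertion depends on chance rather than on any
# code change, so it passes on most runs and fails on others.
import random

def test_discount_is_applied():
    simulated_latency = random.random()  # stand-in for a timing or ordering race
    assert simulated_latency < 0.9       # fails on roughly 1 in 10 runs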


Flaky tests cost the average engineering organization over $4.3M annually in lost productivity and delayed releases.

| Impact Area | Key Metric | Industry Average | High-Performing Teams |
|---|---|---|---|
| Developer Productivity | Weekly hours spent investigating false failures | 6.5 hours/engineer | <2 hours/engineer |
| CI/CD Pipeline | Pipeline reliability percentage | 62% | >90% |
| Release Frequency | Deployment cadence | Every 2-3 weeks | Daily/on-demand |
| Engineering Morale | Team satisfaction with test process (survey) | 53% | >85% |


 

Causes of Flaky Tests, Especially Backend Ones


Flaky tests are a nuisance because they fail intermittently and unpredictably, often under different circumstances or environments. The inability to rely on consistent test outcomes can mask real issues, leading to bugs slipping into production.



  1. Concurrency Issues: These occur when tests are not thread-safe, which is common in environments where tests interact with shared resources like databases or when they modify shared state in memory.


  2. Time Dependency: Tests fail because they assume a specific execution speed or rely on fixed timing intervals (e.g., sleep calls) to coordinate between threads or network calls.


  3. External Dependencies: Relying on third-party services or systems with varying availability or differing responses can introduce unpredictability into test results.


  4. Resource Leaks: Unreleased file handles or network connections from one test can affect subsequent tests.


  5. Database State: Flakiness arises if tests do not reset the database state completely, leading to different outcomes depending on the order in which tests run (see the sketch right after this list).
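To illustrate cause #5, here is a minimal sketch, assuming pytest and Python's built-in sqlite3 module, in which every test gets a fresh in-memory database so test order can no longer influence outcomes; the schema and test are illustrative.

# A minimal sketch: each test gets a fresh in-memory SQLite database,
# so no state can leak between tests (schema and test are illustrative).
import sqlite3
import pytest

@pytest.fixture
def db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    yield conn
    conn.close()

def test_insert_user(db):
    db.execute("INSERT INTO users (name) VALUES ('alice')")
    assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1

The same idea applies to real databases: wrap each test in a transaction that is rolled back afterwards, or recreate the schema per test.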



 

Strategies for Identifying Flaky Tests


1️⃣ Automated Test Quarantine: Implement an automated system to detect flaky tests. Any test that fails intermittently should automatically be moved to a quarantine suite and run independently from the main test suite.

# Example of a Python function to detect flaky tests; run_tests is assumed to
# return {test: pass rate over repeated runs}
def quarantine_flaky_tests(test_suite, quarantine_suite, flaky_threshold=0.1):
    results = run_tests(test_suite)
    for test, success_rate in results.items():
        # Quarantine tests that sometimes pass but fail too often; tests that
        # never pass are plain failures, not flaky ones.
        if 0 < success_rate < (1 - flaky_threshold):
            quarantine_suite.add_test(test)

2️⃣ Logging and Monitoring: Enhance logging within tests to capture detailed information about the test environment and execution context. This data can be crucial for diagnosing flaky tests; a minimal logging sketch follows the table below.

| Data | Description |
|---|---|
| Timestamp | When the test was run |
| Environment | Details about the test environment |
| Test Outcome | Pass/Fail |
| Error Logs | Stack trace and error messages |
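If you are rolling your own logging, a small pytest hook in conftest.py can capture exactly these fields on every run; a minimal sketch, with an illustrative output file name:

# A minimal conftest.py sketch that records the fields from the table above
# for every test run; the output file name is illustrative.
import json
import platform
import sys
from datetime import datetime, timezone

def pytest_runtest_logreport(report):
    if report.when != "call":          # log only the test body, not setup/teardown
        return
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "environment": {"python": sys.version, "os": platform.platform()},
        "test": report.nodeid,
        "outcome": report.outcome,     # passed / failed / skipped
        "error": report.longreprtext if report.failed else None,
    }
    with open("test_runs.jsonl", "a") as fh:
        fh.write(json.dumps(record) + "\n")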

Debug complex flows without digging into logs: get full context on every test run. See inputs, outputs, and every step in between. Track async flows, ORM queries, and external calls with deep visibility. With end-to-end traces, you can debug issues with complete context before they reach production.



3️⃣ Consistent Environment: Use Docker or another container technology to standardize the testing environment. This consistency helps minimize the "works on my machine" syndrome.
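For example, a pinned database image can back every run identically on laptops and in CI; a minimal sketch, assuming the testcontainers Python package and a local Docker daemon:

# A minimal sketch using the testcontainers package (an assumption) so every
# run, locally or in CI, talks to the same pinned Postgres image.
import pytest
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def pg_url():
    with PostgresContainer("postgres:16") as postgres:
        yield postgres.get_connection_url()

def test_can_reach_database(pg_url):
    # Illustrative check; real tests would open a connection using this URL.
    assert pg_url.startswith("postgresql")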



 

Eliminating the Flakiness


Before attempting fixes, make sure the monitoring described above is in place. Then work through the following:




✅ Isolate and Reproduce: Once identified, attempt to isolate and reproduce the flaky behavior in a controlled environment. This might involve running the test repeatedly or under varying conditions to understand what triggers the flakiness.
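A crude but effective first step is to rerun a single test many times and measure its failure rate. Below is a minimal sketch that shells out to the pytest CLI; the test id is hypothetical, and plugins such as pytest-repeat offer the same idea off the shelf.

# A minimal sketch: rerun one test repeatedly and report how often it fails.
# The test id below is hypothetical.
import subprocess

def measure_flake_rate(test_id: str, runs: int = 50) -> float:
    failures = 0
    for _ in range(runs):
        # A non-zero exit code from pytest means this run failed.
        result = subprocess.run(["pytest", "-q", test_id], capture_output=True)
        if result.returncode != 0:
            failures += 1
    return failures / runs

if __name__ == "__main__":
    print(measure_flake_rate("tests/test_checkout.py::test_payment_retry"))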


✅ Remove External Dependencies: Where possible, mock or stub out external services to reduce unpredictability.

Invest in mocks that actually work: HyperTest automatically mocks every dependency, builds those mocks from actual user flows, and even auto-updates them as dependencies change their behavior. More about the approach here
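Even without dedicated tooling, the standard library's unittest.mock goes a long way; a minimal sketch in which checkout and the payment client are hypothetical names:

# A minimal sketch: stub a third-party payment API with unittest.mock so the
# test is deterministic; checkout and payment_client are hypothetical names.
from unittest.mock import Mock

def checkout(payment_client, amount):
    # Hypothetical code under test that normally calls an external service.
    return payment_client.charge(amount)["status"]

def test_checkout_with_stubbed_payment_api():
    payment_client = Mock()
    payment_client.charge.return_value = {"status": "approved"}
    assert checkout(payment_client, 100) == "approved"
    payment_client.charge.assert_called_once_with(100)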



✅ Refactor Tests: Avoid tests that rely on real time or shared state. Ensure each test is self-contained and deterministic.
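For tests that depend on the real clock, freezing time makes them deterministic. A minimal sketch, assuming the freezegun package, with is_expired as a hypothetical function under test:

# A minimal sketch: freeze the clock so the outcome never depends on how fast
# the test runs; assumes the freezegun package, is_expired is hypothetical.
from datetime import datetime, timedelta
from freezegun import freeze_time

def is_expired(created_at: datetime, ttl: timedelta) -> bool:
    return datetime.now() > created_at + ttl

@freeze_time("2025-03-04 12:00:00")
def test_expiry_is_deterministic():
    created = datetime(2025, 3, 4, 11, 0, 0)
    assert is_expired(created, timedelta(minutes=30)) is True
    assert is_expired(created, timedelta(hours=2)) is False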



 

The HyperTest Advantage for Backend Tests


This is where HyperTest transforms the equation. Unlike traditional approaches that merely identify flaky tests, HyperTest provides a comprehensive solution for backend test stability:





  • Real API Traffic Recording: Capturing real interactions to ensure test scenarios closely mimic actual use cases, thus reducing discrepancies that can cause flakiness.



  • Controlled Test Environments: By replaying and mocking external dependencies during testing, HyperTest ensures consistent environments, avoiding failures due to external variability.


  • Integrated System Testing: Flakiness is often exposed when systems integrate. HyperTest’s holistic approach tests these interactions, catching issues that may not appear in isolation.


  • Detailed Debugging Traces: Provides granular insights into each step of a test, allowing quicker identification and resolution of the root causes of flakiness.


  • Proactive Flakiness Prevention: HyperTest maps service dependencies and alerts teams about potential downstream impacts, preventing flaky tests before they occur.




  • Enhanced Coverage Insight: Offers metrics on tested code areas and highlights parts lacking coverage, encouraging targeted testing that reduces gaps where flakiness could hide.



 

Shopify's Journey to 99.7% Test Reliability


Shopify's 18-month flakiness reduction journey


Key Strategies:

  1. Introduced quarantine workflow

  2. Built custom flakiness detector

  3. Implemented "Fix Flaky Fridays"

  4. Developed targeted libraries for common issues


Results:

  • Reduced flaky tests from 15% to 0.3%

  • Cut developer interruptions by 82%

  • Increased deployment frequency from 50/week to 200+/week



 

Conclusion: The Competitive Advantage of Test Reliability


Engineering teams that master test reliability gain a significant competitive advantage:


  • 30-40% faster time-to-market for new features

  • 15-20% higher engineer satisfaction scores

  • 50-60% reduction in production incidents


Test flakiness isn't just a technical-debt issue; it's a strategic imperative that impacts your entire business. By applying this framework, engineering leaders can transform their test suites from a liability into an asset.


Want to discuss your team's specific flakiness challenges? Schedule a consultation →


Frequently Asked Questions

1. What causes flaky tests in software testing?

Flaky tests often stem from race conditions, async operations, test dependencies, or environment inconsistencies.

2. How can engineering teams identify flaky tests?

Teams can use test reruns, failure pattern analysis, logging, and dedicated test analytics tools to detect flakiness.

3. What strategies help in fixing flaky tests?

Stabilizing test environments, removing dependencies, using waits properly, and running tests in isolation can help resolve flaky tests.
