Writing a LocalFileComparator with Threshold for Flutter Golden Tests
At Rows, we use Flutter Desktop to build our native desktop app for macOS and Windows. To ensure everything looks as expected in Rows’s native desktop app, we leverage so-called “golden tests”. These are visual tests where a widget is compared to a screenshot of a previous run. If there is any difference — even if it’s only a pixel — the test will fail.
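For readers new to golden tests, here is a minimal sketch of what one can look like with flutter_test. The widget, the test name, and the goldens/greeting.png path are made up for illustration and are not taken from the Rows codebase.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('greeting matches its golden file', (tester) async {
    // Hypothetical widget under test; any widget tree works here.
    await tester.pumpWidget(
      const MaterialApp(
        home: Scaffold(body: Center(child: Text('Hello, Rows'))),
      ),
    );

    // Compares the rendered pixels against goldens/greeting.png, which is
    // created or refreshed by running `flutter test --update-goldens`.
    await expectLater(
      find.byType(MaterialApp),
      matchesGoldenFile('goldens/greeting.png'),
    );
  });
}
```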
Recently, we found discrepancies between golden test results when running them locally and in our pipeline, so we wrote a LocalFileComparator with a configurable threshold to ignore irrelevant differences between these environments.
Here’s how we did it.
Testing our App
At Rows, our developers use different operating systems and laptops: Windows and macOS, M1 MacBooks and Intel MacBooks, and so on. Each of these platforms renders things slightly differently, which leads to differences when running golden tests.
As a first step to solve these inconsistencies, we moved over to Docker for running golden tests. It wasn’t easy, but it paid off: we could now run golden tests on macOS and Windows without having to worry about operating system differences.
This worked until we ran the tests in the pipeline.
Pipeline Problems with Golden Tests
We use GitHub Actions for running tests, creating releases, and publishing new versions of our desktop app.
After introducing Docker to run our golden tests, we moved over to an Ubuntu runner (instead of macos-latest) as the platform to run our tests on. This would, in theory, allow us to run visual tests consistently across engineers’ laptops and the pipeline, since both run them in Docker.
But… in practice, this didn’t happen. We started seeing some failing tests:
Golden test failing when running in our pipeline.
Notice how the diff percentage is very low: 0.01%.
We did some digging and eventually discovered that this is an architecture issue. GitHub Actions runs x86 Ubuntu machines, whereas most of our engineers use M1 (ARM) MacBooks.
Since GitHub Actions does not support ARM machines and the difference in our failing golden tests was never greater than 0.02%, we decided to compromise and add a threshold of 0.02% in golden tests.
We felt safe with this value since it is high enough for these architecture-specific differences, but too low for most visually significant changes. This means that tests should only fail when there are meaningful differences in screenshots.
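To get a feel for how small that threshold is, the sketch below converts it into a pixel budget. It assumes the diff is measured as differing pixels divided by total pixels, and the 800x600 image size and the maxDifferingPixels helper are hypothetical.

```dart
/// Maximum number of pixels that may differ while staying within a threshold
/// given in parts per 10,000 (0.02% == 2 parts per 10,000).
/// Integer math avoids floating-point rounding surprises.
int maxDifferingPixels(int width, int height, int thresholdPer10k) {
  return (width * height * thresholdPer10k) ~/ 10000;
}

void main() {
  // Hypothetical 800x600 golden image checked against the 0.02% threshold.
  print(maxDifferingPixels(800, 600, 2)); // prints 96
}
```

Under these assumptions, fewer than a hundred pixels out of 480,000 may differ before the test fails.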
Writing a File Comparator with Threshold
As it turns out, LocalFileComparator does not support setting a threshold, so we had to implement it ourselves.
To achieve that, we created a new LocalFileComparatorWithThreshold that extends LocalFileComparator and overrides its compare method to take the threshold into account:
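The original listing is not reproduced here, so what follows is a minimal sketch of such a comparator rather than the exact Rows implementation. It assumes that diffPercent is reported as a fraction between 0 and 1 (so 0.02% is 0.0002) and treats a diff exactly equal to the threshold as acceptable; the wording of the printed message is our own.

```dart
import 'dart:typed_data';

import 'package:flutter/foundation.dart';
import 'package:flutter_test/flutter_test.dart';

/// Behaves like [LocalFileComparator], but tolerates differences up to
/// [threshold] (a fraction between 0 and 1).
class LocalFileComparatorWithThreshold extends LocalFileComparator {
  LocalFileComparatorWithThreshold(Uri testFile, this.threshold)
      : assert(threshold >= 0 && threshold <= 1),
        super(testFile);

  /// Largest accepted share of differing pixels, e.g. 0.0002 for 0.02%.
  final double threshold;

  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden) async {
    // Same pixel-by-pixel comparison the default comparator performs.
    final ComparisonResult result = await GoldenFileComparator.compareLists(
      imageBytes,
      await getGoldenBytes(golden),
    );

    if (result.passed) {
      return true;
    }

    // Assumption: a diff equal to the threshold still counts as acceptable.
    if (result.diffPercent <= threshold) {
      debugPrint(
        'Golden diff of ${result.diffPercent * 100}% accepted because it is '
        'within the threshold of ${threshold * 100}%.',
      );
      return true;
    }

    // Above the threshold: fail the same way the default comparator does,
    // writing the failure images and throwing the resulting error.
    final String error = await generateFailureOutput(result, golden, basedir);
    throw FlutterError(error);
  }
}
```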
The main part of this implementation is the compare method. It uses the default compareLists function to compare the current image with the baseline.
It returns a ComparisonResult, which includes, among other fields, a passed flag indicating whether the two images are exactly equal (i.e., pixel-by-pixel), and a diffPercent field containing the percentage of pixels in which the two images differ.
The only difference from the default compare method of LocalFileComparator is that we added a conditional that checks whether the passed flag is false and the diffPercent is lower than the threshold we set. In that case, we override the default behavior and return true, meaning that the flutter_test framework should consider the test successful.
In that case, we also print a message to warn the user that the test only passed because of the threshold and would otherwise have failed.
Finally, after creating our new file comparator, we need to configure our tests to use it.
Switching Over to the New Implementation
According to the documentation of flutter_test, we can use a file named flutter_test_config.dart to customize how tests are run.
This file should be placed in the directory containing the tests that should use the new comparator. If you want to use the comparator for all tests, place this file at the root of the test directory.
Looking at flutter_test’s code, you can see that the goldenFileComparator is set globally, which means we can easily override it to use our comparator:
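As the original listing is missing here as well, the following is a sketch of what such a flutter_test_config.dart could look like, not necessarily the exact file used at Rows. The import path, the _goldenThreshold constant, and the placeholder_test.dart file name are assumptions made for illustration; the threshold is again expressed as a fraction.

```dart
import 'dart:async';

import 'package:flutter_test/flutter_test.dart';

// Hypothetical location of the comparator class described above.
import 'local_file_comparator_with_threshold.dart';

/// 0.02%, expressed as a fraction between 0 and 1.
const double _goldenThreshold = 0.02 / 100;

Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  // Copy to a local variable so the `is!` check promotes its type.
  final GoldenFileComparator defaultComparator = goldenFileComparator;
  if (defaultComparator is! LocalFileComparator) {
    throw Exception(
      'Expected goldenFileComparator to be a LocalFileComparator, '
      'but it is a ${defaultComparator.runtimeType}.',
    );
  }

  // LocalFileComparator derives its basedir from the directory of the URI
  // it receives, so any file name inside basedir works as a stand-in.
  goldenFileComparator = LocalFileComparatorWithThreshold(
    defaultComparator.basedir.resolve('placeholder_test.dart'),
    _goldenThreshold,
  );

  await testMain();
}
```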
Here we check if the goldenFileComparator is of type LocalFileComparator and throw an exception if it isn’t. This is required because we need the basedir field, which is available in the LocalFileComparator. If it isn’t present, we should abort running the tests since it isn’t possible to run them properly.
If this precondition is met, we instantiate our file comparator and override the default goldenFileComparator with our new implementation.
Afterward, we execute the test itself by calling testMain(), and everything else proceeds as normal.
And that’s it! 🎉 From now on, golden tests will use our new file comparator with threshold and we’ll be able to skip over minor details!
The beta apps are available for macOS and Windows at rows.com/download.
We continue building the spreadsheet with superpowers. Get started today for free at rows.com.