Regression Testing Processing Algorithms

This is a guest post by Albin Sunnanbo sharing experiences on regression testing.

On several occasions I have worked with systems that process large numbers of work items with a fairly complicated algorithm. When doing a larger rewrite of such an algorithm, you want to regression test it. Of course you have a bunch of unit tests and/or integration tests that map to your requirements, but unit tests tend to test the system from one angle, and you need to complement them with other test methods to cover other angles too. We have used a copy of production data to run a comprehensive regression test with just a few hours of work.

Our systems had the following workflow:

  1. Users or imports produce some kind of work item in the system, e.g. orders.
  2. There is a completion phase of the work where the user commits each work item and makes the result final, e.g. sends the order.
  3. Once an item is final, the system processes it and produces an output that is saved in a database before it is exported to another system.

We have successfully used the following approach to regression test this kind of algorithm.

Method for regression test

During our testing phase, before release, we make a copy of our production database (make sure to respect your data protection policies). We then copy the output data table into another table, clear the output data table and trigger a recalculation of all items. During the recalculation I recommend having a coffee break; it is an important part of the process. When the coffee break is over, the processing has usually completed and we simply compare the new processing results with the old values from production. There are usually some differences. We track down every difference and categorize it as intentional or unintentional. We then fix the unintentional differences and rerun the test until we are happy.
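The copy, clear and recalculate steps can be sketched roughly as follows. This is only an illustration under assumptions: it uses an in-memory SQLite database as a stand-in for the copy of production, and the table names (`orders`, `order_output`, `order_output_baseline`), the columns and the `process_item` function are all hypothetical placeholders for whatever your system actually has.

```python
import sqlite3


def process_item(quantity: int, unit_price: float) -> float:
    """Hypothetical stand-in for the real processing algorithm."""
    return round(quantity * unit_price, 2)


def recalculate_all(conn: sqlite3.Connection) -> None:
    """Keep the production output as a baseline, then recompute every final item."""
    cur = conn.cursor()
    # 1. Copy the current (production) output into a separate baseline table.
    cur.execute("DROP TABLE IF EXISTS order_output_baseline")
    cur.execute("CREATE TABLE order_output_baseline AS SELECT * FROM order_output")
    # 2. Clear the output table.
    cur.execute("DELETE FROM order_output")
    # 3. Re-run the algorithm for every item that is marked as final.
    rows = cur.execute(
        "SELECT id, quantity, unit_price FROM orders WHERE is_final = 1 ORDER BY id"
    ).fetchall()
    for item_id, quantity, unit_price in rows:
        cur.execute(
            "INSERT INTO order_output (order_id, total) VALUES (?, ?)",
            (item_id, process_item(quantity, unit_price)),
        )
    conn.commit()


# Tiny in-memory stand-in for a copy of the production database (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, quantity INTEGER, unit_price REAL, is_final INTEGER
    );
    CREATE TABLE order_output (order_id INTEGER, total REAL);
    INSERT INTO orders VALUES (1, 2, 9.99, 1), (2, 3, 5.00, 1), (3, 1, 1.00, 0);
    INSERT INTO order_output VALUES (1, 19.98), (2, 15.5);
    """
)
recalculate_all(conn)
```

After this runs, `order_output_baseline` holds the old production values and `order_output` holds the freshly calculated ones, ready to be compared.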

Some notes about the comparison

To compare the new and old results I usually do a select [important columns] from both the new and the old processing results, order them by the same columns and just copy and paste the output into a regular diff tool like KDiff3 or WinMerge. You can also do a full outer join in the database. Either way, to compare successfully you have to strip out columns like sequence numbers and timestamps, since they differ between runs even when the result is the same. If you have lots of intentional changes, you need to create a filter that removes those false positives so that you do not miss the real issues.
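As an alternative to pasting into a diff tool, the comparison can also be scripted. The sketch below is one possible way to do it, not the approach used in our projects: it emulates a full outer join in Python by keying both result sets on a hypothetical `order_id` column, selects only the important columns, and accepts an optional filter for intentional changes. All table names, column names and data are made up for the example.

```python
import sqlite3

# Hypothetical column names; sequence numbers and timestamps are deliberately left out.
IMPORTANT_COLUMNS = ("order_id", "total", "currency")


def load(conn, table):
    """Return {order_id: row} for the important columns of one result table."""
    # The table name is interpolated, so only pass trusted, hard-coded names.
    cols = ", ".join(IMPORTANT_COLUMNS)
    rows = conn.execute(f"SELECT {cols} FROM {table} ORDER BY {cols}")
    return {row[0]: row for row in rows}


def diff_results(conn, old_table, new_table, is_intentional=lambda old, new: False):
    """Emulate a full outer join: report rows that are missing, added or changed."""
    old, new = load(conn, old_table), load(conn, new_table)
    differences = []
    for key in sorted(old.keys() | new.keys()):
        old_row, new_row = old.get(key), new.get(key)
        if old_row == new_row:
            continue
        if is_intentional(old_row, new_row):
            continue  # known, intended change: filter it out
        differences.append((key, old_row, new_row))
    return differences


def currency_case_only(old, new):
    """Example filter: the only change is that the currency code is now upper case."""
    return (
        old is not None
        and new is not None
        and old[:2] == new[:2]
        and old[2].upper() == new[2]
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_output_baseline (
        seq INTEGER, order_id INTEGER, total REAL, currency TEXT, created_at TEXT
    );
    CREATE TABLE order_output (
        seq INTEGER, order_id INTEGER, total REAL, currency TEXT, created_at TEXT
    );
    INSERT INTO order_output_baseline VALUES
        (1, 1, 19.98, 'SEK', '2020-01-01'),
        (2, 2, 15.5, 'SEK', '2020-01-02'),
        (3, 3, 7.0, 'sek', '2020-01-03');
    INSERT INTO order_output VALUES
        (10, 1, 19.98, 'SEK', '2020-06-01'),
        (11, 2, 15.0, 'SEK', '2020-06-01'),
        (12, 3, 7.0, 'SEK', '2020-06-01');
    """
)

for key, old_row, new_row in diff_results(
    conn, "order_output_baseline", "order_output", currency_case_only
):
    print(key, old_row, "->", new_row)
```

In this toy data, the `seq` and `created_at` columns differ on every row but are never selected, the currency case change on order 3 is filtered out as intentional, and only the changed total on order 2 remains to be investigated.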


The key factors to make this kind of regression testing possible are:

  • The processing algorithm works on one item at a time and produces one output for each input.
  • The processing does not start until each item is marked as final.
  • The inputs for the algorithm are preserved in the database.
  • The outputs of the algorithm are preserved in the database.
  • The processing algorithm is deterministic.

If your system currently does not fulfil all points you can either tweak the process a bit to fit your system or tweak your system to make it possible to regression test the algorithm in the future.
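If you are unsure whether the determinism requirement holds, one cheap check is to process the same items more than once and see whether the outputs agree. The following is a minimal sketch of that idea, assuming the algorithm can be called as a plain function on one item at a time; `stable_total` and `total_with_sequence` are hypothetical examples, the latter showing how an embedded sequence number breaks determinism.

```python
import itertools


def find_nondeterministic_items(process, items, runs=2):
    """Return the items for which repeated processing gives differing outputs."""
    unstable = []
    for item in items:
        outputs = [process(item) for _ in range(runs)]
        if any(output != outputs[0] for output in outputs[1:]):
            unstable.append(item)
    return unstable


def stable_total(order):
    """Deterministic example: the output depends only on the input."""
    return round(order["quantity"] * order["unit_price"], 2)


_sequence = itertools.count(1)


def total_with_sequence(order):
    """Non-deterministic example: the output embeds a running sequence number."""
    return (next(_sequence), stable_total(order))


orders = [
    {"id": 1, "quantity": 2, "unit_price": 9.99},
    {"id": 2, "quantity": 3, "unit_price": 5.00},
]

print(find_nondeterministic_items(stable_total, orders))
print(find_nondeterministic_items(total_with_sequence, orders))
```

A check like this only catches variation that shows up between consecutive runs; dependencies on the current date or on external data may need a closer look at the code.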

Final words

We have found this kind of regression testing really valuable, and it helps us find three main kinds of problems.

  • Unintentional regressions in our algorithm. After a major change we usually catch at least one or two bugs this way.
  • Missed edge cases. We get a really good run-through of the system with data that is as realistic as possible. This catches some edge cases that you did not think of when you wrote your regular sunny-day unit tests. If you can process all your existing history without any major hiccups or deviations, chances are pretty high that you will make it through future data without major incidents too.
  • Unintentional changes in requirements. This happens from time to time: you improve your requirements in one area and forget the side effects in another.

As a good side effect, you are forced to test your database migration for the upcoming release.
