Comparing Data and Spotting Differences
We all remember the childhood game “Spot 10 differences”, where you are given two versions of the same picture and have to find what differs between them. A test engineer’s job, too, often involves comparing data represented in different ways: as text, numbers, binary data, or graphics.
Even when there is little data, we can still make mistakes when checking and comparing it. For instance, it is easy to read a text in which the letters inside words have changed places, often without even noticing the difference. The important condition is that the first and last letters stay in their proper places; the letters in the middle are hardly noticed, so when we skim a text, we simply guess them. (You can find more detailed information in this article.) For example:
Acocdring to a sduty of one Elngish unirvestiy, it dnoesn’t mettar in wihch odrer the leettrs in a wrod are giong. Olny the frist and the lsat must be in place. The rest may be in cmolpete disodrer, the text will still be unredtsood wihtuot a porlbem.
There was a similar experiment with images. People were shown playing cards, some of which were ordinary while others had mixed-up colors (red spades, black hearts), and were asked to name each card. Not everyone noticed the difference. Only when attention was specifically drawn to the colors, or more time was given to view the card, did the participants understand that something was wrong. Some even said that they were confused and no longer sure which color was right.
I would like to share some practical advice on how to make data comparison and analysis more efficient.
Files and folders
To compare the contents of folders or synchronize them, you can use Unreal Commander, a free two-panel file manager for Windows. It handles this task easily, and its many settings and options make the comparison flexible.
To compare two folders in Unreal Commander, choose: Commands > Synchronize dirs…
Folder comparison in Unreal Commander
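If you need the same kind of folder comparison in a script rather than in a file manager, Python's standard filecmp module can do it. The sketch below is only an illustration and is not how Unreal Commander works internally; the function name compare_dirs and the shape of the report are assumptions made for this example.

```python
import filecmp
from pathlib import Path


def compare_dirs(left, right):
    """Recursively compare two folders; return differences as relative paths."""
    report = {"left_only": [], "right_only": [], "different": []}

    def walk(cmp, prefix):
        # Entries that exist on one side only
        report["left_only"] += [str(prefix / name) for name in cmp.left_only]
        report["right_only"] += [str(prefix / name) for name in cmp.right_only]
        # Files present on both sides that differ. By default dircmp first
        # compares os.stat() signatures and reads contents only if they differ.
        report["different"] += [str(prefix / name) for name in cmp.diff_files]
        # Descend into subfolders that exist on both sides
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(left, right), Path("."))
    return report
```

Calling compare_dirs("build_old", "build_new") (hypothetical folder names) returns three lists: files found only on the left, files found only on the right, and files that exist on both sides but differ.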
In addition to folder comparison, Unreal Commander has an option of comparing files based on their contents. There are two ways: comparing files as text or as binary data.
To compare the contents of two files in Unreal Commander, select both files and choose: File > Compare by content…
Comparing files as text in Unreal Commander
If you choose the ‘Binary’ option, the files will be compared as binary data.
Comparing binary files in Unreal Commander
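The two modes, text and binary, can also be reproduced in a few lines of Python when a comparison has to run unattended. This is a minimal sketch under stated assumptions: the text files are assumed to be UTF-8, both files are read fully into memory, and the function names text_diff and first_binary_mismatch are made up for this example.

```python
import difflib
from pathlib import Path


def text_diff(path_a, path_b):
    """Return unified-diff lines for two text files (UTF-8 assumed)."""
    a = Path(path_a).read_text(encoding="utf-8").splitlines(keepends=True)
    b = Path(path_b).read_text(encoding="utf-8").splitlines(keepends=True)
    return list(difflib.unified_diff(a, b, fromfile=str(path_a), tofile=str(path_b)))


def first_binary_mismatch(path_a, path_b):
    """Return the offset of the first differing byte, or None if identical."""
    data_a = Path(path_a).read_bytes()
    data_b = Path(path_b).read_bytes()
    for offset, (x, y) in enumerate(zip(data_a, data_b)):
        if x != y:
            return offset
    # One file is a prefix of the other: they diverge where the shorter ends
    if len(data_a) != len(data_b):
        return min(len(data_a), len(data_b))
    return None
```

The text mode reports which lines changed, while the binary mode only tells you where the files first diverge, which is usually enough to decide whether two builds or exports are byte-identical.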
Comparing two XML-files in Notepad++ using Compare plugin
When documents are stored not as plain text files but as MS Word documents, comparing them as text or as binary data will not clearly show how they differ. To track changes in Word documents, there is the Track Changes feature on the Review tab, which highlights all changes made to the text by different authors. However, if you have two versions of a document that were edited without change tracking and you want to see the difference, you can go to Review > Compare > Compare… and get the changes highlighted that way.
Comparing documents in MS Word
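MS Excel can help if you have to compare two lists that share a unique identifier. To do so, select the id column in both lists and then go to Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values. The ids that appear in both lists will be highlighted, and the ones left without highlighting are present in only one of the lists.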
If there is too much data to scan by eye, you can turn each list into a table (select the data range and press Ctrl + L), so that the autofilter in the table header lets you filter by color.
Comparing lists in MS Excel
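The same check can be expressed with set operations once the lists are loaded into a script. The sketch below assumes each list is a sequence of dictionaries with an "id" field; the field name and the function name compare_by_id are illustrative only.

```python
def compare_by_id(list_a, list_b, key="id"):
    """Split the ids of two lists into shared ones and ones unique to each list."""
    ids_a = {row[key] for row in list_a}
    ids_b = {row[key] for row in list_b}
    return {
        "in_both": sorted(ids_a & ids_b),    # what Excel would highlight
        "only_in_a": sorted(ids_a - ids_b),  # missing from the second list
        "only_in_b": sorted(ids_b - ids_a),  # missing from the first list
    }


expected = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Eve"}]
actual = [{"id": 2, "name": "Bob"}, {"id": 3, "name": "Eve"}, {"id": 4, "name": "Dan"}]
result = compare_by_id(expected, actual)
```

Note that this only compares identifiers. Rows with the same id but different values in other columns would still need a field-by-field check.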
Comparing images with Resemble.js service
Another interesting tool for comparing images is the ImageMagick suite. It is a set of utilities that run from the command line and allow various manipulations with images. ImageMagick also includes the compare utility, which compares two images by generating a third image with the differences highlighted, similar to Resemble.js described above.
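The idea behind such a difference image can be shown in a few lines. The sketch below is a deliberately simplified illustration in plain Python, not the algorithm used by ImageMagick or Resemble.js: images are assumed to be already decoded into rows of (r, g, b) tuples of equal size, pixels are compared for exact equality, and the name diff_image and the magenta highlight color are choices made for this example.

```python
def diff_image(img_a, img_b, highlight=(255, 0, 255)):
    """Return (difference image, share of changed pixels) for two RGB grids."""
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")

    changed = 0
    result = []
    for row_a, row_b in zip(img_a, img_b):
        out_row = []
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a == pixel_b:
                out_row.append(pixel_a)    # unchanged pixel is kept as is
            else:
                out_row.append(highlight)  # changed pixel is painted over
                changed += 1
        result.append(out_row)

    total = sum(len(row) for row in img_a)
    return result, (changed / total if total else 0.0)
```

Real tools add tolerance thresholds, anti-aliasing handling, and support for images of different sizes, which is why a dedicated utility is the better choice for actual screenshots.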
Data Comparison in a Nutshell
Comparing even small volumes of data can be difficult, because overall similarity hides the differences and leads to mistakes. Special tools make the result more precise and mistakes less likely. When the data volume is substantial, such tools are the only way to do the job efficiently, accurately, and in the shortest possible time. This article offered some practical tips on comparing data represented as images, lists, files, and documents.
Dmytro has gained extensive experience over the 17 years he has been working in IT. He has tried his hand as an Application and Database Developer, Business Analyst, Test Engineer (desktop, mobile, web), DevOps Engineer, Project Manager, and Head of QA. Dmytro is glad to share his experience and knowledge with others.