Comparing Data and Spotting Differences

Dmytro Prydatko

We all remember a game we played as children, “Spot 10 differences”, where you are given two versions of the same picture and must find the differences between them. A test engineer’s job often involves a similar task: comparing data represented in different ways, such as text, numbers, binary data, or graphics.

Even when there is little data, we can still make mistakes when checking and comparing it. For instance, we can easily read a text in which the letters inside words have changed places, often without even noticing. The important condition is that the first and last letters stay in their proper places; the letters in the middle are hardly registered, so when we skim a text, we simply guess them. (You can find more detailed information in this article.) For example:

Acocdring to a sduty of one Elngish unirvestiy, it dnoesn’t mettar in wihch odrer the leettrs in a wrod are giong. Olny the frist and the lsat must be in place. The rest may be in cmolpete disodrer, the text will still be unredtsood wihtuot a porlbem.

There was a similar experiment with images. People were shown playing cards, some of which were ordinary, while others had mixed-up colors (red spades, black hearts), and were asked to name each card. Not everyone noticed the difference. Only when attention was specifically drawn to the colors, or when more time was given to view the card, did the participants realize that something was wrong. Some even said they were confused and no longer sure which color was right.

I would like to share some practical advice on how to make data comparison and analysis more efficient.

Files and folders

To compare the contents of folders or synchronize them, you can use Unreal Commander, a free two-panel file manager for Windows. It handles this task easily, and its many settings and options make the comparison flexible.

To compare two folders in Unreal Commander, choose: Commands > Synchronize dirs...
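When the same folder check has to be repeated often or run automatically, it can also be scripted. The following is a minimal sketch in Python using the standard filecmp module. It is not related to how Unreal Commander works internally, the function name compare_folders is made up for illustration, and it only looks at the top level of each folder.

```python
import filecmp
from pathlib import Path


def compare_folders(left: str, right: str) -> dict:
    """Summarize the differences between two folders (top level only)."""
    result = filecmp.dircmp(left, right)
    return {
        # Names that exist in only one of the two folders.
        "only_in_left": sorted(result.left_only),
        "only_in_right": sorted(result.right_only),
        # Files present in both folders are re-checked byte by byte
        # (shallow=False), so equal size and timestamp are not enough.
        "different": sorted(
            name
            for name in result.common_files
            if not filecmp.cmp(Path(left) / name, Path(right) / name, shallow=False)
        ),
    }
```

Calling compare_folders("build_old", "build_new") (hypothetical folder names) returns three sorted lists of file names that can be printed or written to a report.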

[Figure: Folder comparison in Unreal Commander]

In addition to folder comparison, Unreal Commander can compare files by their contents. There are two ways to do this: comparing files as text or as binary data.

To compare the contents of two files in Unreal Commander, select the two files and choose from the menu: File > Compare by content...
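For text files, a similar line-by-line comparison can be produced in a script. The sketch below uses Python's standard difflib module to build a unified diff of two strings; the function name text_diff and the labels "old" and "new" are assumptions chosen for illustration, not part of any tool mentioned here.

```python
import difflib


def text_diff(old: str, new: str, old_name: str = "old", new_name: str = "new") -> str:
    """Return a unified diff of two texts, or an empty string if they match."""
    diff = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # the lines were split without newlines, so add none here
    )
    return "\n".join(diff)
```

In the result, lines that start with "-" come from the first text only and lines that start with "+" from the second only; unchanged context lines start with a space.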

[Figure: Comparing files as text in Unreal Commander]

If you choose the ‘Binary’ option, the files will be compared as binary data.
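The idea of a binary comparison can be sketched in a few lines of Python: walk through both byte sequences and report the first offset at which they differ. This is a simplified illustration under the assumption that both files fit in memory; the function names are hypothetical, and a real tool would read large files in chunks and list every difference.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # All shared bytes match; the data differs only if one side is longer.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def first_difference_in_files(path_a: str, path_b: str) -> Optional[int]:
    """Compare two files as binary data (reads both fully into memory)."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        return first_difference(file_a.read(), file_b.read())
```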

[Figure: Comparing binary files in Unreal Commander]

Documents

Many people know the free Notepad++ application. It is a handy substitute for the standard Windows Notepad and offers many features: working with large files, syntax highlighting for different languages, formatting of XML / JSON / JavaScript markup, line sorting, and many more. The basic features are easily extended with plugins. One of them is the Compare plugin, which allows you to compare text files. An additional bonus is that syntax highlighting is preserved during the comparison.

[Figure: Comparing two XML files in Notepad++ using the Compare plugin]

When documents are MS Word files rather than plain text, comparing them as text or binary data will not clearly show what changed. To track changes in Word documents, there is the Track Changes mode on the Review tab, which highlights all changes made to the text by different authors. However, if you have two versions of a document that were edited without change tracking and you want to see the difference, you can go to Review > Compare > Compare... and get the changes highlighted that way.

[Figure: Comparing documents in MS Word]

Lists

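MS Excel can help if you have to compare two lists that share a unique identifier. Select the column with the id in both lists (hold Ctrl to add the second column to the selection) and then go to Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values. Provided that ids do not repeat within a single list, the highlighted ids are present in both lists, and the ones left without highlighting are present in only one of them.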

If there is a lot of data, you can turn each list into a table (select the data range and press Ctrl + L), so that the autofilter in the table header lets you filter by color.
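The same comparison by unique identifier can be expressed with set operations. The sketch below is a Python illustration of the idea, not a replacement for the Excel steps; the function name compare_ids is hypothetical, and it assumes that the ids can be sorted and that each id is meant to be unique within its list.

```python
from typing import Dict, Iterable, List


def compare_ids(left: Iterable, right: Iterable) -> Dict[str, List]:
    """Split ids into those found in both lists and those found in only one."""
    left_set, right_set = set(left), set(right)
    return {
        "in_both": sorted(left_set & right_set),        # would be highlighted in Excel
        "only_in_left": sorted(left_set - right_set),   # missing from the second list
        "only_in_right": sorted(right_set - left_set),  # missing from the first list
    }
```

Unlike the color-based approach, this also tells you which list each unmatched id belongs to.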

[Figure: Comparing lists in MS Excel]

Images

The Resemble.js service makes it easy to compare two images. The generated image uses color to show where the two originals differ. If you have to process large sets of images and you are familiar with programming in JavaScript and HTML5 or Node.js, you can use the Resemble.js library to write a small application that processes the images and generates reports on the differences.

[Figure: Comparing images with the Resemble.js service]

Another interesting tool for comparing images is ImageMagick, a suite of command-line utilities for various kinds of image manipulation. It includes the compare utility, which generates an image highlighting the differences between two images, similar to Resemble.js described above.
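The principle behind such difference images can be shown with a simplified sketch in plain Python. It does not reproduce how Resemble.js or ImageMagick actually work: images are assumed to be already decoded into rows of (R, G, B) tuples, pixels are compared for exact equality, and the function name diff_image and the magenta highlight color are choices made for illustration only.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def diff_image(first: Image, second: Image,
               highlight: Pixel = (255, 0, 255)) -> Tuple[Image, float]:
    """Return a copy of `first` with differing pixels recolored,
    plus the percentage of pixels that differ."""
    same_shape = len(first) == len(second) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(first, second)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")

    result: Image = []
    changed = 0
    for row_a, row_b in zip(first, second):
        out_row = []
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a == pixel_b:
                out_row.append(pixel_a)    # unchanged pixel is kept as is
            else:
                out_row.append(highlight)  # differing pixel is marked
                changed += 1
        result.append(out_row)

    total = sum(len(row) for row in first)
    mismatch = 100.0 * changed / total if total else 0.0
    return result, mismatch
```

Real tools typically add a tolerance for small color differences and handle anti-aliasing, which is why they are preferable for screenshots and photos.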

Data Comparison in a Nutshell

Comparing even small volumes of data can be difficult, because overall similarity hides the details and leads to mistakes. Special tools make the result more precise and mistakes less likely. When the data volume is substantial, such tools are the only way to do the job efficiently, accurately, and in the shortest possible time. This article has offered some practical tips on comparing data represented as files, documents, lists, and images.
