Column comparison of two fixed length files of same layout

ugandar123 · Post by **ugandar123** » Mon Dec 17, 2012 11:16 pm

Hi all,

I would like to compare all the fields of two fixed width files (field by field comparison) and the output should be having Yes or No for all those fields.

Example:
Input File1
111xyzalpha
222abcbeta

Input File2
111xyzalpha
222cdegama

Output
111111YxyzxyzYalphaalphaY
222222YabccdeNbetagamaN

The input columns might be changed often but the layout of both the files will always be the same and the output also should be changed accordingly. Could you please help me out how to get this scenario without affecting the job.

ray.wurlod · Post by **ray.wurlod** » Tue Dec 18, 2012 1:12 pm

Welcome aboard. Read the file records as a single field, add line numbers, use a Difference stage to identify any difference, and join on line numbers.

BI-RMA · Post by **BI-RMA** » Thu Dec 20, 2012 2:17 am

Probably not as easy as that, Ray. The difference stage only identifies that two compared records with identical key-values are different, but not where they are. And when the column-names in both input datasets are identical, only the values of the after-dataset are output.

So to set column-based diff-values I see basically one option:

Change the column-names on at least one of the input-datasets so that all columns to be compared are distinguishable. Join on the key-column, and send before and after values to a transformer.
Have a Stage-variable per column, compare the column-values in an If-Then-Else construct, set the column-based diff-value accordingly and concatenate the result with the before- and after-values from input.

harishkumar.upadrasta · Wed Dec 26, 2012 5:14 am

Hi,

You can do it like this, as you said they are fixed files, so split the data accordingly based on the field lengths and sort the data in both the links[ source and reference] based on the same order of field keys, now this makes the data comparision easy. In the Left link create a "Dummy" field with value "Y" now Use a Lookup stage and set it to continue to find the perfect matches. If you find "Y" in the final output for the "DUMMY" field then all the columns had a perfect match, if not you will have it as Blank which you can mark as "N" as per your requirement.