Column comparison of two fixed length files of same layout

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ugandar123
Participant
Posts: 2
Joined: Sat Aug 25, 2012 1:48 am
Location: Chennai

Column comparison of two fixed length files of same layout

Post by ugandar123 »

Hi all,

I would like to compare all the fields of two fixed width files (field by field comparison) and the output should be having Yes or No for all those fields.

Example:
Input File1
111xyzalpha
222abcbeta

Input File2
111xyzalpha
222cdegama

Output
111111YxyzxyzYalphaalphaY
222222YabccdeNbetagamaN

The input columns might be changed often but the layout of both the files will always be the same and the output also should be changed accordingly. Could you please help me out how to get this scenario without affecting the job.
ugandar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard. Read the file records as a single field, add line numbers, use a Difference stage to identify any difference, and join on line numbers.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Probably not as easy as that, Ray. The difference stage only identifies that two compared records with identical key-values are different, but not where they are. And when the column-names in both input datasets are identical, only the values of the after-dataset are output.

So to set column-based diff-values I see basically one option:

Change the column-names on at least one of the input-datasets so that all columns to be compared are distinguishable. Join on the key-column, and send before and after values to a transformer.
Have a Stage-variable per column, compare the column-values in an If-Then-Else construct, set the column-based diff-value accordingly and concatenate the result with the before- and after-values from input.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
harishkumar.upadrasta
Participant
Posts: 18
Joined: Tue Dec 25, 2012 10:39 pm
Location: Detroit,MI

Post by harishkumar.upadrasta »

Hi,

You can do it like this, as you said they are fixed files, so split the data accordingly based on the field lengths and sort the data in both the links[ source and reference] based on the same order of field keys, now this makes the data comparision easy. In the Left link create a "Dummy" field with value "Y" now Use a Lookup stage and set it to continue to find the perfect matches. If you find "Y" in the final output for the "DUMMY" field then all the columns had a perfect match, if not you will have it as Blank which you can mark as "N" as per your requirement.
Harish
Post Reply