Compare files - Generic Job

DSFreddie
Participant
Posts: 130
Joined: Wed Nov 25, 2009 2:16 pm

Post by DSFreddie »

Hi All,

Thanks for visiting this post.

I am looking for some thoughts on a scenario where I need to build a generic DataStage job that compares two files with the same layout (field-by-field comparison) and generates a file similar to the one below.

For example:

File1.Field1 | File1.Field2 | File2.Field1 | File2.Field2 | Field1 matched (Y/N) | Field2 matched (Y/N)

Can you please help me with some thoughts on how we can accomplish this in DataStage?

Thanks Much
Freddie
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If the number of fields is unknown and arbitrary, the only generic solution is to read each line as a single VarChar, parse it based on a known delimiter character (which could be a job parameter) and produce the output in that fashion. The output will also need to be a single VarChar for the solution to be generic, and it may prove too unwieldy to create a header. Join the files based on input line number, which can be generated by the Sequential File stage. Partition on line number (or run in sequential mode). Use a Sort Merge collector to write the output file.
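A minimal sketch of that logic outside DataStage, in Python, purely to illustrate the idea: each record is treated as one string, split on a delimiter, paired with the record at the same line number in the other file, and written back out as one string. The names `compare_records` and `compare_files` and the comma default are assumptions made for illustration; they are not DataStage functions.

```python
from itertools import zip_longest


def compare_records(left, right, delim=","):
    """Compare two delimited records field by field.

    Returns one delimited line: the fields of the first record, the fields
    of the second record, then one Y/N flag per field position.
    A missing record (None) is treated as having no fields.
    """
    left_fields = left.split(delim) if left is not None else []
    right_fields = right.split(delim) if right is not None else []
    flags = [
        "Y" if a == b else "N"
        for a, b in zip_longest(left_fields, right_fields, fillvalue=None)
    ]
    return delim.join(left_fields + right_fields + flags)


def compare_files(lines1, lines2, delim=","):
    """Pair records by line number and yield one comparison line per pair."""
    for left, right in zip_longest(lines1, lines2, fillvalue=None):
        yield compare_records(left, right, delim)
```

For example, `compare_records("a,b", "a,c")` gives `a,b,a,c,Y,N`. Because nothing here depends on the number of fields, the same code handles any layout, which is the point of reading each line as a single string.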
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSFreddie
Participant
Posts: 130
Joined: Wed Nov 25, 2009 2:16 pm

Post by DSFreddie »

Hi Ray, Thanks for your quick response.

As I read your reply, it looks like passing files dynamically and doing a field-by-field comparison is feasible in DataStage. But since I am not a premium user, I was not able to read the latter part of your reply.

Hope you can help me with some more details on the solution approach you mentioned.

Thanks much,
Freddie
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You're up to 100 posts near enough. Don't you think it's time you contributed your 30c/day and got yourself a premium membership? Premium membership is DSXchange's funding model - this money all goes to hosting and bandwidth costs for the site.
DSFreddie
Participant
Posts: 130
Joined: Wed Nov 25, 2009 2:16 pm

Post by DSFreddie »

Sure Ray, I plan to take out a premium membership.

Can anyone suggest some ways to accomplish this?

Thanks,
Freddie
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

If you plan on comparing all fields you can use a CDC stage to at least detect which rows are new/deleted/changed given a key. I've used this type of generic job often, but all I want to know is if there is a difference, not which columns have changed.
If you really need to compute which of the columns has changed, you would need to either use schemas for the files and pass those to a hand-crafted generic stage, or combine the columns into one string with separators and program the comparison yourself, either using stage variables in a Transformer stage or your own custom BuildOp.
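To make the first option concrete, here is a rough Python sketch of key-based change detection on records held as single delimited strings. It only reports whether a row is new, deleted, changed or the same, not which column differs. The function name `detect_changes`, the pipe delimiter and the status labels are assumptions for illustration and are not taken from the CDC stage itself.

```python
def detect_changes(before, after, key_index=0, delim="|"):
    """Classify each key as 'new', 'deleted', 'changed' or 'same'.

    before / after are iterables of delimited record strings.
    key_index is the position of the key field within each record.
    Assumes keys are unique within each input.
    """
    def index_by_key(lines):
        return {line.split(delim)[key_index]: line for line in lines}

    old = index_by_key(before)
    new = index_by_key(after)

    result = {}
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            result[key] = "new"
        elif key not in new:
            result[key] = "deleted"
        elif old[key] != new[key]:
            result[key] = "changed"
        else:
            result[key] = "same"
    return result
```

With `before = ["1|a", "2|b", "3|c"]` and `after = ["1|a", "2|x", "4|d"]`, key 1 is reported as same, 2 as changed, 3 as deleted and 4 as new.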
harishkumar.upadrasta
Participant
Posts: 18
Joined: Tue Dec 25, 2012 10:39 pm
Location: Detroit,MI

Post by harishkumar.upadrasta »

Create a script that generates a schema file describing the structure of the files, apply that schema to both input links, and enable RCP (runtime column propagation).
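As a rough idea of what such a script could look like, the Python sketch below builds schema text from a header line of column names. The function name `build_schema` is hypothetical, every column is assumed to be a string with an arbitrary maximum length of 255, and the record properties shown (`final_delim`, `delim`, `quote`) are written from memory of the parallel schema file format, so check them against the documentation for your version before relying on them.

```python
def build_schema(header_line, delim=","):
    """Build schema-file text from a delimited header line.

    Assumptions (illustrative only): every column is a variable-length
    string capped at 255 characters, and the data file uses the same
    delimiter as the header with no quoting.
    """
    names = [name.strip() for name in header_line.split(delim)]
    columns = "\n".join(f"  {name}: string[max=255];" for name in names)
    return (
        "record\n"
        f"{{final_delim=end, delim='{delim}', quote=none}}\n"
        "(\n"
        f"{columns}\n"
        ")\n"
    )
```

Writing the returned text to a file and passing that file's path as a job parameter would let the same job read files with different layouts, as long as both input files share one layout.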
Harish