Compare files - Generic Job

DSFreddie · Post by **DSFreddie** » Thu Dec 13, 2012 11:14 pm

Hi All,

Thanks for visiting this post.

I am looking for some thoughts/inputs on a scenario where I need to build a generic DataStage job that compares 2 files with the same layout (field-to-field comparison) and generates a file similar to the one below.

e.g:

File1   File1   File2   File2   Field1(matched)  Field2(matched)
Field1  Field2  Field1  Field2  (Y/N)            (Y/N)

Can you please share some thoughts/inputs on how we can accomplish this in DataStage?

Thanks Much
Freddie

ray.wurlod · Post by **ray.wurlod** » Fri Dec 14, 2012 12:12 am

If the number of fields is unknown and arbitrary, the only generic solution is to read each line as a single VarChar, parse it based on a known delimiter character (which could be a job parameter) and produce the output in that fashion. The output will also need to be a single VarChar for the solution to be generic, and it may prove too unwieldy to create a header. Join the files based on input line number, which can be generated by the Sequential File stage. Partition on line number (or run in sequential mode). Use a Sort Merge collector to write the output file.
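For illustration only, here is a minimal Python sketch of the same logic outside DataStage: each line is read as one string, split on a delimiter, paired with the line at the same position in the other file, and written back out as one string. The function names `compare_lines` and `compare_files` and the comma default are assumptions made for this sketch, not anything DataStage provides.

```python
from itertools import zip_longest


def compare_lines(line1, line2, delim=","):
    """Build one output line: file 1 fields, file 2 fields, then a Y/N flag per field."""
    # A missing line (one file is shorter than the other) is treated as having no fields.
    fields1 = line1.rstrip("\n").split(delim) if line1 is not None else []
    fields2 = line2.rstrip("\n").split(delim) if line2 is not None else []
    # zip_longest pads the shorter side with None, so an absent field never matches.
    flags = ["Y" if a == b else "N" for a, b in zip_longest(fields1, fields2)]
    return delim.join(fields1 + fields2 + flags)


def compare_files(path1, path2, out_path, delim=","):
    """Pair the two files by line number and write one comparison line per pair."""
    with open(path1) as f1, open(path2) as f2, open(out_path, "w") as out:
        for line1, line2 in zip_longest(f1, f2):
            out.write(compare_lines(line1, line2, delim) + "\n")
```

With two fields per line, `compare_lines("a,b", "a,c")` gives `a,b,a,c,Y,N`, which matches the layout asked for in the first post. No header is written, for the reason given above.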

DSFreddie · Post by **DSFreddie** » Fri Dec 14, 2012 1:53 am

Hi Ray, Thanks for your quick response.

As I read your reply, it looks like passing files dynamically and doing a field-by-field comparison is feasible in DataStage. But since I am not a premium user, I was not able to read the latter part of your reply.

Hope you can help me with some more details on the solution approach you mentioned.

Thanks much,
Freddie

ray.wurlod · Post by **ray.wurlod** » Fri Dec 14, 2012 3:40 am

You're up to 100 posts near enough. Don't you think it's time you contributed your 30c/day and got yourself a premium membership? Premium membership is DSXchange's funding model - this money all goes to hosting and bandwidth costs for the site.

DSFreddie · Post by **DSFreddie** » Fri Dec 14, 2012 9:49 pm

Sure Ray, I will plan to take a premium membership.

Can anyone suggest some ways to accomplish this?

Thanks,
Freddie

ArndW · Post by **ArndW** » Sat Dec 15, 2012 7:11 am

If you plan on comparing all fields you can use a CDC stage to at least detect which rows are new/deleted/changed given a key. I've used this type of generic job often, but all I want to know is if there is a difference, not which columns have changed.
If you really need to determine which of the columns have changed, you would need either to use schemas for the files and pass them to a hand-crafted generic stage, or to combine the columns into one string with separators and program the comparison yourself, using either stage variables in a Transformer stage or your own custom buildop.
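As a rough illustration of the key-based approach (row-level detection only, not column-level), here is a small Python sketch. It is not the CDC stage itself; `detect_changes`, the status labels and the assumption that the key is the first delimited field are all hypothetical choices made for this example.

```python
def detect_changes(before_rows, after_rows, key_index=0, delim=","):
    """Classify each key as new, deleted, changed or unchanged between two row sets."""

    def index_by_key(rows):
        # Map the key field of each row to the whole row string.
        return {row.split(delim)[key_index]: row for row in rows}

    before = index_by_key(before_rows)
    after = index_by_key(after_rows)

    status = {}
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            status[key] = "new"
        elif key not in after:
            status[key] = "deleted"
        elif before[key] != after[key]:
            # The whole row is compared as one string, so this only says that
            # something differs, not which column.
            status[key] = "changed"
        else:
            status[key] = "unchanged"
    return status
```

Because each row is compared as a single string, the sketch stays independent of the number of columns, which is what makes this kind of job generic.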

harishkumar.upadrasta · Post by **harishkumar.upadrasta** » Wed Dec 26, 2012 11:12 am

Create a script that generates a schema file describing the structure of the files, use that schema file on both input links, and enable RCP (runtime column propagation).
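A minimal sketch of such a script in Python, assuming the files carry a header row with the column names and that every column can be read as a string. The function name `build_schema`, the `string[max=255]` type and the record properties (`final_delim`, `delim`, `quote`) are assumptions for illustration; check them against your own files and the schema file format documentation before relying on them.

```python
def build_schema(header_line, delim=","):
    """Return schema file text with one string column per name in the header line."""
    columns = [name.strip() for name in header_line.strip().split(delim)]
    # Every column is declared as a bounded string; the length 255 is an assumed default.
    fields = "\n".join(f"  {name}: string[max=255];" for name in columns)
    return (
        "record\n"
        "{final_delim=end, delim='" + delim + "', quote=double}\n"
        "(\n" + fields + "\n)\n"
    )


def write_schema(data_path, schema_path, delim=","):
    """Read the first line of the data file and write the matching schema file."""
    with open(data_path) as data:
        header = data.readline()
    with open(schema_path, "w") as schema:
        schema.write(build_schema(header, delim))
```

The same generated file can then be supplied for both inputs, since the two files being compared share one layout.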