Compare

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
kab123
Participant
Posts: 92
Joined: Tue May 18, 2004 4:05 pm

Compare

Post by kab123 »

Hi All,

How can we just compare two files...and find the differences...
No Inserts and Updates needed...
Any Clue..?

Thanks
Thanks
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Given the information you supplied, the DOS command COMP can be the best option.

To get refined answers, you can let us know the attributes of the file and the your requirement of using the compare option.
kab123
Participant
Posts: 92
Joined: Tue May 18, 2004 4:05 pm

Post by kab123 »

The file is having abt 4 million records... and about 10 to 15 attributes..
Is there any way to compare such a big file in DataStage..?
Thanks
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

If you have a unique key, you can try a 'full outer join' option of the merge stage.

If merge blows out, you can use 'type file1 > input.txt' and 'type file2 >> input.txt' and then send the input.txt via a sort routine. The result can then be checked for 'before-after records' to compare.

Otherwise, you can load them into a temp tables to (outer) join.

In DataStage, you can use (hash) files to hold the key and checksum (crc32) value for before and after to identify the difference.

You can choose the option that suits you best.
kab123
Participant
Posts: 92
Joined: Tue May 18, 2004 4:05 pm

Post by kab123 »

"If merge blows out, you can use 'type file1 > input.txt' and 'type file2 >> input.txt' and then send the input.txt via a sort routine. The result can then be checked for 'before-after records' to compare."

Can you give me a littlebit detail on how the above option can be achieved..?

Thanks
Thanks
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

The idea is to concatenate both files one below other and sort them on the key. You can then use staging variables for the fields in the current record to compare with the previous ecord.

Maybe you can prefix each record with the source file name so that after you sort by the filename + key, you can aggregate to get the 'first' file name & record, 'last' fil name and record, count() for each key. So this will give you an idea of when the record first appeared.

So if the 'first' file name = file2 then insert
if 'second' file name = file1 then delete
if 'first' file name = file1 and 'last' file name = file2 then update

Or you can use hash-files and load both file's key into it to decide the change.
kab123
Participant
Posts: 92
Joined: Tue May 18, 2004 4:05 pm

Post by kab123 »

Thanks Sai..
Thanks
Post Reply