Page 1 of 1

Compare

Posted: Mon Feb 07, 2005 9:50 am
by kab123
Hi All,

How can we just compare two files...and find the differences...
No Inserts and Updates needed...
Any Clue..?

Thanks

Posted: Mon Feb 07, 2005 10:00 am
by Sainath.Srinivasan
Given the information you supplied, the DOS command COMP can be the best option.

To get refined answers, you can let us know the attributes of the file and the your requirement of using the compare option.

Posted: Mon Feb 07, 2005 10:05 am
by kab123
The file is having abt 4 million records... and about 10 to 15 attributes..
Is there any way to compare such a big file in DataStage..?

Posted: Mon Feb 07, 2005 10:10 am
by Sainath.Srinivasan
If you have a unique key, you can try a 'full outer join' option of the merge stage.

If merge blows out, you can use 'type file1 > input.txt' and 'type file2 >> input.txt' and then send the input.txt via a sort routine. The result can then be checked for 'before-after records' to compare.

Otherwise, you can load them into a temp tables to (outer) join.

In DataStage, you can use (hash) files to hold the key and checksum (crc32) value for before and after to identify the difference.

You can choose the option that suits you best.

Posted: Mon Feb 07, 2005 10:24 am
by kab123
"If merge blows out, you can use 'type file1 > input.txt' and 'type file2 >> input.txt' and then send the input.txt via a sort routine. The result can then be checked for 'before-after records' to compare."

Can you give me a littlebit detail on how the above option can be achieved..?

Thanks

Posted: Mon Feb 07, 2005 10:39 am
by Sainath.Srinivasan
The idea is to concatenate both files one below other and sort them on the key. You can then use staging variables for the fields in the current record to compare with the previous ecord.

Maybe you can prefix each record with the source file name so that after you sort by the filename + key, you can aggregate to get the 'first' file name & record, 'last' fil name and record, count() for each key. So this will give you an idea of when the record first appeared.

So if the 'first' file name = file2 then insert
if 'second' file name = file1 then delete
if 'first' file name = file1 and 'last' file name = file2 then update

Or you can use hash-files and load both file's key into it to decide the change.

Posted: Mon Feb 07, 2005 10:49 am
by kab123
Thanks Sai..