Compare

kab123 · Post by **kab123** » Mon Feb 07, 2005 9:50 am

Hi All,

How can we just compare two files...and find the differences...
No Inserts and Updates needed...
Any Clue..?

Thanks

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Mon Feb 07, 2005 10:00 am

Given the information you supplied, the DOS command COMP can be the best option.

To get refined answers, you can let us know the attributes of the file and the your requirement of using the compare option.

kab123 · Post by **kab123** » Mon Feb 07, 2005 10:05 am

The file is having abt 4 million records... and about 10 to 15 attributes..
Is there any way to compare such a big file in DataStage..?

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Mon Feb 07, 2005 10:10 am

If you have a unique key, you can try a 'full outer join' option of the merge stage.

If merge blows out, you can use 'type file1 > input.txt' and 'type file2 >> input.txt' and then send the input.txt via a sort routine. The result can then be checked for 'before-after records' to compare.

Otherwise, you can load them into a temp tables to (outer) join.

In DataStage, you can use (hash) files to hold the key and checksum (crc32) value for before and after to identify the difference.

You can choose the option that suits you best.

kab123 · Post by **kab123** » Mon Feb 07, 2005 10:24 am

"If merge blows out, you can use 'type file1 > input.txt' and 'type file2 >> input.txt' and then send the input.txt via a sort routine. The result can then be checked for 'before-after records' to compare."

Can you give me a littlebit detail on how the above option can be achieved..?

Thanks

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Mon Feb 07, 2005 10:39 am

The idea is to concatenate both files one below other and sort them on the key. You can then use staging variables for the fields in the current record to compare with the previous ecord.

Maybe you can prefix each record with the source file name so that after you sort by the filename + key, you can aggregate to get the 'first' file name & record, 'last' fil name and record, count() for each key. So this will give you an idea of when the record first appeared.

So if the 'first' file name = file2 then insert
if 'second' file name = file1 then delete
if 'first' file name = file1 and 'last' file name = file2 then update

Or you can use hash-files and load both file's key into it to decide the change.

kab123 · Post by **kab123** » Mon Feb 07, 2005 10:49 am

Thanks Sai..