Hi All,
How can we just compare two files...and find the differences...
No Inserts and Updates needed...
Any Clue..?
Thanks
Compare
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
If you have a unique key, you can try a 'full outer join' option of the merge stage.
If merge blows out, you can use 'type file1 > input.txt' and 'type file2 >> input.txt' and then send the input.txt via a sort routine. The result can then be checked for 'before-after records' to compare.
Otherwise, you can load them into a temp tables to (outer) join.
In DataStage, you can use (hash) files to hold the key and checksum (crc32) value for before and after to identify the difference.
You can choose the option that suits you best.
If merge blows out, you can use 'type file1 > input.txt' and 'type file2 >> input.txt' and then send the input.txt via a sort routine. The result can then be checked for 'before-after records' to compare.
Otherwise, you can load them into a temp tables to (outer) join.
In DataStage, you can use (hash) files to hold the key and checksum (crc32) value for before and after to identify the difference.
You can choose the option that suits you best.
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
The idea is to concatenate both files one below other and sort them on the key. You can then use staging variables for the fields in the current record to compare with the previous ecord.
Maybe you can prefix each record with the source file name so that after you sort by the filename + key, you can aggregate to get the 'first' file name & record, 'last' fil name and record, count() for each key. So this will give you an idea of when the record first appeared.
So if the 'first' file name = file2 then insert
if 'second' file name = file1 then delete
if 'first' file name = file1 and 'last' file name = file2 then update
Or you can use hash-files and load both file's key into it to decide the change.
Maybe you can prefix each record with the source file name so that after you sort by the filename + key, you can aggregate to get the 'first' file name & record, 'last' fil name and record, count() for each key. So this will give you an idea of when the record first appeared.
So if the 'first' file name = file2 then insert
if 'second' file name = file1 then delete
if 'first' file name = file1 and 'last' file name = file2 then update
Or you can use hash-files and load both file's key into it to decide the change.