Comparing Two Datasets

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


iamnagus
Participant
Posts: 48
Joined: Wed Sep 29, 2004 1:16 am

Comparing Two Datasets

Post by iamnagus »

I have two input files,
Before_file1 and After_file2 (Sequential File stages).

Now I want to compare these two files and get the MODIFIED, NEW (inserted) and DELETED records.

I know how to do this in parallel jobs using the Change Capture stage,
but I don't know how to do it in Server Edition.

I need this for my current project.

Can anyone help me?

Thanks in Advance
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Load one of the files into a hashed file, then use that hashed file as a reference for the other. If the lookup fails it is a "N"ew record; if it succeeds it is a "M"odified one.
Capturing deleted records this way is a bit more difficult; you could always load both files into hashed files and do lookups both ways, or combine both files, sort and aggregate. A lot depends upon the sizes of your files.
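
A rough illustration of that lookup logic, outside DataStage (in a Server job the reference would be a Hashed File stage): the Python sketch below assumes comma-delimited files whose first column is the key; the ".txt" extensions and the single-column key are assumptions, not from the original post.

```python
import csv

def load_reference(path):
    """Load the 'before' file into an in-memory lookup (playing the role of the hashed file)."""
    with open(path, newline="") as f:
        return {row[0]: row for row in csv.reader(f)}

def classify_after(after_path, before_lookup):
    """Yield ('N'|'M'|'U', row) for every row of the 'after' file."""
    with open(after_path, newline="") as f:
        for row in csv.reader(f):
            before_row = before_lookup.get(row[0])
            if before_row is None:
                yield "N", row      # lookup failed -> new record
            elif before_row != row:
                yield "M", row      # lookup hit but data changed -> modified
            else:
                yield "U", row      # unchanged

# File names follow the original post; the .txt extension is assumed.
for flag, row in classify_after("After_file2.txt", load_reference("Before_file1.txt")):
    print(flag, row)
```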
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

For deletes, just reverse the roles and check again: a hashed file miss then indicates a deleted record.
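
Sketched the same way (Python just to show the logic, with the same assumed file layout and placeholder names as above), the reversed pass looks like this:

```python
import csv

def deleted_records(before_path, after_path):
    """Rows whose key exists in the 'before' file but not in the 'after' file."""
    with open(after_path, newline="") as f:
        after_keys = {row[0] for row in csv.reader(f)}   # 'after' file is now the reference
    with open(before_path, newline="") as f:
        for row in csv.reader(f):
            if row[0] not in after_keys:
                yield row                                # lookup miss -> deleted

for row in deleted_records("Before_file1.txt", "After_file2.txt"):
    print("D", row)
```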
-craig

"You can never have too many knives" -- Logan Nine Fingers
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Depending on how much data is involved with each row in the sequential files, follow Arnd's suggestion about using the hashed file, but only put the PK fields in there. This will make your hashed file smaller and possibly speed things up as well. Once again, it depends on the amount of data you are dealing with. Just a slight modification to his suggestion.
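
A sketch of the slimmed-down lookup (again Python for illustration only). It goes one step beyond the post by pairing each PK with a checksum of the non-key columns so modified rows can still be spotted; the checksum is an assumption added here, not part of the original suggestion.

```python
import csv
import zlib

def key_checksums(path):
    """Map each PK (assumed to be the first column) to a CRC32 of the remaining columns."""
    with open(path, newline="") as f:
        return {row[0]: zlib.crc32(",".join(row[1:]).encode())
                for row in csv.reader(f)}

before = key_checksums("Before_file1.txt")
after = key_checksums("After_file2.txt")

new_keys      = after.keys() - before.keys()              # only in 'after'
deleted_keys  = before.keys() - after.keys()              # only in 'before'
modified_keys = {k for k in after.keys() & before.keys()  # present in both, data changed
                 if after[k] != before[k]}
```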

Hope that helps!
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"