Performance with Change Capture Stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Raamc
Premium Member
Premium Member
Posts: 87
Joined: Mon Aug 20, 2007 9:08 am

Performance with Change Capture Stage

Post by Raamc »

Hi,

I have a requirement to compare a table before update and after update.
I have 10 columns in each record and table is having around 20 millions of records.

I want to run a comparison over 20 millions of records.

In this case, is it feasible to use CHANGE CAPTURE stage to get the differences (insert/update/delete) records?

If I use CPC stage, how about the performance on 4 node config file?

Any rough idea how long it may take for comparisons over 20million records?
Thanks,
Raamc
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you are doing the updates then you already have that information without having to go to the actual table, right?

Performance guesstimates are impossible at this time - are you running on a single-CPU 486 with 100 other users or do you have a 32 Processor SMP system all to yourself? Is the database local or remote and how tuned is it? Even with those answers it is impossible to guess (are your records 10 bytes or 10000 bytes wide?)...
Raamc
Premium Member
Premium Member
Posts: 87
Joined: Mon Aug 20, 2007 9:08 am

Post by Raamc »

Thanks Arndw,

I am not doing the updates. Some other system will do the updates and that updates will flow into my system. I can find changes only by comparing before and after images.

Any performace tunning methods before using CHANGE CAPTURE stage. I am sorting the data other than this any other thing i need to do?

Witgout using the CPC stage, is there any other way to find update/delete/insert records?
Thanks,
Raamc
Raamc
Premium Member
Premium Member
Posts: 87
Joined: Mon Aug 20, 2007 9:08 am

Post by Raamc »

Thanks Arndw,

I am not doing the updates. Some other system will do the updates and that updates will flow into my system. I can find changes only by comparing before and after images.

Any performace tunning methods before using CHANGE CAPTURE stage. I am sorting the data other than this any other thing i need to do?

Witgout using the CPC stage, is there any other way to find update/delete/insert records?
Thanks,
Raamc
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

It is best to avoid having to do a full table scan - how about adding an indexed column which has a different value when the external application does an update than when DataStage does it, then you could do a select based on that column's value and only get changed records back to DataStage.
RAJARP
Participant
Posts: 147
Joined: Thu Dec 06, 2007 6:46 am
Location: Chennai

Post by RAJARP »

HI RAAMC,
There is an alternative for change capture stage.use a join stage with join type as 'full outer join'.Hash partition both the input links(sort,null first and ascending).
You can easily figure out the deletes,inserts and updates in the new data.use a tranformer stage following the join stage.In the constraint give the following constraints
Inserts-->when the old_link.Key=''
Deletes-->when the new_link.key=''
Updates-->when old_link.key<>'' and new_link.key<>''.
This would help you.Actually this would help you .I have tried this logic for two tables each containing more than 20 million records.

Regards,
Raja R P[/code][/quote]
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That doesn't sound substantially different than what the stage being discussed does. Have you run your 20M records both ways and compared the results?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This stage type is not available in server jobs, yet you marked the job type as server. Are you looking for a server job solution?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply