Change Capture with 1.5 million rows

Posted: Thu Feb 12, 2009 10:10 am
by vercama
Hi all,
I've never used the Change Capture stage in a PX job, but I understand it can be used to extract the differences between two tables, specifically T1-T2 (it would probably need to run twice, T1-T2 and T2-T1, to get all the differences). What about performance? I need to check whether I can use this stage on two compatible tables of as many as 1.5 million rows each, where perhaps only 150 rows differ. Do you think it's reasonable to use this stage in this case?

Thanks,
Marco

Posted: Thu Feb 12, 2009 12:12 pm
by kris007
You can use the Change Capture stage without any problem as long as you partition and sort the data on the key columns. You could also use the Merge stage and collect the rejects on a separate link.

Posted: Thu Feb 12, 2009 2:17 pm
by ray.wurlod
It's reasonable (to use Change Capture stage), and you only need one pass.
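To illustrate why one pass is enough, here is a rough Python sketch of the general idea: walk two key-sorted inputs side by side and classify each row as an insert, delete, or edit. This is not the stage's actual implementation. The function name capture_changes, the (key, value) row layout, and the string labels are all made up for the example, and it assumes keys are unique and both inputs are already sorted by key.

```python
def capture_changes(before, after):
    """Compare two key-sorted datasets in a single pass.

    before, after: lists of (key, value) tuples, sorted by unique key.
    Yields (change, key, value) for rows that differ; unchanged rows are skipped.
    The labels "insert", "delete", "edit" are illustrative only.
    """
    i = j = 0
    while i < len(before) and j < len(after):
        b_key, b_val = before[i]
        a_key, a_val = after[j]
        if b_key == a_key:
            # Same key on both sides: report only if the value changed.
            if b_val != a_val:
                yield ("edit", a_key, a_val)
            i += 1
            j += 1
        elif b_key < a_key:
            # Key exists only in the before set.
            yield ("delete", b_key, b_val)
            i += 1
        else:
            # Key exists only in the after set.
            yield ("insert", a_key, a_val)
            j += 1
    # Whatever is left on either side has no match on the other.
    for b_key, b_val in before[i:]:
        yield ("delete", b_key, b_val)
    for a_key, a_val in after[j:]:
        yield ("insert", a_key, a_val)


# Hypothetical sample data standing in for T1 and T2.
t1 = [(1, "a"), (2, "b"), (3, "c"), (5, "e")]
t2 = [(1, "a"), (2, "B"), (4, "d"), (5, "e")]
changes = list(capture_changes(t1, t2))
```

Both directions (rows only in T1 and rows only in T2) come out of the same walk, so there is no need for a separate T2-T1 run. Each row is read once, which is also why sorted input matters.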

Performance?

Posted: Fri Feb 13, 2009 2:25 am
by vercama
But what about performance with so many rows? The current solution is that if the count() results for the two tables differ by even one row, the target table is truncated and everything is reloaded from scratch.

Posted: Fri Feb 13, 2009 4:23 am
by ray.wurlod
1.5 million is not a lot of rows for a parallel job. Go for it.