
CDC problem with partition

Posted: Tue Sep 21, 2010 1:06 am
by vinsashi
Hi,
In my job I am using the CDC stage. I have to compare the source data with the target data and insert any new records. I have designed it like this:
reference(less) Ref(less) target
! ! !(entire&sort on 4keycol)
source-Lookup- -----Lookup----(entire&sort on 4keycol)CDC---Trans(Surkey)---TargetDB


My source contains 2,000,000 records. If I run the job on 2 nodes I get 4,000,000 records, and it takes 2 to 3 hours. I should get only 2,000,000 records (the new records only). Please help if there is any solution for this other than running on a single node, and with performance tuning for this job.


Thanks
V....

Posted: Tue Sep 21, 2010 1:19 am
by rohithmuthyala
Hi,
You are hitting this problem because you are using entire as the partitioning type; changing the partitioning type should help. I would suggest hash partitioning on the key column combination.
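
To illustrate the suggestion, here is a minimal Python sketch of hash partitioning on a composite key. It is not DataStage code and does not reproduce DataStage's actual hash function; `zlib.crc32` stands in for it, and the column names `k1` to `k4` and the sample rows are hypothetical. The point it shows is that two rows with the same four key values always land on the same node, whichever link they arrive on.

```python
import zlib

def hash_partition(row, key_cols, num_nodes):
    """Return the node index for a row, derived from its composite key."""
    # Join the key column values into one string so the whole
    # combination, not a single column, decides the partition.
    key = "|".join(str(row[c]) for c in key_cols)
    return zlib.crc32(key.encode("utf-8")) % num_nodes

# Hypothetical names for the four key columns mentioned in the thread.
KEY_COLS = ["k1", "k2", "k3", "k4"]

# Same key values on the source and target links, different payload.
source_row = {"k1": 1, "k2": "A", "k3": 2010, "k4": "X", "amount": 10}
target_row = {"k1": 1, "k2": "A", "k3": 2010, "k4": "X", "amount": 99}

same_node = (hash_partition(source_row, KEY_COLS, 2)
             == hash_partition(target_row, KEY_COLS, 2))
print(same_node)  # True: non-key columns do not affect the partition
```

Both input links of the CDC stage would need the same partitioning on the same key columns for this property to hold.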

Posted: Tue Sep 21, 2010 3:46 am
by priyadarshikunal

Code:

      reference(less)   Ref(less)                      target
             !            !                              !(entire&sort on 4keycol)
source-Lookup-  -----Lookup----(entire&sort on 4keycol)CDC---Trans(Surkey)---TargetDB


Use the code tags to preserve the formatting.

Why are you using entire partitioning at all? With CDC, entire partitioning just duplicates the data on each node, so you get duplicates in the output too (you mentioned 2 nodes, hence twice the number of records). Use hash partitioning on the key columns so that the incoming data stays unique across the nodes and is key-partitioned, which lets each record find its match, if any, on the other link.
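
The doubling can be reproduced with a small Python simulation. This is only a sketch under stated assumptions: the toy data, the helper names, and the use of `zlib.crc32` as the hash are all made up for illustration, and the per-node comparison is a simplified stand-in for what the CDC stage does, not its real implementation.

```python
import zlib

def node_for(key, num_nodes):
    """Pick a node for a key tuple (crc32 is an assumed stand-in hash)."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_nodes

def entire(rows, num_nodes):
    """Entire partitioning: every node receives a full copy of the data."""
    return [list(rows) for _ in range(num_nodes)]

def hash_part(rows, num_nodes):
    """Hash partitioning: each key goes to exactly one node."""
    parts = [[] for _ in range(num_nodes)]
    for key in rows:
        parts[node_for(key, num_nodes)].append(key)
    return parts

def count_inserts(before_parts, after_parts):
    """Count rows flagged as new, comparing the two links node by node."""
    total = 0
    for before, after in zip(before_parts, after_parts):
        existing = set(before)
        total += sum(1 for key in after if key not in existing)
    return total

NODES = 2
# Toy data with 4-column keys: 100 existing target rows, 200 source rows,
# of which 150 (first column 100..249) are genuinely new.
target = [(i, "a", "b", "c") for i in range(100)]
source = [(i, "a", "b", "c") for i in range(50, 250)]

inserts_entire = count_inserts(entire(target, NODES), entire(source, NODES))
inserts_hash = count_inserts(hash_part(target, NODES), hash_part(source, NODES))
print(inserts_entire)  # 300: each of the 2 nodes reports all 150 new rows
print(inserts_hash)    # 150: each new row is reported once
```

With entire partitioning the insert count scales with the number of nodes, which matches the 2 nodes, twice the records symptom described above.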