Using Change Capture Stage

opdas · Post by **opdas** » Mon Apr 03, 2006 10:40 pm

Hi,
I'm using change capture in my job. I'm testing with a small set of data.

I'm not getting the desired result when I run the Change Capture stage in parallel mode, but it works fine and gives the desired result in sequential mode. Sequential mode doesn't serve my purpose, as I'm going to run the job with a very large set of data.

kumar_s · Post by **kumar_s** » Mon Apr 03, 2006 10:42 pm

The Change Capture stage is built to run in parallel execution mode, so it shouldn't give you any undesired result. Could you describe what you expect to get and what the stage actually outputs? You might also look at the partitioning method used.

opdas · Post by **opdas** » Mon Apr 03, 2006 10:45 pm

Kumar,
You are right, it's working when I hash partition both input sets... but am I doing the right thing?

opdas · Post by **opdas** » Mon Apr 03, 2006 10:56 pm

Are hash partitioning and sorting the data sets before Change Capture the same thing? Even sorting both sets is not helping...

ray.wurlod · Post by **ray.wurlod** » Tue Apr 04, 2006 1:46 am

It is vital that both Data Sets are identically partitioned on the comparison keys, so that valid comparisons will be performed. It is highly desirable that both Data Sets are identically sorted on the comparison keys, so that efficient use can be made of memory. What do you mean by "the desired result"? And what are you getting versus what the Change Capture stage is documented as generating?
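To make the partitioning point concrete, here is a toy Python sketch. It is not DataStage code and does not reproduce the stage's real output columns. The names `hash_partition`, `round_robin_partition`, `change_capture` and `run_parallel` are hypothetical, as is the sample data. It assumes each node compares only the rows it receives from the two inputs, which is why rows with the same key must land on the same node.

```python
from zlib import crc32

def hash_partition(rows, key, nodes):
    """Send each row to a node chosen by a hash of its key value."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % nodes].append(row)
    return parts

def round_robin_partition(rows, nodes):
    """Deal rows out to nodes in turn, ignoring the key entirely."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts

def change_capture(before, after, key):
    """Compare the rows one node sees from each input (toy model)."""
    b = {r[key]: r for r in before}
    a = {r[key]: r for r in after}
    changes = []
    for k in sorted(a.keys() | b.keys()):
        if k not in b:
            changes.append((k, "insert"))
        elif k not in a:
            changes.append((k, "delete"))
        elif a[k] != b[k]:
            changes.append((k, "edit"))
    return changes

def run_parallel(before_parts, after_parts, key):
    """Run the comparison independently on each node and collect results."""
    out = []
    for b, a in zip(before_parts, after_parts):
        out.extend(change_capture(b, a, key))
    return sorted(out)

# Hypothetical data: id 1 is deleted, id 3 is edited, id 7 is inserted.
before = [{"id": i, "val": "x"} for i in range(1, 7)]
after = [{"id": i, "val": "y" if i == 3 else "x"} for i in range(2, 8)]

sequential = change_capture(before, after, "id")
hashed = run_parallel(hash_partition(before, "id", 2),
                      hash_partition(after, "id", 2), "id")
round_robin = run_parallel(round_robin_partition(before, 2),
                           round_robin_partition(after, 2), "id")

print(sequential)             # one delete, one edit, one insert
print(hashed == sequential)   # same answer as the sequential run
print(len(round_robin))       # many spurious inserts and deletes
```

With hash partitioning on the key, each node sees matching rows from both inputs, so the combined result agrees with the sequential run. With round robin, matching rows can end up on different nodes and are reported as spurious inserts and deletes.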

kumar_s · Post by **kumar_s** » Tue Apr 04, 2006 3:43 am

opdas wrote: it's working when I hash partition both input sets... but am I doing the right thing?

Yes, you are doing the right thing.
As mentioned, both data sets need to be sorted and correctly partitioned across nodes to get the correct result.
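On the earlier question of whether partitioning and sorting are the same thing: they are not. Partitioning decides which node a row goes to. Sorting decides the order of rows within a node. The Python sketch below illustrates why sorted input helps with memory, assuming a merge-style comparison that holds only one row from each input at a time. This is an illustration only and not a claim about how the stage is implemented internally. `merge_compare` and the sample rows are hypothetical.

```python
def merge_compare(before, after, key):
    """Yield (key, change) pairs from two inputs sorted ascending on key.

    Only the current row of each input is held in memory.
    """
    b_iter, a_iter = iter(before), iter(after)
    b, a = next(b_iter, None), next(a_iter, None)
    while b is not None or a is not None:
        if a is None or (b is not None and b[key] < a[key]):
            # Key exists only in 'before'.
            yield (b[key], "delete")
            b = next(b_iter, None)
        elif b is None or a[key] < b[key]:
            # Key exists only in 'after'.
            yield (a[key], "insert")
            a = next(a_iter, None)
        else:
            # Same key on both sides: report only if the row differs.
            if a != b:
                yield (a[key], "edit")
            b, a = next(b_iter, None), next(a_iter, None)

# Hypothetical rows, both inputs sorted on "id".
before = [{"id": 1, "val": "x"}, {"id": 2, "val": "x"}, {"id": 3, "val": "x"}]
after = [{"id": 2, "val": "x"}, {"id": 3, "val": "y"}, {"id": 4, "val": "x"}]
sorted_result = list(merge_compare(before, after, "id"))
print(sorted_result)

# Same rows on both sides, but 'before' is out of order: the merge
# reports changes that are not really there.
shuffled = [{"id": 3, "val": "x"}, {"id": 1, "val": "x"}, {"id": 2, "val": "x"}]
in_order = sorted(shuffled, key=lambda r: r["id"])
unsorted_result = list(merge_compare(shuffled, in_order, "id"))
print(unsorted_result)
```

Identical rows compared out of order produce false inserts and deletes in this model, which is why sorting alone, or partitioning alone, is not enough: each node needs the matching rows, and needs them in the same key order.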