
Change Capture Stage - How to avoid Scratch space usage

Posted: Thu Mar 04, 2010 11:20 pm
by rleishman
I have a job that uses the Change Capture Stage to identify differences between two similar data sets. Both data sets are pre-sorted and partitioned on the change-key. Each data set has 1M rows. I want it to stream the output straight through, but it is writing the entire thing to scratch disk before outputting a single row.

Watching DataStage as it runs, the CC stage accepts inputs from the two pre-sorted sources simultaneously and at roughly the same speed, but it does not output ANYTHING until the two sources are completely consumed. After the inputs complete, it pauses for 30 seconds or so and then starts outputting the combined dataset.

Looking at the Scratch disk whilst this is happening, I can see it creating files. This doesn't seem necessary to me because it does not need to sort the data.

I suspect that it is unnecessarily sorting my data, but do not know how to make it stop. In the Partitioning tab of the CC Input tab, I am NOT checking the box that asks it to force a sort.

Question: Is this normal? If I ask it to force a sort, it does take a little bit longer, but does not use more temp space.

I want it to stream the output without writing it to scratch.
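
To illustrate why two inputs that are already sorted on the change key should not need a sort or any scratch space, here is a minimal Python sketch of a streaming compare. It is not how DataStage implements the Change Capture stage; the function name `change_capture`, the `(key, value)` row layout, the change-code labels, and the assumption that keys are unique within each input are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[int, str]  # hypothetical row layout: (change key, value)


def change_capture(before: Iterable[Row],
                   after: Iterable[Row]) -> Iterator[Tuple[str, Row]]:
    """Yield (change_code, row) pairs from two inputs sorted on a unique key.

    Only one row from each input is held in memory at a time, so output
    can start as soon as the first rows have been read.
    """
    b_it, a_it = iter(before), iter(after)
    b: Optional[Row] = next(b_it, None)
    a: Optional[Row] = next(a_it, None)
    while b is not None or a is not None:
        if a is None or (b is not None and b[0] < a[0]):
            # Key exists only in the "before" input.
            yield ("delete", b)
            b = next(b_it, None)
        elif b is None or a[0] < b[0]:
            # Key exists only in the "after" input.
            yield ("insert", a)
            a = next(a_it, None)
        else:
            # Same key on both sides: compare the full rows.
            yield ("copy" if a == b else "edit", a)
            b, a = next(b_it, None), next(a_it, None)


if __name__ == "__main__":
    old_rows = [(1, "a"), (2, "b"), (4, "d")]
    new_rows = [(1, "a"), (2, "B"), (3, "c")]
    for code, row in change_capture(old_rows, new_rows):
        print(code, row)
```

Because the function is a generator, the first result is available after reading a single row from each side, which is the streaming behaviour I was expecting from the stage.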

Posted: Thu Mar 04, 2010 11:47 pm
by ray.wurlod
Take a look at the score to see whether tsort operators and/or buffer operators are being inserted.

Add explicit Sort stages on the input links, with sort mode set to "don't sort, already sorted" (to prevent insertion of tsort operators) and with memory boosted as high as you can afford.

Posted: Fri Mar 05, 2010 7:01 am
by sohasaid
You can also set the APT_NO_SORT_INSERTION environment variable, which stops DataStage from inserting tsort operators automatically.

Posted: Wed Mar 10, 2010 6:45 am
by rleishman
Both suggestions work perfectly. Thanks guys.