Need a clarification

Posted: Thu Jun 23, 2011 10:23 pm
by Bicchu
Hi All,

I need this clarification from all of you. Thanks in advance. :D

My job design is:

Dataset -----> Transformer ------> Remove Duplicate ------> Dataset

On the input link of the Transformer stage, I am using 'HASH' partitioning.

On the input link of the Remove Duplicates stage, I am using 'SAME' partitioning.



Now, my questions:

1. My code reviewer told me that the Transformer stage is not capable of retaining partitioning on its output link and automatically converts the partitioning to 'AUTO', so the RD stage will have 'AUTO' partitioning on its input link. Is that so?

2. RD is a key-based stage. If it gets 'AUTO' partitioning on its input link (that is, if the answer to question 1 is yes), will DS optimize the partitioning to 'HASH'?

I will be delighted if you all can throw some light on my doubts.

Re: Need a clarification

Posted: Thu Jun 23, 2011 11:53 pm
by ray.wurlod
Bicchu wrote:Hi All, I need this clarification from all of you.
So, you want 38309 replies. Is this correct?

Posted: Thu Jun 23, 2011 11:55 pm
by Bicchu
Sorry for that line.

I just want answers to my questions.

Thanks,
Pratik

Re: Need a clarification

Posted: Thu Jun 23, 2011 11:57 pm
by ray.wurlod
Visit the stage properties of the Transformer stage and tell us whether its Preserve partitioning property is set to Set, Clear or Default (Propagate). It's on the Advanced tab. And, if it's not Clear, then your code reviewer is wrong, wrong, wrong.

You can prove this by inspecting the score.

As to question 2, if the partitioning is set to (Auto) and the upstream stage executes in parallel, then the partitioning algorithm used will be Same.

Posted: Sat Jun 25, 2011 2:28 pm
by Bicchu
I had set that property to 'Propagate'.

Posted: Sat Jun 25, 2011 5:01 pm
by ray.wurlod
Then your Remove Duplicates stage will have its partitioning set to Same. Because the Transformer stage runs with Hash as its partitioning algorithm, the Remove Duplicates stage will effectively execute with Hash partitioning (the Same as that used in the Transformer stage).
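
For anyone who wants to see why this matters, here is a rough Python simulation of the idea. It is not DataStage code and does not use any DataStage API. The function names, the cust_id and amount columns, and the CRC32-based hash are all assumptions made up for illustration. It only shows that when rows are hash partitioned on the key and then stay in the Same partition through the Transformer, a per-partition Remove Duplicates sees every copy of a key together, whereas a non-key-based scheme such as round robin can leave duplicates spread across partitions.

```python
import zlib


def hash_partition(rows, key, n_parts):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        # CRC32 is only a stand-in for whatever hash the real engine uses.
        idx = zlib.crc32(str(row[key]).encode()) % n_parts
        parts[idx].append(row)
    return parts


def round_robin_partition(rows, n_parts):
    """Deal rows out in turn, ignoring the key (for comparison only)."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts


def transform(parts):
    """Row-by-row work inside each partition; no row changes partition (Same)."""
    return [[dict(row, amount=row["amount"] * 2) for row in part] for part in parts]


def remove_duplicates(parts, key):
    """Keep the first row per key value, independently inside each partition."""
    result = []
    for part in parts:
        seen = set()
        kept = []
        for row in part:
            if row[key] not in seen:
                seen.add(row[key])
                kept.append(row)
        result.append(kept)
    return result


def count_rows(parts):
    """Total number of rows across all partitions."""
    return sum(len(part) for part in parts)


# 20 hypothetical rows with only 5 distinct cust_id values.
rows = [{"cust_id": i % 5, "amount": i} for i in range(20)]

hashed = remove_duplicates(transform(hash_partition(rows, "cust_id", 4)), "cust_id")
robin = remove_duplicates(transform(round_robin_partition(rows, 4)), "cust_id")

print("hash + same :", count_rows(hashed), "rows left")
print("round robin :", count_rows(robin), "rows left")
```

With the hash layout exactly one row per cust_id survives. With the round robin layout each partition removes only the duplicates it can see locally, so more rows than distinct keys are left.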