Processing Count based transformation in 4 nodes

parag.s.27 · Post by **parag.s.27** » Fri Sep 04, 2009 12:41 am

Hi All,

I am having a requirement where I need to count the number of records and do a math processing on the basis of even numbers, odd numbers etc. Now as we know that in multi node architecture the counts will always be initiated from 1 for all the partitions. But the problem is I need to do it in the transformer without adding any extra stage, because it is a requirement from client.

Now one approach I tried was to set the Transformer to partition type as "ENTIRE" and then constraint on @PARTITIONNUM = 1 and then all the records will give me correct count. But this result in sacrificing the parallel processing as per my client which I do not understand why.

Second approach was to use NextSurrogateKey(), But this is also not consistent because the surrogate key is never generated in a contiguous sequence. For one partition the values start from 1 where as for another partition the values start from 1000.

So after going through many posts finally I got some pointers from Vincent McBurney's post where he sugeested, something like this: -

Code: Select all

1. Define a Stage Variable svCounter and initiate it with value = @PARTITIONNUM-@NUMPARTITIONS+1.
2. Now the same stage variable be incremented with logic as svCounter = svCounter+@NUMPARTITONS.

The above mentioned code will generate a series of even and odd numbers. But the problem is it does not give correct result always. Because my server has 4 nodes and if number of records are not divisible by 4 then each partition will have different number of records and the count will not be correct in the above mentioned code. What I mean is If I am having 30 records, then the count after the application of above logic comes out to be 34. Because the records are divided in 4 partitions in 9, 6, 6, and 9 respectively hence the logic does not give correct result. I also tried different algorithms, but no proper result.

So the summary is I am not able to get what is required. Can any one help in this case. I am not sure whether multi node processing can hamper such a basic use of stage variables.

ray.wurlod · Post by **ray.wurlod** » Fri Sep 04, 2009 2:38 am

There is no way you can force there to be the same number of rows on each of four nodes if the total number of rows is not a multiple of 4.

ArndW · Post by **ArndW** » Fri Sep 04, 2009 2:42 am

How about defining your transform stage to execute "sequential"?

intelcom · Post by **intelcom** » Fri Sep 04, 2009 2:42 am

If you need a unique counter across all partitions with Transformer the correct syntax would be

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1

intelcom · Post by **intelcom** » Fri Sep 04, 2009 2:46 am

If you need a unique counter across all partitions with Transformer the correct syntax would be

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1

parag.s.27 · Post by **parag.s.27** » Fri Sep 04, 2009 4:34 am

@Arndw: Yes thats true hence instead of executing the Transformer in Sequential mode I am running it on Partition method as Entire and then I am constraining on @PARTITIONNUM = 0, 1, 2 or 3 (that is any one partition). In case of Entire Partition, the 30 records will be replicated across all the partitions hence in transformer, there will be in all 120 records. Now I can simply constraint on particular Partition data.

Intelcom: I'll try and let everyone know in this forum.

Thanks to all of you.

DSXchange

Processing Count based transformation in 4 nodes

Processing Count based transformation in 4 nodes

Re: Processing Count based transformation in 4 nodes

Re: Processing Count based transformation in 4 nodes

Re: Processing Count based transformation in 4 nodes