Processing Count based transformation in 4 nodes
Posted: Fri Sep 04, 2009 12:41 am
Hi All,
I am having a requirement where I need to count the number of records and do a math processing on the basis of even numbers, odd numbers etc. Now as we know that in multi node architecture the counts will always be initiated from 1 for all the partitions. But the problem is I need to do it in the transformer without adding any extra stage, because it is a requirement from client.
Now one approach I tried was to set the Transformer to partition type as "ENTIRE" and then constraint on @PARTITIONNUM = 1 and then all the records will give me correct count. But this result in sacrificing the parallel processing as per my client which I do not understand why.
Second approach was to use NextSurrogateKey(), But this is also not consistent because the surrogate key is never generated in a contiguous sequence. For one partition the values start from 1 where as for another partition the values start from 1000.
So after going through many posts finally I got some pointers from Vincent McBurney's post where he sugeested, something like this: -
The above mentioned code will generate a series of even and odd numbers. But the problem is it does not give correct result always. Because my server has 4 nodes and if number of records are not divisible by 4 then each partition will have different number of records and the count will not be correct in the above mentioned code. What I mean is If I am having 30 records, then the count after the application of above logic comes out to be 34. Because the records are divided in 4 partitions in 9, 6, 6, and 9 respectively hence the logic does not give correct result. I also tried different algorithms, but no proper result.
So the summary is I am not able to get what is required. Can any one help in this case. I am not sure whether multi node processing can hamper such a basic use of stage variables.
I am having a requirement where I need to count the number of records and do a math processing on the basis of even numbers, odd numbers etc. Now as we know that in multi node architecture the counts will always be initiated from 1 for all the partitions. But the problem is I need to do it in the transformer without adding any extra stage, because it is a requirement from client.
Now one approach I tried was to set the Transformer to partition type as "ENTIRE" and then constraint on @PARTITIONNUM = 1 and then all the records will give me correct count. But this result in sacrificing the parallel processing as per my client which I do not understand why.
Second approach was to use NextSurrogateKey(), But this is also not consistent because the surrogate key is never generated in a contiguous sequence. For one partition the values start from 1 where as for another partition the values start from 1000.
So after going through many posts finally I got some pointers from Vincent McBurney's post where he sugeested, something like this: -
Code: Select all
1. Define a Stage Variable svCounter and initiate it with value = @PARTITIONNUM-@NUMPARTITIONS+1.
2. Now the same stage variable be incremented with logic as svCounter = svCounter+@NUMPARTITONS.
So the summary is I am not able to get what is required. Can any one help in this case. I am not sure whether multi node processing can hamper such a basic use of stage variables.