Processing Count based transformation in 4 nodes

parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India
Contact:

Processing Count based transformation in 4 nodes

Post by parag.s.27 »

Hi All,

I have a requirement to count the records and apply arithmetic processing depending on whether the count is even, odd, and so on. As we know, in a multi-node configuration the count starts from 1 in every partition. The problem is that I need to do this in the Transformer without adding any extra stage, because that is a requirement from the client.

One approach I tried was to set the Transformer's partition type to "Entire" and then add a constraint on @PARTITIONNUM = 1, so that all the records give me the correct count. But according to my client this sacrifices parallel processing, and I do not understand why.

The second approach was to use NextSurrogateKey(), but this is not consistent either, because the surrogate keys are never generated in a contiguous sequence. For one partition the values start from 1, whereas for another partition they start from 1000.

After going through many posts, I finally got some pointers from one of Vincent McBurney's posts, where he suggested something like this:

Code:

1. Define a stage variable svCounter and initialise it with the value @PARTITIONNUM - @NUMPARTITIONS + 1.
2. Increment the same stage variable on every row: svCounter = svCounter + @NUMPARTITIONS.
The code above generates a series of even and odd numbers, but it does not always give the correct result. My server has 4 nodes, and if the number of records is not divisible by 4, the partitions hold different numbers of records and the count produced by this logic is wrong. For example, with 30 records the count after applying the above logic comes out as 34, because the records are divided across the 4 partitions as 9, 6, 6 and 9 respectively. I also tried different algorithms, but none gave a proper result.
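The effect described above can be reproduced outside DataStage. What follows is only a minimal Python sketch, not DataStage code: it assumes @PARTITIONNUM is zero-based and that each partition keeps its own copy of the stage variable. The function name `stage_variable_counter` is made up for illustration. The exact top value depends on which partitions hold the extra rows, but with an uneven split it always overshoots the true row count.

```python
def stage_variable_counter(rows_per_partition):
    """Simulate svCounter independently in each partition.

    rows_per_partition: number of rows each partition receives
    (assumed order: partition 0, 1, 2, ...).
    Returns one list of generated counter values per partition.
    """
    num_partitions = len(rows_per_partition)
    result = []
    for partition_num, row_count in enumerate(rows_per_partition):
        # Initial value: @PARTITIONNUM - @NUMPARTITIONS + 1
        sv_counter = partition_num - num_partitions + 1
        values = []
        for _ in range(row_count):
            # Per-row derivation: svCounter + @NUMPARTITIONS
            sv_counter += num_partitions
            values.append(sv_counter)
        result.append(values)
    return result


# 30 rows split unevenly over 4 partitions, as in the post
per_partition = stage_variable_counter([9, 6, 6, 9])
flat = sorted(v for part in per_partition for v in part)
missing = sorted(set(range(1, max(flat) + 1)) - set(flat))

print(len(flat), len(set(flat)), max(flat))  # prints: 30 30 36
print(missing)  # prints: [26, 27, 30, 31, 34, 35]
```

So the values are unique, but they are not the contiguous range 1 to 30: the partitions with fewer rows leave gaps, and the partitions with more rows run past 30.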

So, in summary, I am not able to get what is required. Can anyone help with this? I am not sure whether multi-node processing can really hamper such a basic use of stage variables.
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: Processing Count based transformation in 4 nodes

Post by ray.wurlod »

There is no way you can force there to be the same number of rows on each of four nodes if the total number of rows is not a multiple of 4.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

How about defining your transform stage to execute "sequential"?
intelcom
Premium Member
Posts: 25
Joined: Thu Feb 28, 2008 2:05 am

Re: Processing Count based transformation in 4 nodes

Post by intelcom »

If you need a unique counter across all partitions in a Transformer, the correct syntax would be:

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1
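To see what this expression produces, here is a small Python sketch rather than DataStage code. It assumes @PARTITIONNUM is zero-based and that @INROWNUM counts rows within each partition starting at 1. The names `unique_counter` and `counters_for` are hypothetical helpers used only for this illustration.

```python
def unique_counter(partition_num, num_partitions, in_row_num):
    """Python rendering of
    @PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1
    """
    return partition_num + num_partitions * (in_row_num - 1) + 1


def counters_for(rows_per_partition):
    """Apply the expression to every row of every partition; return sorted values."""
    num_partitions = len(rows_per_partition)
    values = []
    for partition_num, row_count in enumerate(rows_per_partition):
        for in_row_num in range(1, row_count + 1):
            values.append(unique_counter(partition_num, num_partitions, in_row_num))
    return sorted(values)


# Even split: 8 rows, 2 per partition -> unique and contiguous
print(counters_for([2, 2, 2, 2]))  # prints: [1, 2, 3, 4, 5, 6, 7, 8]

# Uneven split: 6 rows as 3, 1, 1, 1 -> still unique, but with gaps
print(counters_for([3, 1, 1, 1]))  # prints: [1, 2, 3, 4, 5, 9]
```

Under these assumptions the expression guarantees uniqueness, because each partition only produces values with its own remainder modulo the number of partitions. It yields a contiguous 1..N sequence only when every partition receives the same number of rows.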
parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India
Contact:

Post by parag.s.27 »

@ArndW: Yes, that's true. So instead of executing the Transformer in sequential mode, I am running it with the partitioning method set to Entire and then constraining on @PARTITIONNUM = 0, 1, 2 or 3 (that is, any one partition). With Entire partitioning, the 30 records are replicated to every partition, so the Transformer sees 120 records in all. I can then simply constrain on one particular partition's data.
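The Entire-plus-constraint idea can be sketched in Python as follows. This is an illustration only, assuming that Entire partitioning hands every partition a full copy of the input in the original order and that the row counter restarts at 1 in each partition. The function name `entire_then_constrain` is hypothetical.

```python
def entire_then_constrain(rows, num_partitions, keep_partition):
    """Simulate Entire partitioning followed by a constraint on one partition.

    Returns (output, total_processed) where output is a list of
    (counter, row) pairs that pass the constraint, and total_processed
    is how many rows the Transformer handled across all partitions.
    """
    # Entire partitioning: each partition receives a full copy of the data
    partitions = [list(rows) for _ in range(num_partitions)]

    output = []
    total_processed = 0
    for partition_num, copy in enumerate(partitions):
        for counter, row in enumerate(copy, start=1):
            total_processed += 1
            # Constraint: @PARTITIONNUM = keep_partition
            if partition_num == keep_partition:
                output.append((counter, row))
    return output, total_processed


rows = [f"rec{i}" for i in range(1, 31)]  # 30 sample records
output, total_processed = entire_then_constrain(rows, num_partitions=4, keep_partition=0)

print(len(output), total_processed)  # prints: 30 120
print(output[0], output[-1])  # prints: (1, 'rec1') (30, 'rec30')
```

The counter on the surviving partition runs contiguously from 1 to 30, at the cost of every partition processing all 30 rows and three of the four copies being discarded.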

Intelcom: I'll try it and let everyone on this forum know.

Thanks to all of you.
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0