Page 1 of 1

Generate Sequential Number using Parallel Transformer

Posted: Wed Mar 18, 2009 2:24 pm
by kashif007
Hi Guys

I want to generate sequential numbers in a parallel job with configuration file using 4 nodes. I have done the following posted by vmcburney in the transformer stage.

1) Created a stage variable for the counter named SVCounter.
with the value as "@PARTITIONNUM - @NUMPARTITIONS + 1".
2) Set the derivation of the stage variable named SVRow to "SVCounter + @NUMPARTITIONS".

Each instance will start at a different number, eg. -1, -2, -3, -4. When the counter is incremented each instance is increment by the number of partitions, eg. 4. This gives us a sequence in instance 1 of 1, 5, 9, 13... Instance 2 is 2, 6, 10, 14... etc

Can anyone suggest me the syntax of how to write the IF statement using the SVRow and SVCounter values. I want to generate sequential numbers no matter on which node the data has been split for processing.

Posted: Wed Mar 18, 2009 2:39 pm
by chulett
Curious why you wouldn't use the Surrogate Key stage for something like this, seeing as how it does all that automagically? :?

Posted: Wed Mar 18, 2009 2:44 pm
by kashif007
I would'nt want to disturb the current design, its better I do it in Transformer and most importantly I want to learn how to do this in a transformer. Running the transformer in sequential is giving me a drawback in performance. I am trying to process around 140 Million records in the job.

Posted: Wed Mar 18, 2009 3:00 pm
by Mike
Learn to use the surrogate key generator stage. No disturbance of the current design required... it's just a NextSurrogateKey function call in your transformer... and fewer keystrokes than creating those stage variables. :wink:

Mike

Posted: Wed Mar 18, 2009 3:20 pm
by ray.wurlod
You should not need an IF expression. Initialize the stage variable to the partition number, and increment by the number of partitions. This sequence is guarantee to be unique across all partitions.

Posted: Wed Mar 18, 2009 3:48 pm
by kashif007
The problem is that I don't get my sequence unique across all partitions. I did the following:

Stage Variable (SVCount) = @PARTITIONNUM - @NUMPARTITIONS + 1

Column (SVRow) = SVCount + @NUMPARTITIONS

So when I see SVRow in the output file, I get SVRow = 1 for the first 24287 records and then SVRow = 2 for the next 24287 records and so on for four nodes and the sequence again repeats itself. What I need are numbers as 1,2,3,4.......................n for each n rows produced in the outfile. Please correct me if I following a wrong notion.

Posted: Wed Mar 18, 2009 5:07 pm
by ray.wurlod
Initialize svCount to @PARTITIONNUM

Derive svCount as svCount + @NUMPARTITIONS

Derive column (or svRow) as svCount

Posted: Wed Mar 18, 2009 5:24 pm
by Kryt0n
kashif007 wrote: Stage Variable (SVCount) = @PARTITIONNUM - @NUMPARTITIONS + 1

Column (SVRow) = SVCount + @NUMPARTITIONS
You aren't incrementing either sv since PARTITIONNUM and NUMPARTITIONS are static values.

I would initialise SVRow to

@PARTITIONNUM - @NUMPARTITIONS + 1

then at each row do

SVRow = SVRow + @NUMPARTITIONS

Posted: Thu Mar 19, 2009 11:37 am
by kashif007
Thanks a bunch kryt0n and Ray, it actually works. I did a fumble in the initialization part of the stage variable. My bad. :x

Posted: Thu Mar 19, 2009 3:19 pm
by kashif007
Sorry Guys, Actually I ran into problems again. I was trying to run the job with different number of input records/dataset. The logic works well with limited data say around 900,000 input records but when I run the job using 1 million records or more, the sequence generation is broken and only one number "1000000" previals for all the remaining records after 1000000th record. Can you anyone please help me identify what went wrong in the logic.

Just to remind you I am using a 4 node config file and have done the following in the transformer.

Initialized the Stage Variable,
SVCount = @PARTITIONNUM - @NUMPARTITIONS + 1

Derived SVCount = SVCount + @NUMPARTITIONS

Created Output column called SeqNum = SVRow

Thanks