Page 1 of 1

Issue in using @INROW function

Posted: Fri Jun 14, 2013 4:19 am
by HemaV
Hi All,

I'm using the function

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1

to generate sequence number but in middle my job is skipping sequence numbers. from 1 to 156 its coming correctly but after 156 its an issue as below:
0000155
0000156
0000158
0000159
0000160
0000162
0000163
0000164
0000166
0000167
0000168
0000170
0000171
0000172
0000174
0000175
0000176
0000178
0000179
0000180
0000182
0000183
0000184

Posted: Fri Jun 14, 2013 4:55 am
by ray.wurlod
Are the row counts identical on all partitions? The easiest way to determine this is to use the Monitor view in Director client, enabling display of instances.

Posted: Fri Jun 14, 2013 4:56 am
by ArndW
How many partitions are you running this with? Also, where did you determine that certain records were missing - could it be that the transform stage correctly outputs the computation (which I would expect) and that the missing records were caused elsewhere.

You could put a peek stage after the transform with the result of your calculation to see if that works.

Posted: Fri Jun 14, 2013 5:04 am
by HemaV
Incoming data count is unsure so rowcount on each partition wont be identical.
We are running using 4 node configuration file.

Posted: Fri Jun 14, 2013 5:27 am
by ArndW
Assume you have a 4 node configuration, if you do a round-robin partitioning you will not have any 'holes'. If your partitioning results in an uneven number of records per row you will get 'holes'.

Posted: Fri Jun 14, 2013 8:28 am
by jwiles
With any partitioning scheme running on two or more nodes you are not guaranteed that there will be no holes...it's just the nature of assigning a partition-based calculated value in a multi-process environment. Round-robin will evenly distribute so long as the number of rows is a multiple of the number of nodes (and even then it may sometimes depend upon the distribution of the incoming data).

If you MUST have assignment without holes, run the stage sequentially or call a database sequence using Surrogate Key Generator.

Regards,

Posted: Fri Jun 14, 2013 11:27 am
by ArndW
James, I would think that with a round-robin partitioning scheme you won't get any 'holes' in the number sequence.

Posted: Fri Jun 14, 2013 12:31 pm
by jwiles
ONLY if you are round-robin partitioning a sequential source. If you round-robin an already-partitioned data source, you can get gaps in the assignment.

Regards,

Posted: Thu Jun 20, 2013 6:47 am
by HemaV
my job is having 2 transformers.

1st transfomer gets data directly from join stage, where using round robin partition with "@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1"
it's coming correctly.

2nd transformer i'm sending data from funnel stage and in that transformer if i use "@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1" function with round robin partition then its not working.


Thanks,

Posted: Thu Jun 20, 2013 8:41 am
by jwiles
As Ray suggested earlier in the thread, use the Monitor tool in Director to see how the data has been distributed among the nodes at the points in your job you are assigning the sequence number. If the distribution varies by more than one row from min to max, you will see gaps. If it varies by only one row, you may see gaps if it wasn't using round-robin from a sequential source.

If you are not allowed to have gaps in your sequence number assignment, run the transformer(s) in sequential mode so that you can be guaranteed a gapless assignment. Otherwise, you will run the risk of having gaps. Keep in mind that in most situations the need is for unique sequence number assignment, which that logic does provide even when executed in parallel mode.

Regards,