Issue in using @INROW function

HemaV · Post by **HemaV** » Fri Jun 14, 2013 4:19 am

Hi All,

I'm using the function

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1

to generate sequence number but in middle my job is skipping sequence numbers. from 1 to 156 its coming correctly but after 156 its an issue as below:
0000155
0000156
0000158
0000159
0000160
0000162
0000163
0000164
0000166
0000167
0000168
0000170
0000171
0000172
0000174
0000175
0000176
0000178
0000179
0000180
0000182
0000183
0000184

ray.wurlod · Post by **ray.wurlod** » Fri Jun 14, 2013 4:55 am

Are the row counts identical on all partitions? The easiest way to determine this is to use the Monitor view in Director client, enabling display of instances.

ArndW · Post by **ArndW** » Fri Jun 14, 2013 4:56 am

How many partitions are you running this with? Also, where did you determine that certain records were missing - could it be that the transform stage correctly outputs the computation (which I would expect) and that the missing records were caused elsewhere.

You could put a peek stage after the transform with the result of your calculation to see if that works.

HemaV · Post by **HemaV** » Fri Jun 14, 2013 5:04 am

Incoming data count is unsure so rowcount on each partition wont be identical.
We are running using 4 node configuration file.

ArndW · Post by **ArndW** » Fri Jun 14, 2013 5:27 am

Assume you have a 4 node configuration, if you do a round-robin partitioning you will not have any 'holes'. If your partitioning results in an uneven number of records per row you will get 'holes'.

jwiles · Post by **jwiles** » Fri Jun 14, 2013 8:28 am

With any partitioning scheme running on two or more nodes you are not guaranteed that there will be no holes...it's just the nature of assigning a partition-based calculated value in a multi-process environment. Round-robin will evenly distribute so long as the number of rows is a multiple of the number of nodes (and even then it may sometimes depend upon the distribution of the incoming data).

If you MUST have assignment without holes, run the stage sequentially or call a database sequence using Surrogate Key Generator.

Regards,

ArndW · Post by **ArndW** » Fri Jun 14, 2013 11:27 am

James, I would think that with a round-robin partitioning scheme you won't get any 'holes' in the number sequence.

jwiles · Post by **jwiles** » Fri Jun 14, 2013 12:31 pm

ONLY if you are round-robin partitioning a sequential source. If you round-robin an already-partitioned data source, you can get gaps in the assignment.

Regards,

HemaV · Post by **HemaV** » Thu Jun 20, 2013 6:47 am

my job is having 2 transformers.

1st transfomer gets data directly from join stage, where using round robin partition with "@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1"
it's coming correctly.

2nd transformer i'm sending data from funnel stage and in that transformer if i use "@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1" function with round robin partition then its not working.

Thanks,

jwiles · Post by **jwiles** » Thu Jun 20, 2013 8:41 am

As Ray suggested earlier in the thread, use the Monitor tool in Director to see how the data has been distributed among the nodes at the points in your job you are assigning the sequence number. If the distribution varies by more than one row from min to max, you will see gaps. If it varies by only one row, you may see gaps if it wasn't using round-robin from a sequential source.

If you are not allowed to have gaps in your sequence number assignment, run the transformer(s) in sequential mode so that you can be guaranteed a gapless assignment. Otherwise, you will run the risk of having gaps. Keep in mind that in most situations the need is for unique sequence number assignment, which that logic does provide even when executed in parallel mode.

Regards,