Issue in using @INROW function

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
HemaV
Participant
Posts: 63
Joined: Wed Jan 09, 2008 1:38 am
Location: Bangalore

Issue in using @INROW function

Post by HemaV »

Hi All,

I'm using the function

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1

to generate sequence number but in middle my job is skipping sequence numbers. from 1 to 156 its coming correctly but after 156 its an issue as below:
0000155
0000156
0000158
0000159
0000160
0000162
0000163
0000164
0000166
0000167
0000168
0000170
0000171
0000172
0000174
0000175
0000176
0000178
0000179
0000180
0000182
0000183
0000184
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are the row counts identical on all partitions? The easiest way to determine this is to use the Monitor view in Director client, enabling display of instances.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

How many partitions are you running this with? Also, where did you determine that certain records were missing - could it be that the transform stage correctly outputs the computation (which I would expect) and that the missing records were caused elsewhere.

You could put a peek stage after the transform with the result of your calculation to see if that works.
HemaV
Participant
Posts: 63
Joined: Wed Jan 09, 2008 1:38 am
Location: Bangalore

Post by HemaV »

Incoming data count is unsure so rowcount on each partition wont be identical.
We are running using 4 node configuration file.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Assume you have a 4 node configuration, if you do a round-robin partitioning you will not have any 'holes'. If your partitioning results in an uneven number of records per row you will get 'holes'.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

With any partitioning scheme running on two or more nodes you are not guaranteed that there will be no holes...it's just the nature of assigning a partition-based calculated value in a multi-process environment. Round-robin will evenly distribute so long as the number of rows is a multiple of the number of nodes (and even then it may sometimes depend upon the distribution of the incoming data).

If you MUST have assignment without holes, run the stage sequentially or call a database sequence using Surrogate Key Generator.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

James, I would think that with a round-robin partitioning scheme you won't get any 'holes' in the number sequence.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

ONLY if you are round-robin partitioning a sequential source. If you round-robin an already-partitioned data source, you can get gaps in the assignment.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
HemaV
Participant
Posts: 63
Joined: Wed Jan 09, 2008 1:38 am
Location: Bangalore

Post by HemaV »

my job is having 2 transformers.

1st transfomer gets data directly from join stage, where using round robin partition with "@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1"
it's coming correctly.

2nd transformer i'm sending data from funnel stage and in that transformer if i use "@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1" function with round robin partition then its not working.


Thanks,
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

As Ray suggested earlier in the thread, use the Monitor tool in Director to see how the data has been distributed among the nodes at the points in your job you are assigning the sequence number. If the distribution varies by more than one row from min to max, you will see gaps. If it varies by only one row, you may see gaps if it wasn't using round-robin from a sequential source.

If you are not allowed to have gaps in your sequence number assignment, run the transformer(s) in sequential mode so that you can be guaranteed a gapless assignment. Otherwise, you will run the risk of having gaps. Keep in mind that in most situations the need is for unique sequence number assignment, which that logic does provide even when executed in parallel mode.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Post Reply