Round robin issue

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
I have a job where the source is a dataset, followed by a Transformer in which I generate a sequence number. On the Transformer's input link I have explicitly set round robin partitioning to avoid any holes in the sequence number, but it is not working properly.

I am running the job with a 4-node configuration file, and the input data is split across the nodes in the Transformer as shown below.

Total input records: 839093

In the Transformer (round robin on the input link):

Code:

Node0 :- 209775
Node1 :- 209774
Node2 :- 209771
Node3 :- 209771
But according to round robin logic it should be like this:

Code:

Node0 :- 209773
Node1 :- 209773
Node2 :- 209773
Node3 :- 209772
Hence I am getting holes in the sequence number. Can anyone help me understand the issue?
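
For reference, the per-node counts that strict row-by-row round robin would produce can be worked out with a few lines of Python. This is only an illustrative sketch, not DataStage code: `round_robin_counts` is a hypothetical helper, and the total used is the sum of the four per-node counts quoted above (839091).

```python
def round_robin_counts(total_rows, num_nodes):
    """Rows each node receives if rows are dealt out one at a time, in order."""
    base, extra = divmod(total_rows, num_nodes)
    # The first `extra` nodes receive one more row than the rest.
    return [base + 1 if node < extra else base for node in range(num_nodes)]


observed = [209775, 209774, 209771, 209771]  # counts reported in the post
expected = round_robin_counts(sum(observed), len(observed))

print(expected)
# Difference per node between the observed split and the strict round robin split.
print([o - e for o, e in zip(observed, expected)])
```

Under these assumptions the strict split is 209773, 209773, 209773, 209772, and the observed counts differ from it by +2, +1, -2 and -1 rows.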
crystal_pup
Participant
Posts: 62
Joined: Thu Feb 08, 2007 6:01 am
Location: Pune

Post by crystal_pup »

Since the transformer stage is running in Parallel mode, how are you not getting duplicates in sequence numbers?

I also remember a case from my own experience where Round Robin did not distribute an equal number of records to each node, even though the number of records was perfectly divisible by the number of nodes.
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,

Since the records are not distributed in order in the Transformer, my formula for generating the sequence number is skipping some values, and I don't know whether there is any chance of getting duplicate sequence numbers.

As I understand it, DataStage round robin partitioning should write the first record to the first node, the next to the second node, and so on. Please correct me if I am wrong.

http://pic.dhe.ibm.com/infocenter/iisin ... oning.html
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Can anyone help me with the above issue? :cry:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

prasson_ibm wrote: Can anyone help me with the above issue? :cry:
Possibly and, if they can, they will. "Pushing", which is what you are doing, is offensive.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

My apologies for that, Ray.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps you should reveal your formula and why you are concerned with 'holes' in the numbering.
-craig

"You can never have too many knives" -- Logan Nine Fingers
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
I am concerned about gaps in the numbering because I am generating a sequence number, and it should be continuous, without any gaps.

Below is the formula I am using for the sequence number:

Code:

(@PARTITIONNUM+((@INROWNUM-1)* @NUMPARTITIONS))+1
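
The behaviour of this formula can be checked outside DataStage with a small Python simulation. It is a sketch under stated assumptions: `sequence_numbers` and `missing_numbers` are hypothetical helpers, partitions are numbered from 0 (as @PARTITIONNUM is), rows within a partition are numbered from 1 (as @INROWNUM is), and the per-partition row counts are made-up small numbers rather than the job's real figures.

```python
def sequence_numbers(rows_per_partition):
    """Apply (@PARTITIONNUM + ((@INROWNUM - 1) * @NUMPARTITIONS)) + 1 to every row."""
    num_partitions = len(rows_per_partition)
    keys = []
    for partition_num, row_count in enumerate(rows_per_partition):
        for in_row_num in range(1, row_count + 1):
            keys.append(partition_num + (in_row_num - 1) * num_partitions + 1)
    return sorted(keys)


def missing_numbers(keys):
    """Values between 1 and the largest key that were never generated."""
    present = set(keys)
    return [n for n in range(1, max(keys) + 1) if n not in present]


even = sequence_numbers([3, 3, 3, 2])    # strict round robin split of 11 rows
uneven = sequence_numbers([4, 3, 2, 2])  # same 11 rows, split unevenly

print(missing_numbers(even))
print(missing_numbers(uneven))
print(len(set(uneven)) == len(uneven))   # True when there are no duplicates
```

With the even split the keys run from 1 to 11 with nothing missing. With the uneven split the values 11 and 12 are never generated while 13 is, which is the kind of hole described in this thread. Neither case produces duplicates, because each partition only ever emits values from its own residue class modulo the number of partitions.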
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sequence as in surrogate key? Will a gap in the numbers actually hurt / break any other processing you'll be doing? The subject has been beaten to death here, but for one more turn in the barrel, check an article like this one. It's a general article with specifics for CA-Ingres but very relevant; if you want, you can skip down to the "Regard Gapless Number Sequences as Evil" section.

I'll leave the formula vetting for others.
-craig
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
In my case a gap in the sequence will cause problems in downstream processing, but what I don't understand is why round robin is behaving so weirdly here.
When my source is a sequence stage, round robin in the Transformer behaves exactly as expected. The problem occurs only when the source is a parallel stage and the data is redistributed on the Transformer's input link by the round robin partitioning. :cry:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Your actual question is why the Round Robin partitioning algorithm has not distributed the rows absolutely equally. This probably has to do with the blocks/buffers in which data are transmitted - parallel jobs do not work row by row. Have you asked your official support provider? How wide are your rows?
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi Ray,

If this works properly, I am planning to implement it in all jobs, where the row volumes range from 1 million to 13 million.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

I don't know of any way to guarantee a sequential string of numbers while the job is run in parallel. As Ray said earlier, due to the buffering and other mechanisms involved, the only way to generate sequenced numbers with no gaps is to not run the entire job in parallel.

The only way that might work is to consolidate to a single stream right before output and assign a number then. However, that is really going to impact performance on a job with millions of rows.

That doesn't even begin to address other issues, like aborted job recovery, dropped rows due to database rejects, etc.
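
The single-stream idea can be pictured with a short Python sketch. It is not DataStage code and it ignores the performance cost mentioned above; `number_single_stream` and the sample partitions are hypothetical. It only shows that once all partitions are collected into one stream, a single counter yields gapless numbers no matter how unevenly the rows were partitioned.

```python
from itertools import chain


def number_single_stream(partitions, start=1):
    """Collect all partitions into one stream and number the rows with one counter."""
    single_stream = chain.from_iterable(partitions)
    return [(seq, row) for seq, row in enumerate(single_stream, start=start)]


# Four unevenly filled partitions (4, 3, 2 and 2 rows).
partitions = [["a", "b", "c", "d"], ["e", "f", "g"], ["h", "i"], ["j", "k"]]
numbered = number_single_stream(partitions)

print([seq for seq, _ in numbered])
```

The numbers come out as 1 through 11 with no holes, at the price of funnelling every row through one sequential step.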
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020