Node map constraint

ravij
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

Node map constraint

Post by ravij »

Hi,


I am running a job with the stages seqfile ---> SurKeyGen ---> Peek on 1 lakh (100,000) records. When I define 1 node in the 'Node Map Constraint', it gives 1 lakh records as output, but when I define 2 nodes, it gives 2 lakh records as output.
My partitioning type is Entire.

I don't understand why it gives double the number of source records.

Any help would be appreciated.
Thanks in advance
Ravi
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi Ravi,
Please go through the manual and try to understand the concepts and types of partitioning. You can also do a search before raising an issue, which will clear almost all of your basic doubts.
If you specify 2 nodes and the partitioning is Entire, the entire data set is sent to both nodes, i.e., 1 lakh on each of the 2 nodes = 2 lakh. This type of partitioning is used if you do a lookup, and particularly on an MPP system (unless required otherwise).
You can specify the partition type as Round Robin, which will ensure the number of records per node is almost the same.
Hash partitioning divides the records among the nodes according to a hashing algorithm applied to the given key.
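
To make the record counts concrete, here is a minimal Python sketch of the three partitioning methods described above. It is only a model of the behaviour, not DataStage code: the function names, the sample rows, and the use of CRC32 as the hash are assumptions made for illustration.

```python
import zlib

def entire(rows, nodes):
    # Entire: every node receives a full copy of the data.
    return [list(rows) for _ in range(nodes)]

def round_robin(rows, nodes):
    # Round Robin: rows are dealt to the nodes in turn.
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts

def hash_partition(rows, nodes, key):
    # Hash: the node is chosen from a hash of the key value, so rows
    # with the same key always land on the same node.
    # CRC32 is a stand-in here; the real hashing algorithm is not shown in this thread.
    parts = [[] for _ in range(nodes)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode())
        parts[h % nodes].append(row)
    return parts

# Hypothetical input: 1 lakh (100,000) rows with a made-up "state" column.
states = ["MA", "CA", "NY", "NJ"]
rows = [{"id": i, "state": states[i % 4]} for i in range(100_000)]

entire_counts = [len(p) for p in entire(rows, 2)]
rr_counts = [len(p) for p in round_robin(rows, 2)]
hash_counts = [len(p) for p in hash_partition(rows, 2, "state")]

print("Entire     :", entire_counts, "total", sum(entire_counts))
print("Round Robin:", rr_counts, "total", sum(rr_counts))
print("Hash       :", hash_counts, "total", sum(hash_counts))
```

With Entire, the total across 2 nodes is twice the input, which is the 2 lakh seen in the job. Round Robin and Hash both keep the total at 1 lakh; only the split per node differs.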

-Kumar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In particular, learn from your reading that Entire partitioning puts all rows onto all nodes.

That is the main point from Kumar's posting that will aid your understanding.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ravij
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

Rows varying in Round Robin

Post by ravij »

Hi Kumar,

What you said is right. When I use Round Robin, it gives the same number of records in the output.

But with Round Robin the output varies, as shown below. I am providing the log results here.
When I select 2 nodes in the SK Gen stage and 1 node in the Peek stage, I get the results in a single Peek, as given below.
The number of records per partition is 10.
In Row Gen: node2,node3
In Peek: node2

Peek_6,0: Sur_Key:0 first_name:John last_name:Parker gender:M birth_date:1979-04-24 income: 0087228.46 state:MA
Peek_6,0: Sur_Key:2 first_name:William last_name:Mandella gender:M birth_date:1962-04-07 income: 0040676.94 state:CA
Peek_6,0: Sur_Key:4 first_name:Frank last_name:Chalmers gender:M birth_date:1969-12-10 income: 0004881.94 state:NY
Peek_6,0: Sur_Key:6 first_name:Seymour last_name:Glass gender:M birth_date:1960-08-18 income: 0051531.56 state:NJ
Peek_6,0: Sur_Key:8 first_name:John last_name:Boone gender:M birth_date:1964-04-16 income: 0042729.03 state:CO
Peek_6,0: Sur_Key:10 first_name:William last_name:Tell gender:M birth_date:1974-07-13 income: 0021008.45 state:SD
Peek_6,0: Sur_Key:12 first_name:Frank last_name:Sinatra gender:M birth_date:1984-06-12 income: 0082552.55 state:OH
Peek_6,0: Sur_Key:14 first_name:Seymour last_name:Smith gender:M birth_date:1977-09-27 income: 0029352.31 state:IL
Peek_6,0: Sur_Key:16 first_name:John last_name:Calvin gender:M birth_date:1961-11-30 income: 0025966.39 state:FL
Peek_6,0: Sur_Key:18 first_name:William last_name:Claybourne gender:M birth_date:1961-03-16 income: 0052160.89 state:TX


When I define 2 nodes in Peek, it should display an equal number of records on each node, right? But the output varies, as shown in the results below.

In SurKey Gen: node2,node3
In Peek: node2,node3

Peek_6,0: Sur_Key:0 first_name:John last_name:Parker gender:M birth_date:1979-04-24 income: 0087228.46 state:MA
Peek_6,0: Sur_Key:2 first_name:William last_name:Mandella gender:M birth_date:1962-04-07 income: 0040676.94 state:CA
Peek_6,0: Sur_Key:4 first_name:Frank last_name:Chalmers gender:M birth_date:1969-12-10 income: 0004881.94 state:NY
Peek_6,0: Sur_Key:6 first_name:Seymour last_name:Glass gender:M birth_date:1960-08-18 income: 0051531.56 state:NJ
Peek_6,0: Sur_Key:8 first_name:John last_name:Boone gender:M birth_date:1964-04-16 income: 0042729.03 state:CO
Peek_6,0: Sur_Key:10 first_name:William last_name:Tell gender:M birth_date:1974-07-13 income: 0021008.45 state:SD




Peek_6,1: Sur_Key:1 first_name:Susan last_name:Calvin gender:F birth_date:1967-12-24 income: 0091312.42 state:IL
Peek_6,1: Sur_Key:3 first_name:Ann last_name:Claybourne gender:F birth_date:1960-10-29 income: 0061774.32 state:FL
Peek_6,1: Sur_Key:5 first_name:Jane last_name:Studdock gender:F birth_date:1962-02-24 income: 0075990.80 state:TX
Peek_6,1: Sur_Key:7 first_name:Laura last_name:Engels gender:F birth_date:1981-12-07 income: 0015280.31 state:KY



Peek_6,0: Sur_Key:12 first_name:Frank last_name:Sinatra gender:M birth_date:1984-06-12 income: 0082552.55 state:OH
Peek_6,0: Sur_Key:14 first_name:Seymour last_name:Smith gender:M birth_date:1977-09-27 income: 0029352.31 state:IL
Peek_6,0: Sur_Key:16 first_name:John last_name:Calvin gender:M birth_date:1961-11-30 income: 0025966.39 state:FL
Peek_6,0: Sur_Key:18 first_name:William last_name:Claybourne gender:M birth_date:1961-03-16 income: 0052160.89 state:TX




Peek_6,1: Sur_Key:9 first_name:Susan last_name:Sarandon gender:F birth_date:1966-06-08 income: 0081319.09 state:ND
Peek_6,1: Sur_Key:11 first_name:Ann last_name:Dillard gender:F birth_date:1969-02-21 income: 0004552.65 state:MI
Peek_6,1: Sur_Key:13 first_name:Jane last_name:Austin gender:F birth_date:1985-03-26 income: 0019820.10 state:MA
Peek_6,1: Sur_Key:15 first_name:Laura last_name:Parker gender:F birth_date:1972-11-16 income: 0087834.98 state:CA
Peek_6,1: Sur_Key:17 first_name:Susan last_name:Mandella gender:F birth_date:1960-03-31 income: 0080394.55 state:NY
Peek_6,1: Sur_Key:19 first_name:Ann last_name:Chalmers gender:F birth_date:1970-10-01 income: 0075071.43 state:NJ

How does this happen? Can I know the process behind it?

Any answer would be appreciated.
Thanks in advance.
Ravi
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

Hi Ravi

When SurKey Gen runs on 2 nodes and Peek on 1 node, repartitioning occurs; as a result, all the records are processed by the single node in the Peek. When you view the records in the Peek, node 0 will have 1 lakh records.

When SurKey Gen runs on 2 nodes and Peek on 2 nodes with Round Robin as the partition method, each node has 50,000 records.

Configure the Peek stage to output all records from all partitions.
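
The even/odd split of Sur_Key values in the Peek log can be reproduced with a small Python sketch. It assumes that each partition numbers its rows starting at its own partition number and stepping by the number of partitions; that scheme is inferred from the log in this thread, not taken from the product documentation, and all names below are made up for illustration.

```python
def round_robin(rows, nodes):
    # Deal rows to the partitions in turn.
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts

def assign_keys(parts):
    # Assumed scheme: partition p generates keys p, p + N, p + 2N, ...
    # where N is the number of partitions.
    n = len(parts)
    return [
        [(p + n * i, row) for i, row in enumerate(part)]
        for p, part in enumerate(parts)
    ]

# 20 hypothetical input rows, as in the log (10 records per partition).
rows = [f"row{i}" for i in range(20)]
keyed = assign_keys(round_robin(rows, 2))

keys_by_partition = [[k for k, _ in part] for part in keyed]
for p, keys in enumerate(keys_by_partition):
    print(f"Peek,{p}: {len(keys)} rows, keys {keys}")

# Peek constrained to a single node: both partitions are collected into one.
single_node_count = sum(len(part) for part in keyed)
print("Peek on 1 node:", single_node_count, "rows")
```

Under this assumption, partition 0 holds the even keys and partition 1 the odd keys, with the same number of rows on each, and a single-node Peek receives all of them.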

-Balaji S.R
ravij
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

Post by ravij »

Hi Balaji,

Configure the peek stage to output all records from all partitions

I didn't understand what you said. You said to configure the Peek stage to output all records from all partitions, right? How do I do that? Can you suggest how?

Thanks in advance
Ravi
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,

Balaji is asking you to set the property

All Partition = True

in the Peek stage.
I guess you still haven't gone through the documentation about partitioning.

-Kumar
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

Hi

Go to the Stage - Properties page of the Peek stage.

In the Rows category there is an option "All Records (After Skip)", which you need to set to True.

In the Partitions category there is an option "All Partitions"; set this to True.

-Balaji S.R