Node map constraint

ravij · Post by **ravij** » Mon Dec 26, 2005 7:14 am

Hi,

I am running a job with the stages seqfile ---> SurKeyGen ---> Peek on 1 lakh (100,000) records. When I define 1 node in the 'Node Map Constraint', it gives 1 lakh records as output, but when I define 2 nodes it gives 2 lakh records as output.
My partitioning type is Entire.

I don't understand why it gives double the number of source records.

Any help would be appreciated.
Thanks in advance.

kumar_s · Post by **kumar_s** » Mon Dec 26, 2005 7:36 am

Hi Ravij,
Please go through the manual and try to understand the concepts and types of partitioning. You can also do a search before raising an issue, which will clear up almost all of your basic doubts.
If you specify 2 nodes and the partitioning is Entire, the entire data set is sent to both nodes, i.e., 1 lakh on each of the 2 nodes = 2 lakh. This type of partitioning is typically used when you do a lookup, and then mainly on an MPP system (unless required otherwise).
You can specify the partitioning type as Round Robin, which ensures the number of records per node is almost the same.
Hash partitioning divides the records among the nodes according to a hashing algorithm applied to the given key.

-Kumar
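
To make the difference between the three methods concrete, here is a minimal Python sketch. It is not DataStage code: the function names are made up for illustration, and `crc32` merely stands in for whatever hashing algorithm the engine really uses. It only shows how many rows each method puts on each node.

```python
from zlib import crc32


def entire(rows, nodes):
    """Entire: every node receives a full copy of the data."""
    return [list(rows) for _ in range(nodes)]


def round_robin(rows, nodes):
    """Round Robin: rows are dealt to the nodes in turn."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def hash_partition(rows, nodes, key):
    """Hash: a row's node is chosen from a hash of its key (crc32 is an assumption)."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[crc32(str(key(row)).encode()) % nodes].append(row)
    return parts


rows = list(range(100_000))  # 1 lakh source records

entire_sizes = [len(p) for p in entire(rows, 2)]
round_robin_sizes = [len(p) for p in round_robin(rows, 2)]
hash_sizes = [len(p) for p in hash_partition(rows, 2, key=lambda r: r)]

print("Entire:     ", entire_sizes, "total", sum(entire_sizes))
print("Round Robin:", round_robin_sizes, "total", sum(round_robin_sizes))
print("Hash:       ", hash_sizes, "total", sum(hash_sizes))
```

With Entire the total is 2 lakh because each of the 2 nodes holds all 1 lakh rows. Round Robin and Hash keep the total at 1 lakh and differ only in how the rows are split.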

ray.wurlod · Post by **ray.wurlod** » Mon Dec 26, 2005 2:37 pm

In particular, learn from your reading that Entire partitioning puts all rows onto all nodes.

That is the main point from Kumar's posting that will aid your understanding.

ravij · Post by **ravij** » Tue Dec 27, 2005 12:00 am

Hi Kumar,

What you said is right. When I use Round Robin, it gives the same number of records in the output.

But with Round Robin the output varies, as shown below. I am providing the log results here.
When I select 2 nodes in the SK Gen stage and 1 node in the Peek stage, I get the results in a single Peek partition, as given below.
The number of records per partition is 10.
In Row Gen: node2,node3
In Peek: node2

Peek_6,0: Sur_Key:0 first_name:John last_name:Parker gender:M birth_date:1979-04-24 income: 0087228.46 state:MA
Peek_6,0: Sur_Key:2 first_name:William last_name:Mandella gender:M birth_date:1962-04-07 income: 0040676.94 state:CA
Peek_6,0: Sur_Key:4 first_name:Frank last_name:Chalmers gender:M birth_date:1969-12-10 income: 0004881.94 state:NY
Peek_6,0: Sur_Key:6 first_name:Seymour last_name:Glass gender:M birth_date:1960-08-18 income: 0051531.56 state:NJ
Peek_6,0: Sur_Key:8 first_name:John last_name:Boone gender:M birth_date:1964-04-16 income: 0042729.03 state:CO
Peek_6,0: Sur_Key:10 first_name:William last_name:Tell gender:M birth_date:1974-07-13 income: 0021008.45 state:SD
Peek_6,0: Sur_Key:12 first_name:Frank last_name:Sinatra gender:M birth_date:1984-06-12 income: 0082552.55 state:OH
Peek_6,0: Sur_Key:14 first_name:Seymour last_name:Smith gender:M birth_date:1977-09-27 income: 0029352.31 state:IL
Peek_6,0: Sur_Key:16 first_name:John last_name:Calvin gender:M birth_date:1961-11-30 income: 0025966.39 state:FL
Peek_6,0: Sur_Key:18 first_name:William last_name:Claybourne gender:M birth_date:1961-03-16 income: 0052160.89 state:TX

When I define 2 nodes in Peek, it should display an equal number of records on each node, right? But the output varies, as shown in the results below.

In SurKey Gen: node2,node3
In Peek: node2,node3

Peek_6,0: Sur_Key:0 first_name:John last_name:Parker gender:M birth_date:1979-04-24 income: 0087228.46 state:MA
Peek_6,0: Sur_Key:2 first_name:William last_name:Mandella gender:M birth_date:1962-04-07 income: 0040676.94 state:CA
Peek_6,0: Sur_Key:4 first_name:Frank last_name:Chalmers gender:M birth_date:1969-12-10 income: 0004881.94 state:NY
Peek_6,0: Sur_Key:6 first_name:Seymour last_name:Glass gender:M birth_date:1960-08-18 income: 0051531.56 state:NJ
Peek_6,0: Sur_Key:8 first_name:John last_name:Boone gender:M birth_date:1964-04-16 income: 0042729.03 state:CO
Peek_6,0: Sur_Key:10 first_name:William last_name:Tell gender:M birth_date:1974-07-13 income: 0021008.45 state:SD

Peek_6,1: Sur_Key:1 first_name:Susan last_name:Calvin gender:F birth_date:1967-12-24 income: 0091312.42 state:IL
Peek_6,1: Sur_Key:3 first_name:Ann last_name:Claybourne gender:F birth_date:1960-10-29 income: 0061774.32 state:FL
Peek_6,1: Sur_Key:5 first_name:Jane last_name:Studdock gender:F birth_date:1962-02-24 income: 0075990.80 state:TX
Peek_6,1: Sur_Key:7 first_name:Laura last_name:Engels gender:F birth_date:1981-12-07 income: 0015280.31 state:KY

Peek_6,0: Sur_Key:12 first_name:Frank last_name:Sinatra gender:M birth_date:1984-06-12 income: 0082552.55 state:OH
Peek_6,0: Sur_Key:14 first_name:Seymour last_name:Smith gender:M birth_date:1977-09-27 income: 0029352.31 state:IL
Peek_6,0: Sur_Key:16 first_name:John last_name:Calvin gender:M birth_date:1961-11-30 income: 0025966.39 state:FL
Peek_6,0: Sur_Key:18 first_name:William last_name:Claybourne gender:M birth_date:1961-03-16 income: 0052160.89 state:TX

Peek_6,1: Sur_Key:9 first_name:Susan last_name:Sarandon gender:F birth_date:1966-06-08 income: 0081319.09 state:ND
Peek_6,1: Sur_Key:11 first_name:Ann last_name:Dillard gender:F birth_date:1969-02-21 income: 0004552.65 state:MI
Peek_6,1: Sur_Key:13 first_name:Jane last_name:Austin gender:F birth_date:1985-03-26 income: 0019820.10 state:MA
Peek_6,1: Sur_Key:15 first_name:Laura last_name:Parker gender:F birth_date:1972-11-16 income: 0087834.98 state:CA
Peek_6,1: Sur_Key:17 first_name:Susan last_name:Mandella gender:F birth_date:1960-03-31 income: 0080394.55 state:NY
Peek_6,1: Sur_Key:19 first_name:Ann last_name:Chalmers gender:F birth_date:1970-10-01 income: 0075071.43 state:NJ

How does this happen? Can I know the process behind it?

Any answer would be appreciated.
Thanks in advance.
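
Counting the lines per partition shows that the log above is less uneven than it looks. The sketch below is plain Python, not a DataStage utility. It rebuilds abbreviated versions of the 20 log lines (only the `Peek_6,<partition>` prefix and the `Sur_Key` value, in the same order as the log) and groups them by partition number. The probable reason for the interleaved blocks is that each partition writes its own messages to the job log independently, but that is an assumption rather than something the log itself proves.

```python
import re
from collections import defaultdict

# (partition, surrogate keys) in the order the blocks appear in the log above.
blocks = [
    (0, [0, 2, 4, 6, 8, 10]),
    (1, [1, 3, 5, 7]),
    (0, [12, 14, 16, 18]),
    (1, [9, 11, 13, 15, 17, 19]),
]
log_lines = [f"Peek_6,{p}: Sur_Key:{k}" for p, keys in blocks for k in keys]

# "Peek_6,0:" means stage Peek_6, partition 0.
pattern = re.compile(r"^Peek_\d+,(\d+): Sur_Key:(\d+)")

keys_by_partition = defaultdict(list)
for line in log_lines:
    match = pattern.match(line)
    if match:
        keys_by_partition[int(match.group(1))].append(int(match.group(2)))

for partition, keys in sorted(keys_by_partition.items()):
    print(f"partition {partition}: {len(keys)} rows, keys {keys}")
```

Both partitions end up with 10 rows, which matches the 10 records per partition setting. Only the order in which the blocks reach the log varies.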

balajisr · Post by **balajisr** » Tue Dec 27, 2005 1:03 am

Hi Ravi

When SurKey Gen runs on 2 nodes and Peek on 1 node, repartitioning occurs, and as a result all the records are processed by the single node in the Peek. When you view the records in the Peek, node 0 will have 1 lakh records.

When SurKey Gen runs on 2 nodes and Peek on 2 nodes with Round Robin as the partitioning method, each node has 50,000 records.

Configure the Peek stage to output all records from all partitions.

-Balaji S.R
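
The two cases can be sketched in a few lines of Python. This is an illustration only: `generate_keys` is a hypothetical helper, and its key scheme (partition p issues p, p + N, p + 2N, ... for N partitions) is an assumption inferred from the even and odd `Sur_Key` values in the log earlier in the thread, not taken from the product documentation.

```python
def generate_keys(rows_per_partition, partitions):
    """Assumed scheme: partition p issues p, p + N, p + 2N, ... for N partitions."""
    return [
        [p + partitions * n for n in range(rows_per_partition)]
        for p in range(partitions)
    ]


# SurKey Gen on 2 nodes, 1 lakh rows in total.
sk_parts = generate_keys(50_000, 2)

# Peek on 2 nodes: each Peek partition keeps its 50,000 rows.
peek_two_nodes = sk_parts

# Peek on 1 node: both partitions are brought together into one.
# Simple concatenation is used here; the real row order depends on how the rows are collected.
peek_one_node = [[key for part in sk_parts for key in part]]

print([len(p) for p in peek_two_nodes])
print([len(p) for p in peek_one_node])
print(sk_parts[0][:3], sk_parts[1][:3])
```

In both cases the total stays at 1 lakh rows. Only the number of Peek partitions they are spread over changes.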

ravij · Post by **ravij** » Tue Dec 27, 2005 3:56 am

Hi Balaji,

Configure the peek stage to output all records from all partitions

I didn't understand what you said. You said to configure the Peek stage to output all records from all partitions, right? How do I do that? Can you advise?

Thanks in advance.

kumar_s · Post by **kumar_s** » Tue Dec 27, 2005 4:29 am

Hi,

Balaji is asking you to set the property

All Partition = True

in the Peek stage.
I guess you still haven't gone through the documentation on partitioning.

-Kumar

balajisr · Post by **balajisr** » Tue Dec 27, 2005 4:32 am

Hi

Go to the Stage > Properties page of the Peek stage.

In the Rows category there is an option "All Records (After Skip)", which you need to set to True.

In the Partitions category there is an option "All Partitions"; set this to True.

-Balaji S.R
