Hi,
I am running a job with the stages seqfile--->SurKeyGen--->Peek on 1 lakh (100,000) records. When I define 1 node in the 'Node Map Constraint', it gives 1 lakh records as output, but when I define 2 nodes, it gives 2 lakh records as output.
My partitioning type is Entire.
I don't understand why it gives double the number of source records.
Any help would be appreciated.
Thanks in advance
Node map constraint
Ravi
Hi Rajiv,
Please go through the manual and try to understand the concepts and types of partitioning. You can also search the forum before raising an issue; that will clear almost all of your basic doubts.
If you have 2 nodes in the node specification and the partitioning is Entire, the entire data set is sent to each node, i.e., 1 lakh rows on each of the 2 nodes = 2 lakh rows. This type of partitioning is used when you do a lookup, especially on an MPP system (unless required otherwise).
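The row-count arithmetic for Entire partitioning can be sketched in Python. This is a minimal model, not DataStage code: the function name `entire_partition` is hypothetical, and each "node" is just a Python list holding its own copy of every input row.

```python
def entire_partition(rows, num_nodes):
    """Entire partitioning model: every node receives a full copy of the input rows."""
    return [list(rows) for _ in range(num_nodes)]

rows = range(100_000)  # 1 lakh source records

for nodes in (1, 2):
    parts = entire_partition(rows, nodes)
    total = sum(len(p) for p in parts)
    print(f"{nodes} node(s): {total} rows in total")
```

With 1 node the total is 100,000; with 2 nodes it is 200,000, which is the doubling seen in the question.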
You can specify Round Robin as the partitioning type instead, which ensures that the number of records per node is almost the same.
Hash partitioning divides the records among the nodes according to a hashing algorithm applied to the given key.
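Round Robin and Hash partitioning can be compared with a small Python model. The function names are hypothetical, and CRC-32 (`zlib.crc32`) is only a stand-in for whatever hash function the engine really uses; the rows are made-up (id, state) pairs cycling through state codes that appear in this thread's logs.

```python
import zlib

def round_robin_partition(rows, num_nodes):
    """Deal rows to nodes in turn: row 0 to node 0, row 1 to node 1, and so on."""
    parts = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        parts[i % num_nodes].append(row)
    return parts

def hash_partition(rows, key, num_nodes):
    """Send each row to the node chosen by hashing its key (CRC-32 as a stand-in hash)."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        h = zlib.crc32(str(key(row)).encode("utf-8"))
        parts[h % num_nodes].append(row)
    return parts

# 1 lakh hypothetical (id, state) rows, cycling through 10 state codes.
states = ["MA", "CA", "NY", "NJ", "CO", "SD", "OH", "IL", "FL", "TX"]
rows = [(i, states[i % len(states)]) for i in range(100_000)]

print([len(p) for p in round_robin_partition(rows, 2)])  # [50000, 50000]
# Sizes depend on how the 10 state codes hash; all rows with one state share a node.
print([len(p) for p in hash_partition(rows, lambda r: r[1], 2)])
```

Round Robin always gives near-equal counts. Hash keeps all rows with the same key together, so with only 10 distinct keys each node gets a multiple of 10,000 rows and the split can be uneven.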
-Kumar
In particular, learn from your reading that Entire partitioning puts all rows onto all nodes.
That is the main point from Kumar's posting that will aid your understanding.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Rows varying in Round Robin
Hi Kumar,
What you said is right. When I use Round Robin, it gives the same number of records in the output.
But with Round Robin, the output varies, as shown below. I am providing the log results here.
When I select 2 nodes in the SK Gen stage and 1 node in the Peek stage, I get the results below in a single Peek partition.
No. of records per partition is 10.
In Row Gen: node2,node3
In Peek: node2
Peek_6,0: Sur_Key:0 first_name:John last_name:Parker gender:M birth_date:1979-04-24 income: 0087228.46 state:MA
Peek_6,0: Sur_Key:2 first_name:William last_name:Mandella gender:M birth_date:1962-04-07 income: 0040676.94 state:CA
Peek_6,0: Sur_Key:4 first_name:Frank last_name:Chalmers gender:M birth_date:1969-12-10 income: 0004881.94 state:NY
Peek_6,0: Sur_Key:6 first_name:Seymour last_name:Glass gender:M birth_date:1960-08-18 income: 0051531.56 state:NJ
Peek_6,0: Sur_Key:8 first_name:John last_name:Boone gender:M birth_date:1964-04-16 income: 0042729.03 state:CO
Peek_6,0: Sur_Key:10 first_name:William last_name:Tell gender:M birth_date:1974-07-13 income: 0021008.45 state:SD
Peek_6,0: Sur_Key:12 first_name:Frank last_name:Sinatra gender:M birth_date:1984-06-12 income: 0082552.55 state:OH
Peek_6,0: Sur_Key:14 first_name:Seymour last_name:Smith gender:M birth_date:1977-09-27 income: 0029352.31 state:IL
Peek_6,0: Sur_Key:16 first_name:John last_name:Calvin gender:M birth_date:1961-11-30 income: 0025966.39 state:FL
Peek_6,0: Sur_Key:18 first_name:William last_name:Claybourne gender:M birth_date:1961-03-16 income: 0052160.89 state:TX
When I define 2 nodes in Peek, it should display an equal number of records in each node, right? But the output varies, as shown in the results below.
In SurKey Gen: node2,node3
In Peek: node2,node3
Peek_6,0: Sur_Key:0 first_name:John last_name:Parker gender:M birth_date:1979-04-24 income: 0087228.46 state:MA
Peek_6,0: Sur_Key:2 first_name:William last_name:Mandella gender:M birth_date:1962-04-07 income: 0040676.94 state:CA
Peek_6,0: Sur_Key:4 first_name:Frank last_name:Chalmers gender:M birth_date:1969-12-10 income: 0004881.94 state:NY
Peek_6,0: Sur_Key:6 first_name:Seymour last_name:Glass gender:M birth_date:1960-08-18 income: 0051531.56 state:NJ
Peek_6,0: Sur_Key:8 first_name:John last_name:Boone gender:M birth_date:1964-04-16 income: 0042729.03 state:CO
Peek_6,0: Sur_Key:10 first_name:William last_name:Tell gender:M birth_date:1974-07-13 income: 0021008.45 state:SD
Peek_6,1: Sur_Key:1 first_name:Susan last_name:Calvin gender:F birth_date:1967-12-24 income: 0091312.42 state:IL
Peek_6,1: Sur_Key:3 first_name:Ann last_name:Claybourne gender:F birth_date:1960-10-29 income: 0061774.32 state:FL
Peek_6,1: Sur_Key:5 first_name:Jane last_name:Studdock gender:F birth_date:1962-02-24 income: 0075990.80 state:TX
Peek_6,1: Sur_Key:7 first_name:Laura last_name:Engels gender:F birth_date:1981-12-07 income: 0015280.31 state:KY
Peek_6,0: Sur_Key:12 first_name:Frank last_name:Sinatra gender:M birth_date:1984-06-12 income: 0082552.55 state:OH
Peek_6,0: Sur_Key:14 first_name:Seymour last_name:Smith gender:M birth_date:1977-09-27 income: 0029352.31 state:IL
Peek_6,0: Sur_Key:16 first_name:John last_name:Calvin gender:M birth_date:1961-11-30 income: 0025966.39 state:FL
Peek_6,0: Sur_Key:18 first_name:William last_name:Claybourne gender:M birth_date:1961-03-16 income: 0052160.89 state:TX
Peek_6,1: Sur_Key:9 first_name:Susan last_name:Sarandon gender:F birth_date:1966-06-08 income: 0081319.09 state:ND
Peek_6,1: Sur_Key:11 first_name:Ann last_name:Dillard gender:F birth_date:1969-02-21 income: 0004552.65 state:MI
Peek_6,1: Sur_Key:13 first_name:Jane last_name:Austin gender:F birth_date:1985-03-26 income: 0019820.10 state:MA
Peek_6,1: Sur_Key:15 first_name:Laura last_name:Parker gender:F birth_date:1972-11-16 income: 0087834.98 state:CA
Peek_6,1: Sur_Key:17 first_name:Susan last_name:Mandella gender:F birth_date:1960-03-31 income: 0080394.55 state:NY
Peek_6,1: Sur_Key:19 first_name:Ann last_name:Chalmers gender:F birth_date:1970-10-01 income: 0075071.43 state:NJ
How is this happening? Can someone explain the process behind it?
Any answer would be appreciated.
Thanks in advance.
Ravi
Hi Ravi
When SurKey Gen runs on 2 nodes and Peek runs on 1 node, repartitioning occurs, and as a result all the records are processed by the single Peek node. When you view the records in the Peek, partition 0 (Peek_6,0 in the log) will have all 1 lakh records.
When SurKey Gen runs on 2 nodes and Peek runs on 2 nodes with Round Robin partitioning, each node has 50,000 records.
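These counts can be checked with a small Python model. It assumes, as the log suggests, that SurKeyGen on 2 nodes leaves the even keys on one partition and the odd keys on the other, and that each upstream partition deals its rows in turn to the Peek partitions. `repartition_round_robin` is a hypothetical name; the model only tracks counts, not which particular key the engine places on which partition.

```python
def repartition_round_robin(upstream, num_downstream):
    """Each upstream partition deals its rows in turn to the downstream partitions."""
    downstream = [[] for _ in range(num_downstream)]
    for part in upstream:
        for i, row in enumerate(part):
            downstream[i % num_downstream].append(row)
    return downstream

# SurKeyGen on 2 nodes: 1 lakh keys, even keys on partition 0, odd keys on partition 1.
upstream = [list(range(0, 100_000, 2)), list(range(1, 100_000, 2))]

print([len(p) for p in repartition_round_robin(upstream, 1)])  # Peek on 1 node: [100000]
print([len(p) for p in repartition_round_robin(upstream, 2)])  # Peek on 2 nodes: [50000, 50000]
```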
Configure the peek stage to output all records from all partitions.
-Balaji S.R
Hi Balaji,
I didn't understand what you said. You said to configure the Peek stage to output all records from all partitions; how do I do that? Can you suggest how?
Thanks in advance
Ravi
Hi,
Balaji is asking you to set this property in the Peek stage:
All Partitions = True
I guess you still haven't gone through the documentation about partitioning.
-Kumar
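A rough model of what the Peek stage writes to the log may help explain the output in this thread. In the 2-node log, partition 0 (Peek_6,0) shows keys 0, 2, ..., 18 and partition 1 (Peek_6,1) shows keys 1, 3, ..., 19: ten records each, so the counts are equal and only the order of the lines is mixed, because the two partitions run concurrently and write to the log independently. The sketch below assumes Peek prints the first N records of each selected partition (N = 10, matching the 10 records per partition noted in the question). The function `peek` and its parameter names are hypothetical, not actual DataStage option names, and the model prints partition by partition rather than interleaved.

```python
def peek(partitions, records_per_partition=10, all_partitions=True, partition_number=0):
    """Return the (partition, record) pairs a Peek-like stage would log.

    Hypothetical model: takes the first `records_per_partition` records of every
    partition, or of partition `partition_number` only when all_partitions is False.
    Passing records_per_partition=None returns every record.
    """
    selected = range(len(partitions)) if all_partitions else [partition_number]
    lines = []
    for p in selected:
        for record in partitions[p][:records_per_partition]:
            lines.append((p, record))
    return lines

# Surrogate keys as they appear in the log: even keys on partition 0, odd keys on partition 1.
parts = [list(range(0, 100_000, 2)), list(range(1, 100_000, 2))]

for p, key in peek(parts):
    print(f"Peek_6,{p}: Sur_Key:{key}")
```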