I have two jobs. The first job runs on four nodes and writes its intermediate results to a Data Set. My second job then reads this Data Set and loads the data into DB2. For performance reasons, my team lead asked me to run the second job on two nodes, but the Data Set it reads was written on four nodes. How can I resolve this?
I set Preserve Partitioning to Clear and ran the job, but I would like to know whether this approach is really sound. Is there another approach?
Please suggest.
Data Set used as an intermediate stage between two jobs
A Data Set stage is ideal as the staging area between two jobs because it preserves the internal Data Set structure: internal formats, partitioning and sorting.
You will need to read the Data Set with a configuration file that is compatible with the one used when it was written. You may, therefore, need to re-run the writing job under the new, two-node, configuration.
You simply can not use a two-node configuration file to read a Data Set that was written with a four-node configuration file. If your team lead says you can, demand to know how.
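To make the advice above concrete, re-running the writing job under a two-node configuration means pointing APT_CONFIG_FILE at a two-node parallel configuration file. A minimal sketch follows; the hostname and resource paths are placeholders, not values from this thread, and must match your own environment:

```
{
    node "node1"
    {
        fastname "etl_host"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node2"
    {
        fastname "etl_host"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}
```

With the writing job re-run under this file, the resulting Data Set carries two partitions and can be read cleanly by the two-node reading job.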
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I'm not 100% sure about this and have no system to test it on, but I believe all the data will still be read if the Data Set was created on four nodes and you then read it on two. You will, however, get a warning about the data being repartitioned, and hence lose some performance. (And because there will be a warning in the job log, sequences may fail if you have them set to continue only on success.)
Regards,
Nick.
Given the cost of repartitioning you may find restricting both jobs to 2 nodes is faster than 4 nodes followed by 2 nodes.
There may be a way to be clever with the configuration file so that you have a node pool of four nodes for Data Sets and two nodes for everything else. I don't know enough about pooling to be sure, but this might avoid having to rebuild the Data Set on two nodes.
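A sketch of the pooling idea above, under the assumption it works as described (hostname, paths and the pool name "dsread" are all illustrative): only two nodes sit in the default pool "", so ordinary stages run two-ways, while all four nodes belong to an extra "dsread" pool that stages reading the Data Set could be constrained to:

```
{
    node "node1"
    {
        fastname "etl_host"
        pools "" "dsread"
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node2"
    {
        fastname "etl_host"
        pools "" "dsread"
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node3"
    {
        fastname "etl_host"
        pools "dsread"
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node4"
    {
        fastname "etl_host"
        pools "dsread"
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}
```

Whether this actually lets a two-node job read a four-node Data Set without repartitioning is exactly the open question in this thread; treat it as an experiment, not a recipe.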
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Re: Data Set used as an intermediate stage between two jobs
In reply to Madhu1981's original question: you can use the FROM NODES/FROM PARTITIONS variables.