Reading & writing using different configuration files

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Reading & writing using different configuration files

Post by nagarjuna »

I have 2 jobs

Job 1: I created a dataset, dataset_8node, using an 8-node configuration file. While creating it, I sorted and hash-partitioned the data on Key1.

Job 2: I am joining dataset_8node with an Oracle table.


      dataset_8node ---> Same partitioning, Sort (don't re-sort previously sorted data)
                                         |
                                         v
                                    JOIN STAGE (Same partitioning on both input links) ---> output
                                         ^
                                         |
      Oracle table ---> hash partition & sort on Key1
Suppose I run Job 2 on 4 nodes; will there be any unexpected results?
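As a side note, the partition count recorded in the dataset's descriptor can be checked from the command line. A sketch only, assuming orchadmin is on the path of the engine tier; the dataset path here is a placeholder:

```shell
# Sketch only -- the dataset path is a placeholder; run on the engine
# tier with APT_CONFIG_FILE set. "describe" reports the schema and the
# partitions recorded when dataset_8node was written.
orchadmin describe /data/work/dataset_8node.ds
```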

Looking for your inputs. Thanks in advance.
Nag
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Here I created dataset_8node using an 8-node configuration file and am reading it in a job running on 4 nodes. However, I specified Same partitioning while reading it in Job 2. So here we have 2 input links to the Join: one input link was created on 8 nodes and the other is created on 4 nodes. I am curious to know how this works.
Nag
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

When you read a Data Set and the configuration file you're using is not the one with which the Data Set was written, a temporary configuration file is used by the copy operator that reads from the Data Set.

You can achieve the same with the -x option of the orchadmin command.
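A sketch of the shape of that command. The -x flag is the one mentioned above; the subcommand and paths are placeholders, to be checked against the orchadmin reference for your version:

```shell
# Sketch only: per the post above, -x makes orchadmin use the current
# APT_CONFIG_FILE rather than the configuration recorded in the
# dataset's descriptor. Subcommand and paths are placeholders.
orchadmin copy -x /data/work/dataset_8node.ds /data/work/dataset_4node.ds
```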
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Thanks for your response, Ray. I understand that the dataset will be read on 8 nodes because of the temporary config file you mentioned.
Now my question is how the other link of the Join works. The other link is reading from an Oracle table and hash-partitioning on Key1. Does this operation take place on 4 or 8 nodes?
Nag
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That partitioning is occurring on the input link of the Join stage and therefore will use the job's current configuration file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

I have executed the job, and it looks like the other input link to the Join stage (the read from the Oracle table) is also running on 8 nodes as opposed to 4. The stages downstream of the Join are running on 4 nodes. Any idea why this is happening?
Nag
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

How do you know that?

Dump the score to learn definitively which operators are processing on which nodes.
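For reference, the score dump is switched on with an environment variable; a minimal fragment, assuming it is set as a job parameter or in the environment of the job run:

```shell
# Config fragment: with APT_DUMP_SCORE set, the parallel engine writes
# the job score (which operators run on which nodes, and the datasets
# between them) into the job log at startup.
export APT_DUMP_SCORE=True
```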
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

I checked the Job Monitor, and in it I found 8 instances of the Sort before the Join's 2nd input link (reading from the Oracle table and sorting).

Suppose it should run on 4 nodes: then I think the Join won't work properly, as one input link (the dataset) runs on 8 nodes and the other runs on 4. Please note that I specified Same partitioning on both input links of the Join stage.

Any thoughts on this ?
Nag
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can always use node pools to force things to run on the nodes you require.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Ray, I understand that we can constrain a stage to run on particular nodes by defining node pool constraints. My question here is how input 2 to the Join is running on 8 nodes even after setting APT_CONFIG_FILE to a 4-node file in the job parameters.
Nag
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Was the Data Set that feeds it written with an 8-node configuration file?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Yes, the dataset was created on 8 nodes. While reading from that dataset, it is read on 8 nodes (even though the job executed on 4 nodes, because of the temporary config file you mentioned).

I am wondering why the other link to the Join is also executed on 8 nodes up to the Join stage (Oracle --> Sort (8 instances) --> 2nd input of the Join stage).
Nag