Using a schema file in a parallel job

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dsguy
Participant
Posts: 34
Joined: Thu Mar 09, 2006 10:37 am

Using a schema file in a parallel job

Post by dsguy »

Hi

Please let me know the difference between using a metadata (table) definition vs. a schema file in PX while creating a buildop.
Does a DS job execute in parallel if we use a schema file?

Thanks in Advance
Regards,
Girish
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

The mode of execution is defined in each stage and governed by the config file. So whatever you use, the settings on each stage's Advanced tab should take care of partitioning.
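For illustration, a config file for two logical nodes looks something like this (the host name and resource paths here are made up):

    {
        node "node1"
        {
            fastname "etl_server"
            pools ""
            resource disk "/data/ds/disk1" {pools ""}
            resource scratchdisk "/data/ds/scratch1" {pools ""}
        }
        node "node2"
        {
            fastname "etl_server"
            pools ""
            resource disk "/data/ds/disk2" {pools ""}
            resource scratchdisk "/data/ds/scratch2" {pools ""}
        }
    }

With this file set as APT_CONFIG_FILE, the job runs on two partitions regardless of whether its metadata comes from a table definition or a schema file.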
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
dsguy
Participant
Posts: 34
Joined: Thu Mar 09, 2006 10:37 am

Post by dsguy »

Thanks for your reply, Kumar.
You mean we can partition based on the config file irrespective of whether we use a schema file?
Can you let me know a few more differences, as I need to prepare a presentation.

Thanks,
Girish
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

The degree of parallelism is defined by the config file. Even if you specify Entire partitioning but create only a single node in the config file, the data will flow in a single stream.
If you create 10 logical nodes on a single CPU and choose Round Robin partitioning, you can expect 10 streams to flow in parallel.
The schema file option helps you assign the metadata at run time with the help of RCP (runtime column propagation).
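For example, a minimal schema file (the column names here are just illustrative) would contain something like:

    record
    (
        empno: int32;
        ename: string[max=30];
        hire_date: date;
        salary: nullable decimal[8,2];
    )

With RCP enabled, these columns are propagated through the job at run time even though no column definitions are loaded into the stages at design time.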
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
wfis
Premium Member
Posts: 70
Joined: Wed Feb 28, 2007 2:38 am
Location: India

Post by wfis »

Hi Kumar, one related query here.
While using a schema file we don't get to see the column definitions in any of the stages (since we don't load them explicitly, for obvious reasons). So let's say I want to use Hash partitioning for a Join stage in a job which uses schema files and RCP; how do I do it?
This question came up because you mentioned partitioning isn't dependent on the use or non-use of a schema file.

Thanks.

kumar_s wrote: The degree of parallelism is defined by the config file. Even if you specify Entire partitioning but create only a single node in the config file, the data will flow in a single stream.
If you create 10 logical nodes on a single CPU and choose Round Robin partitioning, you can expect 10 streams to flow in parallel.
The schema file option helps you assign the metadata at run time with the help of RCP.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Auto partitioning is chosen by default, whereby DataStage decides the most suitable partitioning method based on the downstream stages and their operations.
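One common approach if you need an explicit Hash partition with RCP, worth verifying on your version: define just the key columns explicitly on the link and let RCP propagate the rest at run time. The keys are then available to select when you set Hash on the Join stage's Input > Partitioning tab. Viewed as a schema, the link's explicit metadata would be only the key, e.g. (illustrative name):

    record
    (
        cust_id: int32;
    )

while all the other columns still arrive via the schema file and RCP.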
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Only it's not always best. It's the one that will work in all circumstances.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

DataStage picks what is best for the immediate downstream operator but, as noted, that may not be the best configuration for the whole picture.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'