Using a schema file in a parallel job

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dsguy
Participant
Posts: 34
Joined: Thu Mar 09, 2006 10:37 am

Using a schema file in a parallel job

Post by dsguy »

Hi

Please let me know the difference between using a metadata (table) definition vs. a schema file in PX while creating a buildop.
Does a DS job execute in parallel if we use a schema file?

Thanks in Advance
Regards,
Girish
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

The mode of execution is defined in each stage and governed by the config file. So whatever you use, the settings on each stage's Advanced tab should take care of partitioning.
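For illustration, a config file for two logical nodes looks something like this (the host name and resource paths here are made up):

    {
        node "node1"
        {
            fastname "etl_server"
            pools ""
            resource disk "/data/ds/disk1" {pools ""}
            resource scratchdisk "/data/ds/scratch1" {pools ""}
        }
        node "node2"
        {
            fastname "etl_server"
            pools ""
            resource disk "/data/ds/disk2" {pools ""}
            resource scratchdisk "/data/ds/scratch2" {pools ""}
        }
    }

With this file set as APT_CONFIG_FILE, the job runs on two partitions regardless of whether its metadata comes from a table definition or a schema file.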
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
dsguy
Participant
Posts: 34
Joined: Thu Mar 09, 2006 10:37 am

Post by dsguy »

Thanks for your reply, Kumar.
You mean we can partition based on the config file irrespective of whether we use a schema file?
Can you let me know a few more differences, as I need to prepare a presentation.

Thanks,
Girish
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

The degree of parallelism is defined by the config file. Even if you specify Entire partitioning but create only a single node in the config file, the data will flow in a single stream.
If you create 10 logical nodes on a single CPU and choose Round Robin partitioning, you can expect 10 streams to flow in parallel.
The schema file option helps you assign the metadata at run time with the help of RCP (runtime column propagation).
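For example, a minimal schema file (the column names here are just illustrative) would contain something like:

    record
    (
        empno: int32;
        ename: string[max=30];
        hire_date: date;
        salary: nullable decimal[8,2];
    )

With RCP enabled, these columns are propagated through the job at run time even though no column definitions are loaded into the stages at design time.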
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
wfis
Premium Member
Posts: 70
Joined: Wed Feb 28, 2007 2:38 am
Location: India

Post by wfis »

Hi Kumar, one related query here.
While using a schema file we don't get to see the column definitions in any of the stages (since we don't load them explicitly, for obvious reasons). So let's say I want to use Hash partitioning for a Join stage in a job which uses schema files and RCP; how do I do it?
This question came up because you mentioned partitioning isn't dependent on the use or non-use of a schema file.

Thanks.

kumar_s wrote: The degree of parallelism is defined by the config file. Even if you specify Entire partitioning but create only a single node in the config file, the data will flow in a single stream.
If you create 10 logical nodes on a single CPU and choose Round Robin partitioning, you can expect 10 streams to flow in parallel.
The schema file option helps you assign the metadata at run time with the help of RCP.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Auto partitioning is chosen by default, whereby DataStage decides the most suitable partitioning method based on the downstream stages and their operations.
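One common approach if you need an explicit Hash partition with RCP, worth verifying on your version: define just the key columns explicitly on the link and let RCP propagate the rest at run time. The keys are then available to select when you set Hash on the Join stage's Input > Partitioning tab. Viewed as a schema, the link's explicit metadata would be only the key, e.g. (illustrative name):

    record
    (
        cust_id: int32;
    )

while all the other columns still arrive via the schema file and RCP.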
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Only it's not always best. It's the one that will work in all circumstances.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

DataStage picks what is best for the immediate downstream operator but, as noted, that may not be the best configuration for the whole picture.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'