
writing my first PX job

Posted: Wed Jun 30, 2004 7:55 am
by acool
Hello everyone,

I am brand new to PX. I need to rewrite a DataStage job in PX. It is a simple job, but the volume is huge. The source has only 3 columns, but 300 million rows. I have only 2 simple lookups, but one lookup hashed file has 10 million records.

When I run this job in DataStage, the speed is about 1000k/s, which means I need about 10 hours to complete the job. How fast do you think PX can go? What kind of strategy should I use in PX to maximize speed and stability?

I am very new to PX, and the only reference I have right now is the PDF file that comes with DataStage, so any suggestion will be VERY helpful to me.

Thank you!

Posted: Wed Jun 30, 2004 8:07 am
by richdhan
Hi,

First, read the Parallel Job Developer's Guide. The second chapter has information on pipeline parallelism and partition parallelism.

Read the Manager's guide on setting up the configuration file, which lets you define multiple nodes. Running on multiple nodes is what gives you partition parallelism; pipeline parallelism (downstream stages consuming rows while upstream stages are still producing them) happens automatically between stages in PX.
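
For example, a minimal two-node configuration file looks something like the sketch below. The fastname and the resource paths here are placeholders; substitute your own server name and directories:

    {
        node "node1" {
            fastname "your_server"
            pools ""
            resource disk "/data/ds/node1" {pools ""}
            resource scratchdisk "/scratch/ds/node1" {pools ""}
        }
        node "node2" {
            fastname "your_server"
            pools ""
            resource disk "/data/ds/node2" {pools ""}
            resource scratchdisk "/scratch/ds/node2" {pools ""}
        }
    }

Point the job at it with the APT_CONFIG_FILE environment variable, and it will run on as many nodes as the file defines.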

Setting the appropriate partitioning method on different stages helps you make the most of that partition parallelism.
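
To see the idea behind key-based partitioning, here is a conceptual Python sketch (not DataStage code; the node count and key value are made up). Partition both the 300-million-row source and the 10-million-row reference on the lookup key, and matching rows always land on the same node, so each node can do its lookups independently:

    import zlib

    NODES = 4  # would match the number of nodes in your configuration file

    def partition(key: str, nodes: int = NODES) -> int:
        # CRC32 is deterministic, so the same key always maps to the same
        # node (Python's built-in hash() is salted per process, so avoid it).
        return zlib.crc32(key.encode()) % nodes

    print(partition("C1001"))  # stable across runs and across both inputs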

If you can combine pipeline and partition parallelism, the results should be even better.

HTH
--Rich

Posted: Wed Jun 30, 2004 3:56 pm
by ray.wurlod
The biggest difference for you will be getting your head around the fact that hashed files play no part in reference lookups in parallel jobs.
Instead, you have a choice of three stage types (Lookup, Join and Merge) depending on exactly what you need to do; the Lookup stage holds its reference data in memory, while Join and Merge work from sorted inputs.
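
To make the difference concrete, here is a conceptual Python sketch of the two access patterns (not the actual stage implementations; the field names are invented). A Lookup builds an in-memory table from the reference and probes it per row, while a Join/Merge walks two pre-sorted inputs with constant memory:

    def lookup_join(stream, reference, key):
        # Lookup-style: load the whole reference into memory once, then
        # probe it for every incoming row. Fine while the reference side
        # (here, the 10-million-row file) fits in memory.
        ref = {r[key]: r for r in reference}
        for row in stream:
            match = ref.get(row[key])
            if match is not None:
                yield {**row, **match}

    def merge_join(stream, reference, key):
        # Merge/Join-style: both inputs must already be sorted on the key;
        # two cursors advance in step, so memory use stays flat however
        # large either input is. (Assumes unique keys on the reference.)
        ref_iter = iter(reference)
        ref_row = next(ref_iter, None)
        for row in stream:
            while ref_row is not None and ref_row[key] < row[key]:
                ref_row = next(ref_iter, None)
            if ref_row is not None and ref_row[key] == row[key]:
                yield {**row, **ref_row}

    rows = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
    ref  = [{"id": 1, "nm": "a"}, {"id": 3, "nm": "c"}]
    print(list(merge_join(rows, ref, "id")))  # [{'id': 1, 'amt': 10, 'nm': 'a'}]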

Posted: Wed Jun 30, 2004 9:24 pm
by vigneshra
As Ray said, there is no Hashed File stage in PX. You can use Merge, Lookup or Join for your lookup needs. You can also assign nodes, depending on your PX server configuration, to achieve partition and pipeline parallelism; each node performs the operation in parallel, which ultimately improves performance.

Re: writing my first PX job

Posted: Thu Feb 24, 2005 2:35 pm
by rakeshcv
READ THE MANUAL

Posted: Thu Feb 24, 2005 3:30 pm
by ray.wurlod
Ascential has published a benchmark that achieved 300GB/hour (projected).

Re: writing my first PX job

Posted: Fri Feb 25, 2005 3:39 pm
by chalasaniamith
If you have a huge volume, you are better off using the Merge stage, so performance will increase. It is also worth going through the Parallel Extender documentation; you will learn about pipeline and partition parallelism.

If you can combine pipeline and partition parallelism, the results should be better. Then apply the Merge stage and it will work nicely. The Join stage is there too, but not for huge volumes.