
writing my first PX job

Posted: Wed Jun 30, 2004 7:55 am
by acool
Hello everyone,

I am brand new to PX. I need to rewrite a DataStage job in PX. It is a simple job, but the volume is huge. The source has only 3 columns, but 300 million rows. I have only 2 simple lookups, but one lookup hashed file has 10 million records.

When I run this job in DataStage, the speed is about 1000k/s, which means I need about 10 hours to complete the job. How fast do you think PX can go? What kind of strategy should I use in PX to maximize speed and stability?

I am very new to PX, and the only reference I have right now is the PDF file that comes with DataStage, so any suggestion will be VERY helpful to me.

Thank you!

Posted: Wed Jun 30, 2004 8:07 am
by richdhan
Hi,

First, read the Parallel Job Developer's Guide. The second chapter has information on pipeline parallelism and partition parallelism.

Read the Manager's guide on setting up the configuration file, which lets you define multiple nodes. Running on multiple nodes is what gives you partition parallelism; pipeline parallelism (downstream stages consuming rows while upstream stages are still producing them) happens automatically between stages in PX.
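
For example, a minimal two-node configuration file looks something like the sketch below. The fastname and the resource paths here are placeholders; substitute your own server name and directories:

    {
        node "node1" {
            fastname "your_server"
            pools ""
            resource disk "/data/ds/node1" {pools ""}
            resource scratchdisk "/scratch/ds/node1" {pools ""}
        }
        node "node2" {
            fastname "your_server"
            pools ""
            resource disk "/data/ds/node2" {pools ""}
            resource scratchdisk "/scratch/ds/node2" {pools ""}
        }
    }

Point the job at it with the APT_CONFIG_FILE environment variable, and it will run on as many nodes as the file defines.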

Setting the appropriate partitioning method on different stages helps you make the most of that partition parallelism.
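
To see the idea behind key-based partitioning, here is a conceptual Python sketch (not DataStage code; the node count and key value are made up). Partition both the 300-million-row source and the 10-million-row reference on the lookup key, and matching rows always land on the same node, so each node can do its lookups independently:

    import zlib

    NODES = 4  # would match the number of nodes in your configuration file

    def partition(key: str, nodes: int = NODES) -> int:
        # CRC32 is deterministic, so the same key always maps to the same
        # node (Python's built-in hash() is salted per process, so avoid it).
        return zlib.crc32(key.encode()) % nodes

    print(partition("C1001"))  # stable across runs and across both inputs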

If you can combine pipeline and partition parallelism, the results should be even better.

HTH
--Rich

Posted: Wed Jun 30, 2004 3:56 pm
by ray.wurlod
The biggest difference for you will be getting your head around the fact that hashed files play no part in reference lookups in parallel jobs.
Instead, you have a choice of three stage types (Lookup, Join and Merge) depending on exactly what you need to do; the Lookup stage holds its reference data in memory, while Join and Merge work from sorted inputs.
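
To make the difference concrete, here is a conceptual Python sketch of the two access patterns (not the actual stage implementations; the field names are invented). A Lookup builds an in-memory table from the reference and probes it per row, while a Join/Merge walks two pre-sorted inputs with constant memory:

    def lookup_join(stream, reference, key):
        # Lookup-style: load the whole reference into memory once, then
        # probe it for every incoming row. Fine while the reference side
        # (here, the 10-million-row file) fits in memory.
        ref = {r[key]: r for r in reference}
        for row in stream:
            match = ref.get(row[key])
            if match is not None:
                yield {**row, **match}

    def merge_join(stream, reference, key):
        # Merge/Join-style: both inputs must already be sorted on the key;
        # two cursors advance in step, so memory use stays flat however
        # large either input is. (Assumes unique keys on the reference.)
        ref_iter = iter(reference)
        ref_row = next(ref_iter, None)
        for row in stream:
            while ref_row is not None and ref_row[key] < row[key]:
                ref_row = next(ref_iter, None)
            if ref_row is not None and ref_row[key] == row[key]:
                yield {**row, **ref_row}

    rows = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
    ref  = [{"id": 1, "nm": "a"}, {"id": 3, "nm": "c"}]
    print(list(merge_join(rows, ref, "id")))  # [{'id': 1, 'amt': 10, 'nm': 'a'}]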

Posted: Wed Jun 30, 2004 9:24 pm
by vigneshra
As Ray said, there is no Hashed File stage in PX. You can use Merge, Lookup or Join for your lookup needs. You can also assign nodes, depending on your PX server configuration, to achieve partition and pipeline parallelism; each node performs the operation in parallel, which ultimately improves performance.

Re: writing my first PX job

Posted: Thu Feb 24, 2005 2:35 pm
by rakeshcv
READ THE MANUAL

Posted: Thu Feb 24, 2005 3:30 pm
by ray.wurlod
Ascential has published a benchmark that achieved 300GB/hour (projected).

Re: writing my first PX job

Posted: Fri Feb 25, 2005 3:39 pm
by chalasaniamith
If you have a huge volume, you are better off using the Merge stage, so performance will increase. It is also worth going through the Parallel Extender documentation; you will learn about pipeline and partition parallelism.

If you can combine pipeline and partition parallelism, the results should be better. Then apply the Merge stage and it will work nicely. The Join stage is there too, but not for huge volumes.