writing my first PX job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
acool
Participant
Posts: 29
Joined: Tue Feb 17, 2004 4:31 pm

writing my first PX job

Post by acool »

Hello everyone,

I am brand new to PX and need to rewrite a DataStage job in PX. It is a simple job, but the volume is huge: the source has only 3 columns but 300 million rows, and there are only 2 simple lookups, though one of the lookup hashed files has 10 million records.

When I run this job in DataStage, the speed is about 1000k/s, which means I need about 10 hours to complete the job. How fast do you think PX can reach, and what strategy should I use in PX to maximize speed and stability?

I am very new to PX, and the only reference I have so far is the PDF documentation that comes with DataStage, so any suggestion will be VERY helpful to me.

Thank you!
richdhan
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi,

First read the Parallel Job Developer's Guide. The second chapter has information on pipeline parallelism and partition parallelism.

Then read the Manager's Guide section on setting up the configuration file, which is where you define multiple nodes. Running on multiple nodes is what gives you partition parallelism; pipeline parallelism comes from the stages in the job running concurrently, passing rows downstream as soon as they are processed.

Choosing a suitable partitioning method for each stage helps you get the most out of partition parallelism.

If you can combine pipeline and partition parallelism, the results should be considerably better.
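To make the configuration file concrete, here is a minimal two-node sketch (the hostname and the disk and scratch paths are assumptions for illustration, not values from this thread); the $APT_CONFIG_FILE environment variable points the job at a file like this:

{
    node "node1"
    {
        fastname "etl_host"
        pools ""
        resource disk "/data/px/node1" {pools ""}
        resource scratchdisk "/scratch/px/node1" {pools ""}
    }
    node "node2"
    {
        fastname "etl_host"
        pools ""
        resource disk "/data/px/node2" {pools ""}
        resource scratchdisk "/scratch/px/node2" {pools ""}
    }
}

Each node entry adds one partition, so with this file every stage runs two ways in parallel; adding nodes (within what the hardware can sustain) raises the degree of partition parallelism without changing the job design.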

HTH
--Rich
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The biggest difference for you will be getting your head around the fact that hashed files play no part in reference lookups in parallel jobs.
Instead, you have a choice of three stage types (Lookup, Join and Merge) depending on exactly what you need to do; the Lookup stage holds its reference data in memory, while Join and Merge work on inputs that are sorted and partitioned on the key, so they do not need to hold a whole table in memory.
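As a rough illustration of what the in-memory approach amounts to (a hedged Python sketch, not DataStage code, with invented sample rows): the reference rows are loaded into a table once, and every source row then probes that table as it streams through. With 10 million short reference rows that is often still a comfortable fit in memory; if it is not, Join or Merge on sorted inputs avoids the memory cost altogether.

# Sketch only: the general pattern behind an in-memory reference lookup.
def lookup_style(source_rows, reference_rows):
    # Build the reference data into an in-memory table once...
    ref = dict(reference_rows)
    # ...then probe it for every source row as it streams through.
    for key, payload in source_rows:
        yield key, payload, ref.get(key)   # None where no reference row matches

source = [(1, "a"), (2, "b"), (3, "c")]   # invented sample rows
reference = [(1, "X"), (3, "Z")]
print(list(lookup_style(source, reference)))
# [(1, 'a', 'X'), (2, 'b', None), (3, 'c', 'Z')]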
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

As Ray said, there is no Hashed File stage in PX. You can use the Merge, Lookup or Join stage to meet your lookup needs. You can also assign nodes, depending on your PX server configuration, to achieve partition and pipeline parallelism: each node performs the operation in parallel, which improves performance.
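One detail worth knowing when the work is spread over nodes: for a key-based join or merge, both inputs are partitioned on the key so that matching rows land on the same node. A hedged, simplified Python sketch of that idea (the node count and sample values are assumptions, not DataStage internals):

# Sketch only: hash partitioning sends rows with the same key to the same
# partition, so each node can resolve its own slice of the lookup independently.
NUM_NODES = 4   # assumed 4-node configuration for illustration

def partition_for(key, num_nodes=NUM_NODES):
    # Hash the key and map it to a partition (node) number.
    return hash(key) % num_nodes

def hash_partition(rows, num_nodes=NUM_NODES):
    # Split a stream of (key, payload) rows into one bucket per node.
    buckets = [[] for _ in range(num_nodes)]
    for key, payload in rows:
        buckets[partition_for(key, num_nodes)].append((key, payload))
    return buckets

source_buckets = hash_partition([(101, "order A"), (202, "order B")])
reference_buckets = hash_partition([(101, "customer X"), (202, "customer Y")])
# Key 101 from both inputs lands in the same bucket, so the node that owns
# that bucket can match the rows without talking to any other node.
idx = partition_for(101)
print(idx, source_buckets[idx], reference_buckets[idx])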
rakeshcv
Participant
Posts: 8
Joined: Mon Apr 12, 2004 9:27 pm
Location: delaware
Contact:

Re: writing my first PX job

Post by rakeshcv »

READ THE MANUAL
acool wrote: Hello everyone,

I am brand new to PX and need to rewrite a DataStage job in PX. It is a simple job, but the volume is huge: the source has only 3 columns but 300 million rows, and there are only 2 simple lookups, though one of the lookup hashed files has 10 million records.

When I run this job in DataStage, the speed is about 1000k/s, which means I need about 10 hours to complete the job. How fast do you think PX can reach, and what strategy should I use in PX to maximize speed and stability?

I am very new to PX, and the only reference I have so far is the PDF documentation that comes with DataStage, so any suggestion will be VERY helpful to me.

Thank you!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Ascential has published a benchmark that achieved 300GB/hour (projected).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chalasaniamith
Participant
Posts: 36
Joined: Wed Feb 16, 2005 5:20 pm
Location: IL

Re: writing my first PX job

Post by chalasaniamith »

If you have a huge volume, the Merge stage may give you better performance. It is also worth going through the Parallel Extender documentation, where you will learn about pipeline and partition parallelism.

If you can combine pipeline and partition parallelism, the results should be better; a Merge stage on top of that works well. The Join stage is also available, but for volumes this large I would lean towards Merge.
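To illustrate why the sort/merge approach scales (a hedged Python sketch of the general single-pass pattern, not the actual Merge stage; the sample rows are made up): both inputs arrive sorted on the key, so they can be stitched together in one pass without holding either side in memory.

# Sketch only: merge a sorted master stream with a sorted update stream on a key.
def merge_sorted(master_rows, update_rows):
    updates = iter(update_rows)
    upd = next(updates, None)
    for key, master_payload in master_rows:
        # Skip update rows whose keys have already passed.
        while upd is not None and upd[0] < key:
            upd = next(updates, None)
        if upd is not None and upd[0] == key:
            yield key, master_payload, upd[1]   # matched master and update
        else:
            yield key, master_payload, None     # master row with no update

master = [(1, "row1"), (2, "row2"), (3, "row3")]   # sorted on key
updates = [(2, "extra-2"), (3, "extra-3")]         # sorted on key
print(list(merge_sorted(master, updates)))
# [(1, 'row1', None), (2, 'row2', 'extra-2'), (3, 'row3', 'extra-3')]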
Post Reply