Hello everyone,
I am new to PX. I need to rewrite a DataStage job in PX. It is a simple job, but the volume is huge: the source has only 3 columns but 300 million rows, and there are just 2 simple lookups, although one lookup hashed file has 10 million records.
When I run this job in DataStage, the speed is about 1000k/s, which means I need about 10 hours to complete the job. How fast do you think PX can go? What strategy should I use in PX to maximize speed and stability?
I am very new to PX, and the only reference I have right now is the PDF documentation that comes with DataStage, so any suggestion will be VERY helpful to me.
Thank you!
writing my first PX job
Moderators: chulett, rschirm, roy
Hi,
First read the Parallel Job Developer's Guide; the second chapter has information on pipeline parallelism and partition parallelism.
Then read the Manager's Guide section on setting up the configuration file, which lets you define multiple nodes; running on multiple nodes is what enables partition parallelism.
Pipeline parallelism comes from consecutive stages processing rows concurrently, while choosing the right partitioning for each stage controls how the data is divided across nodes.
If you combine pipeline and partition parallelism, the results should be much better.
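As an illustration, a minimal two-node configuration file might look like the sketch below. The node names, `fastname`, and directory paths are placeholders; substitute the values for your own environment:

```
{
  node "node1"
  {
    fastname "your_server"
    pools ""
    resource disk "/data/ds/node1" {pools ""}
    resource scratchdisk "/scratch/ds/node1" {pools ""}
  }
  node "node2"
  {
    fastname "your_server"
    pools ""
    resource disk "/data/ds/node2" {pools ""}
    resource scratchdisk "/scratch/ds/node2" {pools ""}
  }
}
```

Point the job at this file via the APT_CONFIG_FILE environment variable; each `node` entry becomes one partition at run time.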
HTH
--Rich
The biggest difference for you will be getting your head around the fact that hashed files play no part in reference lookups in parallel jobs.
Instead, you have a choice of three stage types (Lookup, Join and Merge) depending on exactly what you need to do; Lookup holds its reference data in memory, while Join and Merge work against sorted, partitioned inputs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
As Ray said, there is no Hashed File stage in PX. You can use Merge, Lookup, or Join for your lookup needs. You can also assign nodes, depending on your PX server configuration, to achieve partition and pipeline parallelism: each node performs the operation in parallel, which ultimately improves performance.
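PX specifics aside, the mechanics of a partitioned keyed lookup can be sketched in plain Python. The data, key values, and function names below are invented for illustration; the point is that hash partitioning sends equal keys to the same partition, so each partition only needs its own slice of the reference data:

```python
# Sketch of partition parallelism for a keyed lookup:
# rows are hash-partitioned on the key, so each partition
# holds only the reference rows whose keys map to it.

NUM_PARTITIONS = 4

def partition_of(key, n=NUM_PARTITIONS):
    # Hash partitioning: equal keys always land in the same partition.
    return hash(key) % n

def build_partitioned_reference(reference_rows):
    # Split the reference data into one in-memory dict per partition.
    parts = [dict() for _ in range(NUM_PARTITIONS)]
    for key, value in reference_rows:
        parts[partition_of(key)][key] = value
    return parts

def lookup_stream(source_rows, ref_parts):
    # Each source row is routed to its partition and enriched there;
    # in PX the partitions would run in parallel on separate nodes.
    for key, payload in source_rows:
        ref = ref_parts[partition_of(key)].get(key)
        yield (key, payload, ref)

reference = [(1, "one"), (2, "two"), (3, "three")]
source = [(2, "b"), (3, "c"), (9, "x")]
parts = build_partitioned_reference(reference)
print(list(lookup_stream(source, parts)))
```

In a real PX job the engine does the partitioning for you; the sketch only shows why a 10-million-row reference table stops being a bottleneck once it is split across nodes.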
Re: writing my first PX job
READ THE MANUAL
Ascential has published a benchmark that achieved 300GB/hour (projected).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: writing my first PX job
If you have a huge volume, the Merge stage will usually give better performance. It is worth going through the Parallel Extender documentation to learn about pipeline and partition parallelism.
If you combine pipeline and partition parallelism and then apply the Merge stage, the results should be good. The Join stage is also available, but it is less suited to huge volumes.
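The reason a merge copes with huge volumes is that both inputs stream through in key order, so only the current row of each stream is held in memory at any time. A rough Python sketch of an inner merge join on pre-sorted inputs (the sample data is invented):

```python
def merge_join(left, right):
    """Join two streams that are already sorted by key.

    Only the current row of each stream is held at a time,
    which is why this approach copes with very large inputs.
    """
    left_iter, right_iter = iter(left), iter(right)
    l = next(left_iter, None)
    r = next(right_iter, None)
    while l is not None and r is not None:
        if l[0] == r[0]:
            yield (l[0], l[1], r[1])
            l = next(left_iter, None)
        elif l[0] < r[0]:
            l = next(left_iter, None)   # unmatched left row, drop it
        else:
            r = next(right_iter, None)  # advance the smaller side
    # Inner-join semantics: leftovers on either side are discarded.

# Both inputs must be sorted on the join key beforehand.
source = [(1, "a"), (2, "b"), (4, "d")]
reference = [(2, "TWO"), (3, "THREE"), (4, "FOUR")]
print(list(merge_join(source, reference)))
```

This mirrors the constraint the PX Merge stage imposes: the inputs must arrive sorted (and identically partitioned) on the key, in exchange for near-constant memory use regardless of row count.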