How effective is DataStage EE on a 4 CPU system?
Is it reasonable to use the DS Enterprise Edition on a 4 CPU system, or is the Server Edition enough? How many CPUs are recommended for effective usage of the Enterprise Edition?
Thank you in advance.
Sergey
Re: How effective is DataStage EE on a 4 CPU system?
sergd wrote: Is it reasonable to use the DS Enterprise Edition on a 4 CPU system or is the Server Edition enough? How many CPUs are recommended for effective usage of the Enterprise Edition? Thank you in advance.
Four CPUs is pretty good, and would provide a noticeable boost in performance compared to Server processing. Of course, it all boils down to tweaking the jobs to ensure they are reasonably demanding of the computer, without being too overtly demanding.
It also helps to have a large amount of RAM, speedy (and reliable) drives, and a solid network connection to your database.
If you expect to run the database on the same server... *shudder* We made that mistake. Once.
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
To get the most from a 4 CPU box it is worth switching on inter-process row buffering on the Performance tab of the Job Properties. This can provide dramatic performance improvements depending on the job design.
If you use this option, be careful to check that your job still works as expected; global variables set and accessed by different stages may behave differently with inter-process buffering.
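To see why that caveat matters, here is a plain-Python sketch (not DataStage BASIC; the stage functions and the buffering simulation are invented for illustration) of how a "global" set by one stage and read by the next can go stale once the producing stage is allowed to run ahead of the consumer:

```python
shared = {"last_key": None}  # stand-in for a job-level global variable

def stage1(row):
    shared["last_key"] = row            # sets the "global" per row
    return row * 10

def stage2(value):
    return (value, shared["last_key"])  # expects the global for *this* row

def run_unbuffered(rows):
    # Stages alternate row by row, as in a plain in-process Server job.
    return [stage2(stage1(r)) for r in rows]

def run_buffered(rows):
    # With row buffering, stage 1 runs ahead before stage 2 consumes anything.
    buffered = [stage1(r) for r in rows]
    return [stage2(v) for v in buffered]

print(run_unbuffered([1, 2, 3]))  # each row sees its own key: 1, 2, 3
print(run_buffered([1, 2, 3]))    # every row sees the stale key 3
```

The point of the sketch: the job's results depend on interleaving, which is exactly what the buffering option changes.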
Sergey,
If you're processing a ton of data, then the more CPUs the better. You will see a marked improvement in performance going from one CPU to four, but you may not see the total improvement that you would like.
Sizing your PX job for the best performance is largely a matter of incrementally increasing the number of CPUs and system resources and fine-tuning your PX job flow.
- BP
Here's my opinion:
The ease of use of Server jobs would motivate me more on a 4-CPU box. Parallel jobs are meant for high-volume processing, and I would use them only where necessary. Since PX is meant for high-volume processing, I would guess that you do not have high volumes if your server only has 4 CPUs.
Just so you have a frame of reference: most of my terabyte+ warehouses ran on a V90-series 18-CPU server, an E10K with 16 CPUs, and two 16-CPU E6500 servers. Most of these machines had 30-40 GB of RAM. I wish we had had PX on those projects.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
We are actually performing pretty well on a 4 CPU box with terabyte+ programs. Obviously, not as quickly as the other 8 CPU boxes we have internally.
In an internal test, a simple job like a transfer from Oracle to flat file performs approximately 3-5 times faster in a PX job than a Server job on this 4 CPU box. For millions of rows worth of data, that is a significant time saving.
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Would that difference still occur if you started four instances of a multi-instance server job, with appropriate parameters to partition the data?
Just curious.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: Would that difference still occur if you started four instances of a multi-instance server job, with appropriate parameters to partition the data? Just curious.
Not really. Of course, one would have to know the correct partitioning ratio in advance in order not to skew the data to one side or the other. Is there a way to obtain this information automatically, other than analyzing the table in Oracle?
Of course, one has to factor in development time, which Server in one partition has in spades. Factoring in the steps above would, in my biased presumption, require more coding for Server than its equivalent in Parallel.
Now how the heck would we deal with four separate files if the client only wanted one? A small script that adds time to the process?
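That recombine script need not be elaborate. A minimal Python sketch (the `recombine` helper and file names are assumptions for illustration, not anything DataStage generates):

```python
import shutil

def recombine(part_paths, combined_path):
    """Concatenate the per-instance output files, in order, into one file."""
    with open(combined_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                # Stream each partition file into the combined output.
                shutil.copyfileobj(part, out)

# e.g. recombine([f"out.part{n}" for n in range(1, 5)], "out.txt")
```

Streaming with `shutil.copyfileobj` keeps memory flat regardless of partition size, so the added wall-clock cost is essentially one sequential read and write of the data.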
There are a lot of factors that go into this. The example I provided was quickly whipped together, using bulk reading and processing on the Server side, and just the addition of the partition-table option on the Parallel side, plus some tweaking of the data to conform to the required fixed-length format. A job that used to take 3 hours (because I wanted to rely on Server's fixed-column formatting gem) now takes only 20-30 minutes (after an hour of properly tweaking the Parallel column formatting, which is still a pain).
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Round-robin partitioning of a sequential text source file can be achieved quite simply. If you have four job instance clones, each one is assigned a number 1-4 in a job parameter (e.g. PartitionNumber) along with the total number of instance clones (e.g. PartitionCount).
If each job reads that sequential source file and applies a constraint of:
Code: Select all
MOD(@INROWNUM, PartitionCount) = PartitionNumber - 1
then each instance takes 1/PartitionCount of the rows. If the output is into the same hash file, the hash file will contain the full complement of data across all instances. (Coool) If the output is a sequential text file, then a followup job can issue a concatenate statement to recombine the data.
Before PX we still loaded terabyte warehouses quite easily. In fact, to Ray's point, we need to make sure that apples-to-apples comparisons are made. If you are running single-threaded Server jobs that only use 1 CPU, you cannot compare them to a Parallel multi-processing job. You need to use instantiation to the same degree of parallelism.
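The round-robin constraint (MOD of the 1-based row number by PartitionCount, compared to PartitionNumber - 1) can be sanity-checked with a plain-Python stand-in; `my_rows` is an invented helper, and @INROWNUM is modeled as a 1-based counter:

```python
def my_rows(rows, partition_number, partition_count):
    """Rows one job instance keeps under the round-robin constraint."""
    return [row for inrownum, row in enumerate(rows, start=1)
            if inrownum % partition_count == partition_number - 1]

rows = list("abcdefgh")
parts = [my_rows(rows, n, 4) for n in (1, 2, 3, 4)]
print(parts)  # [['d', 'h'], ['a', 'e'], ['b', 'f'], ['c', 'g']]
# The four partitions are disjoint and together cover every row.
```

Note that each instance still reads the whole file and discards three quarters of it; the win is that parsing and transformation are spread across the CPUs.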
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
That kind of scheme - which we were perforce required to implement prior to the advent of PX - works particularly well because the four readers tend to "piggy-back": the one that is actually reading at any moment has the effect of warming the cache for the others. An alternative approach, if the row count were known in advance, would be to have N readers processing 1/N of the rows each. All but the one processing the first set of rows would go screaming along until they reached their share of the file, warming the cache for the first reader, who could concentrate on transformation. When he was done he would run at high speed and end up warming the cache for some of the later readers.
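The fixed-share variant can be sketched as a small helper that computes each reader's contiguous block of rows (`block_bounds` is an invented name; it simply splits total_rows across n_readers, giving the leftover rows to the earliest readers):

```python
def block_bounds(total_rows, n_readers):
    """0-based half-open (start, end) row ranges, one per reader."""
    base, extra = divmod(total_rows, n_readers)
    bounds, start = [], 0
    for i in range(n_readers):
        end = start + base + (1 if i < extra else 0)  # spread the remainder
        bounds.append((start, end))
        start = end
    return bounds

print(block_bounds(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Reader i would then skip rows before its start offset and stop at its end offset, so each instance transforms a contiguous 1/N of the file.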
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.