How effective is DataStage EE on a 4 CPU system?
Is it reasonable to use the DS Enterprise Edition on a 4 CPU system, or is the Server Edition enough? How many CPUs are recommended for effective usage of the Enterprise Edition?
Thank you in advance.
Sergey
Re: How effective is DataStage EE on a 4 CPU system?
sergd wrote: Is it reasonable to use the DS Enterprise Edition on a 4 CPU system or is the Server Edition enough? How many CPUs are recommended for effective usage of the Enterprise Edition? Thank you in advance.
Four CPUs is pretty good, and would provide a noticeable boost in performance compared to Server processing. Of course, it all boils down to tweaking the jobs to ensure they are reasonably demanding of the computer, without being too overtly demanding.
It also helps to have a large amount of RAM, speedy (and reliable) drives, and a solid network connection to your database.
If you expect to run the database on the same server... *shudder* We made that mistake. Once.
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
To get the most from a 4 CPU box it is worth switching on inter-process row buffering on the Performance tab of the Job Properties. This can provide dramatic performance improvements depending on the job design.
If you use this option, be careful to check that your job still works as expected; global variables set and accessed by different stages may behave differently with inter-process buffering.
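To see why that caveat matters, here is a plain-Python sketch (not DataStage BASIC; the stage functions and the buffering simulation are invented for illustration) of how a "global" set by one stage and read by the next can go stale once the producing stage is allowed to run ahead of the consumer:

```python
shared = {"last_key": None}  # stand-in for a job-level global variable

def stage1(row):
    shared["last_key"] = row            # sets the "global" per row
    return row * 10

def stage2(value):
    return (value, shared["last_key"])  # expects the global for *this* row

def run_unbuffered(rows):
    # Stages alternate row by row, as in a plain in-process Server job.
    return [stage2(stage1(r)) for r in rows]

def run_buffered(rows):
    # With row buffering, stage 1 runs ahead before stage 2 consumes anything.
    buffered = [stage1(r) for r in rows]
    return [stage2(v) for v in buffered]

print(run_unbuffered([1, 2, 3]))  # each row sees its own key: 1, 2, 3
print(run_buffered([1, 2, 3]))    # every row sees the stale key 3
```

The point of the sketch: the job's results depend on interleaving, which is exactly what the buffering option changes.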
Sergey,
If you're processing a ton of data, then the more CPUs the better. You will see a marked improvement in performance going from one CPU to four, but you may not see the total improvement that you would like.
Sizing your PX job for the best performance is largely a matter of incrementally increasing the number of CPUs and system resources and fine-tuning your PX job flow.
- BP
Here's my opinion:
The ease of use of Server jobs would motivate me more on a 4-CPU box. Parallel jobs are meant for high-volume processing, and I would use them only where necessary. Since PX is meant for high-volume processing, I would guess that you do not have high volumes if your server only has 4 CPUs.
Just so you have a frame of reference: most of my terabyte+ warehouses ran on a V90-series 18-CPU server, an E10K with 16 CPUs, and two 16-CPU E6500 servers. Most of these machines had 30-40 GB of RAM. I wish we had had PX on those projects.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
We are actually performing pretty well on a 4 CPU box with terabyte+ programs. Obviously, not as quickly as the other 8 CPU boxes we have internally.
In an internal test, a simple job like a transfer from Oracle to flat file performs approximately 3-5 times faster in a PX job than a Server job on this 4 CPU box. For millions of rows worth of data, that is a significant time saving.
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Would that difference still occur if you started four instances of a multi-instance server job, with appropriate parameters to partition the data?
Just curious.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: Would that difference still occur if you started four instances of a multi-instance server job, with appropriate parameters to partition the data? Just curious.
Not really. Of course, one would have to know the correct partitioning ratio in advance in order not to skew the data to one side or the other. Is there a way to obtain this information automatically, other than analyzing the table in Oracle?
Of course, one has to factor in development time, which Server in one partition has in spades. Factoring in the steps above would, in my biased presumption, require more coding for Server than its equivalent in Parallel.
Now how the heck would we deal with four separate files if the client only wanted one? A small script that adds time to the process?
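That recombine script need not be elaborate. A minimal Python sketch (the `recombine` helper and file names are assumptions for illustration, not anything DataStage generates):

```python
import shutil

def recombine(part_paths, combined_path):
    """Concatenate the per-instance output files, in order, into one file."""
    with open(combined_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                # Stream each partition file into the combined output.
                shutil.copyfileobj(part, out)

# e.g. recombine([f"out.part{n}" for n in range(1, 5)], "out.txt")
```

Streaming with `shutil.copyfileobj` keeps memory flat regardless of partition size, so the added wall-clock cost is essentially one sequential read and write of the data.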
There are a lot of factors that go into this. The example I provided was quickly whipped together, using bulk reading and processing on the Server side, and just the addition of the partition-table option on the Parallel side, plus some tweaking of the data to conform to the required fixed-length format. A job that used to take 3 hours (because I wanted to rely on Server's fixed-column formatting gem) now takes only 20-30 minutes (after an hour of properly tweaking the Parallel column formatting, which is still a pain).
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Round-robin partitioning of a sequential text source file can be achieved quite simply. If you have four job instance clones, each one is assigned a number 1-4 in a job parameter (e.g. PartitionNumber) along with the total number of instance clones (e.g. PartitionCount).
If each job reads that sequential source file and applies a constraint of:
Code: Select all
MOD(@INROWNUM, PartitionCount) = PartitionNumber - 1
then each instance takes 1/PartitionCount of the rows. If the output is into the same hash file, the hash file will contain the full complement of data across all instances. (Coool) If the output is a sequential text file, then a followup job can issue a concatenate statement to recombine the data.
Before PX we still loaded terabyte warehouses quite easily. In fact, to Ray's point, we need to make sure that apples-to-apples comparisons are made. If you are running single-threaded Server jobs that only use 1 CPU, you cannot compare them to a Parallel multi-processing job. You need to use instantiation to the same degree of parallelism.
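The round-robin constraint (MOD of the 1-based row number by PartitionCount, compared to PartitionNumber - 1) can be sanity-checked with a plain-Python stand-in; `my_rows` is an invented helper, and @INROWNUM is modeled as a 1-based counter:

```python
def my_rows(rows, partition_number, partition_count):
    """Rows one job instance keeps under the round-robin constraint."""
    return [row for inrownum, row in enumerate(rows, start=1)
            if inrownum % partition_count == partition_number - 1]

rows = list("abcdefgh")
parts = [my_rows(rows, n, 4) for n in (1, 2, 3, 4)]
print(parts)  # [['d', 'h'], ['a', 'e'], ['b', 'f'], ['c', 'g']]
# The four partitions are disjoint and together cover every row.
```

Note that each instance still reads the whole file and discards three quarters of it; the win is that parsing and transformation are spread across the CPUs.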
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
That kind of scheme - which we were perforce required to implement prior to the advent of PX - works particularly well because the four readers tend to "piggy-back": the one that is actually reading at any moment has the effect of warming the cache for the others. An alternative approach, if the row count were known in advance, would be to have N readers processing 1/N of the rows each. All but the one processing the first set of rows would go screaming along until they reached their share of the file, warming the cache for the first reader, who could concentrate on transformation. When he was done he would run at high speed and end up warming the cache for some of the later readers.
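The fixed-share variant can be sketched as a small helper that computes each reader's contiguous block of rows (`block_bounds` is an invented name; it simply splits total_rows across n_readers, giving the leftover rows to the earliest readers):

```python
def block_bounds(total_rows, n_readers):
    """0-based half-open (start, end) row ranges, one per reader."""
    base, extra = divmod(total_rows, n_readers)
    bounds, start = [], 0
    for i in range(n_readers):
        end = start + base + (1 if i < extra else 0)  # spread the remainder
        bounds.append((start, end))
        start = end
    return bounds

print(block_bounds(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Reader i would then skip rows before its start offset and stop at its end offset, so each instance transforms a contiguous 1/N of the file.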
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.