Server Recommendations

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

wdudek
Participant
Posts: 66
Joined: Mon Dec 08, 2003 10:44 am

Server Recommendations

Post by wdudek »

I'm posting this to get a general feel for how adequate or inadequate the server we are running DataStage on is, without going into the detail of monitoring I/O, CPU, etc. on the server.

We have a collection of about 136 jobs that runs nightly, extracting over 30 GB of data from a Unidata database and inserting it into an Oracle database. The data is normalized in hash files, but beyond that very little is done to it in DataStage. The jobs are grouped into about 30 Sequencers, which run simultaneously; the individual jobs within each sequencer run one by one as the prior job completes.

The server we are using is a Compaq running Windows 2000, with dual 1.2 GHz CPUs and 4 GB of RAM, on a 100 Mbit network connection. Our current run time for this is 10 hrs 30 min, and we are working to lower it. I'm aware that there are probably things we can do within the jobs to help, but in this post I am only interested in what others think of our hardware setup. Thanks in advance for any thoughts on this subject.
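To make the scheduling pattern concrete, here is a minimal sketch in plain Python (not DataStage code; the sequencer and job names are made up for illustration):

Code: Select all

# Illustrative model of the pattern described above: ~30 sequencers
# start together, and within each sequencer the jobs run one by one.
from concurrent.futures import ThreadPoolExecutor

def run_job(job_name):
    print(f"running {job_name}")  # placeholder for a real job invocation

def run_sequencer(jobs):
    # Jobs inside a sequencer run serially, each waiting on the prior one.
    for job in jobs:
        run_job(job)

sequencers = {f"SEQ_{i:02d}": [f"SEQ_{i:02d}_JOB_{j}" for j in range(4)]
              for i in range(30)}

# All sequencers are kicked off at once, so ~30 jobs run concurrently.
with ThreadPoolExecutor(max_workers=len(sequencers)) as pool:
    for jobs in sequencers.values():
        pool.submit(run_sequencer, jobs)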
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I would make a Ruth's Chris steak dinner bet, without seeing your job designs, that:

If you have (2) jobs in a (2) cpu environment, and each job is 100% isolated to the server environment (reads/writes local sequential/hash files), then your cpu utilization should be 100% for each cpu. (One single-threaded job, with no external considerations, should consume a full cpu.)

So, running (4) of these jobs in parallel means each job will theoretically use 50% of a cpu (with no allowance for OS time or other applications).

Now, when you mix in external considerations (database i/o, network, other apps), you take away from the cpu utilization because the jobs are waiting some of the time. This masks how much of the machine's resources each job is using, so you can actually run a higher job count.
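To put rough numbers on that arithmetic, a minimal sketch (plain Python; it ignores OS overhead and I/O waits, as noted above):

Code: Select all

# Theoretical cpu share per job when N cpu-bound, single-threaded
# jobs share C cpus. A single job can use at most one full cpu.
def cpu_share_per_job(num_cpus, num_jobs):
    return min(1.0, num_cpus / num_jobs)

for jobs in (2, 4, 30):
    share = cpu_share_per_job(2, jobs)
    print(f"{jobs} jobs on 2 cpus -> ~{share:.0%} of a cpu each")
# 2 jobs on 2 cpus -> ~100% of a cpu each
# 4 jobs on 2 cpus -> ~50% of a cpu each
# 30 jobs on 2 cpus -> ~7% of a cpu each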

That being said, unless you're watching Performance Monitor on the server, you cannot accurately gauge the adequacy of your server. My guess is that you have a mixed job design, muddying the waters with source database (Unidata) activity, transformation, and loading (Oracle), and so cannot accurately determine bottlenecks. Breaking your processes down into distinct job types (extract, transform, and load) will isolate and identify the bottlenecks.
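As a crude illustration of why splitting the phases apart helps, sketched in Python (the three functions are hypothetical stand-ins for real extract, transform, and load jobs):

Code: Select all

# Timing each phase separately makes the bottleneck visible, which a
# single mixed job hides. The sleeps stand in for real work.
import time

def extract(): time.sleep(0.2)    # e.g. pull rows from Unidata to flat files
def transform(): time.sleep(0.1)  # e.g. lookups against local hash files
def load(): time.sleep(0.3)       # e.g. inserts into Oracle

for phase in (extract, transform, load):
    start = time.perf_counter()
    phase()
    print(f"{phase.__name__}: {time.perf_counter() - start:.2f}s")

Whichever phase dominates the elapsed time is where tuning effort pays off.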

If your transforms are all server-isolated activities (reads/writes of local sequential/hash files), then your cpus are slammed (steak dinner bet) for that duration. In my experience, I would rather have more cpus than fewer, faster ones (search this forum; we've discussed this before).
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: Server Recommendations

Post by Teej »

wdudek wrote: The jobs are grouped into about 30 Sequencers, which run simultaneously.
Are you saying that you are running approximately 30 separate jobs at any time during this processing phase?

If so, that's the problem right there. Using Server, you should limit yourself to a very low number of jobs at a time per CPU. Maybe even one per CPU if the jobs are CPU intensive, and fewer still if you have a very slow disk subsystem.
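As a rough sketch of what capping the concurrent job count could look like (plain Python; the one-job-per-cpu cap below is an assumed example, not a DataStage setting):

Code: Select all

# Queue all jobs, but let only a small, cpu-proportional number run
# at once; the rest wait until a slot frees up.
import os
from concurrent.futures import ThreadPoolExecutor

JOBS = [f"JOB_{i:03d}" for i in range(30)]   # made-up job names
MAX_CONCURRENT = max(1, os.cpu_count() or 1) # e.g. one job per cpu

def run_job(name):
    print(f"running {name}")  # placeholder for a real job invocation

with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    for job in JOBS:
        pool.submit(run_job, job)  # excess jobs wait for a free slot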

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
wdudek
Participant
Posts: 66
Joined: Mon Dec 08, 2003 10:44 am

Re: Server Recommendations

Post by wdudek »

Teej wrote: Are you saying that you are running approximately 30 separate jobs at any time during this processing phase?

If so, that's the problem right there. Using Server, you should limit yourself to a very low number of jobs at a time per CPU. Maybe even one per CPU if the jobs are CPU intensive, and fewer still if you have a very slow disk subsystem.

-T.J.
Yes, when the processing starts, there are in the area of thirty jobs running at once. After about 3 hours, most of the jobs have completed. At that point we are left with the jobs that focus on 4 of the larger Unidata files, which are normalized into about 10 Oracle tables. This portion of the process runs for about 7 hrs, which equates to 1 to 2 jobs per processor.

Thanks for the info; I have some good starting points now, and of course any further suggestions or experiences will be appreciated.