Performance problem even after tuning...URGENT


deepak_b73
Participant
Posts: 12
Joined: Thu Feb 16, 2006 1:06 am
Location: Bangalore
Contact:

Performance problem even after tuning...URGENT

Post by deepak_b73 »

Hi,

We are facing a performance problem in DataStage, and the problem here is a bit different.
We have done all the required optimization at the job level, yet even after tuning the job still takes an hour to complete. Our production setup is as below.

Server: IBM P Series (model 70)
OS: AIX 5.3
DataStage version: Enterprise Edition 7.5.1

The server had 3 processors assigned, and with this setup the job was taking 1 hour.
I increased the processing capacity of the server by adding more processors, and now it has 5 processors. But the job still takes 1 hour to complete. We checked CPU utilization while the job was executing: 50% of the CPU capacity was free, yet DataStage was not using the available free CPU capacity to improve performance. Why is it not using the free capacity? Is there some setting that needs to be done at the OS level or the DataStage level to enable usage of the full CPU capacity? Please help me.

Regards,
Deepak
Deepak Bhat
Bangalore
sjfearnside
Premium Member
Posts: 278
Joined: Wed Oct 03, 2007 8:45 am

Post by sjfearnside »

Here are two other factors to look at:

1. Memory usage

2. I/O throughput
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

You posted a server job type in the EE forum. Is it really a server job?

With server jobs, you have to design most of the parallelism in yourself.

Mike
deepak_b73
Participant
Posts: 12
Joined: Thu Feb 16, 2006 1:06 am
Location: Bangalore
Contact:

Post by deepak_b73 »

Hi sjfearnside,

Thanks for your response. We checked memory usage; it is under control.
When you say memory usage, do you mean the DS project folder or the data folder?

I don't know how to check I/O throughput. Please let me know how to check it.

Regards,
Deepak
Deepak Bhat
Bangalore
deepak_b73
Participant
Posts: 12
Joined: Thu Feb 16, 2006 1:06 am
Location: Bangalore
Contact:

Post by deepak_b73 »

Mike,

I posted it in this forum since the DataStage version we use is Enterprise Edition. However, we are using the server job type: we created server jobs earlier, when we were using DataStage Server Edition, and we are not using the parallel job type. That is why I posted it here.

Regards,
Deepak
Deepak Bhat
Bangalore
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Ok. If you want a server job to use more processors, then you have to design it to use more processors.
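To illustrate the general idea outside DataStage: a single sequential process stays on one CPU no matter how many the box has. You only use more processors when the design splits the stream into partitions processed concurrently, which is what an IPC stage, a Link Partitioner/Link Collector pair, or multiple job instances over data ranges give you in a server job. A rough Python sketch of the partition-and-collect pattern, purely for illustration (the row logic is a made-up stand-in):

```python
from multiprocessing import Pool

def transform(row):
    # Stand-in for per-row Transformer work: trim strings, default nulls.
    return tuple("0" if col is None else str(col).strip() for col in row)

def run(rows, degree=1):
    # degree=1 behaves like a plain server job: one process, one CPU,
    # regardless of how many processors the server has.  degree>1 mimics
    # a partitioned design (Link Partitioner/Collector, IPC stage, or
    # multiple job instances): rows are split across processes, so the
    # extra CPUs can actually be used.
    with Pool(processes=degree) as pool:
        return pool.map(transform, rows, chunksize=1000)

if __name__ == "__main__":
    rows = [(" a ", None, 42)] * 10_000
    print(run(rows, degree=4)[0])  # ('a', '0', '42')
```

Adding CPUs only changes `degree`'s ceiling; it never changes a `degree=1` design.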

Mike
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

deepak_b73 wrote:I posted it in this forum since the DataStage version we use is Enterprise Edition. However, we are using the server job type: we created server jobs earlier, when we were using DataStage Server Edition, and we are not using the parallel job type. That is why I posted it here.
Well... for the record, there's one forum here for Parallel jobs and one for Server jobs, regardless of the 'Edition' you are using. Granted, renaming them to be more in line with IBM's branding confuses things, but that was the intent here.

Parallel job questions in the EE/PX forum.
Server job questions in the Server forum.
Sequence / batch / generic questions in the General forum.

Just as an FYI.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Adding more CPUs to a server doesn't magically mean a single job will use more of them, simply that more jobs could run at the same time. And if you want tips on tuning your job design, you'd need to post it first so we have some idea what it is doing.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sjfearnside
Premium Member
Posts: 278
Joined: Wed Oct 03, 2007 8:45 am

Post by sjfearnside »

deepak_b73 wrote:Hi sjfearnside,

Thanks for your response. We checked memory usage; it is under control.
When you say memory usage, do you mean the DS project folder or the data folder?

I don't know how to check I/O throughput. Please let me know how to check it.

Regards,
Deepak
Talk to the systems people who support your platform (AIX, Linux, etc.). They usually have tools at their disposal to monitor I/O throughput. Have them monitor I/O, memory, and CPU usage while the job is executing to get a reading on resource utilization. That may provide a clue to where the problem occurs.
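To make that concrete: on AIX the admins would typically run `vmstat 5` and `iostat 5` (or `nmon`/`topas`, if installed) while the job executes, and watch the idle (`id`) and I/O-wait (`wa`) CPU columns. High `wa` alongside idle CPU suggests the job is disk-bound rather than CPU-bound. A small sketch of reading those columns from a captured vmstat line (the sample line and column layout are assumptions based on AIX vmstat; verify against your own output):

```python
# Hypothetical sample line from `vmstat 5` on AIX; the last four
# columns are CPU us / sy / id / wa (user, system, idle, I/O wait).
sample = "1 0 229367 7332 0 0 0 0 0 0 120 350 80 14 2 79 5"

def cpu_summary(vmstat_line):
    fields = vmstat_line.split()
    us, sy, idle, wait = (int(f) for f in fields[-4:])
    return {"user": us, "system": sy, "idle": idle, "iowait": wait}

stats = cpu_summary(sample)
print(stats)  # {'user': 14, 'system': 2, 'idle': 79, 'iowait': 5}
if stats["iowait"] > stats["user"] + stats["system"]:
    print("Mostly waiting on I/O -- look at the disks, not the CPUs")
```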
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use stage tracing (from the Tracing tab of the Job Run Options dialog) to capture statistics about the Transformer stage(s). Since you have provided no information about the job design there's not really much advice we can sensibly give.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
deepak_b73
Participant
Posts: 12
Joined: Thu Feb 16, 2006 1:06 am
Location: Bangalore
Contact:

Post by deepak_b73 »

Hi,

Thanks to all of you for your replies. The job design is pretty simple. It has a Sequential File stage, from which data is read into a Transformer. The Transformer has 3 lookups, each with no more than 200 rows of data.
From the Transformer, data is loaded into an Oracle table using the Oracle bulk load stage. The source data is 30 million records. The Transformer just has a Trim on all columns, one NullToZero function on a numeric column, and one simple constraint that filters rows where one column's value is greater than 0. This job takes one hour, and even after increasing the processing capacity, performance is the same.

Regards,
Deepak
Deepak Bhat
Bangalore
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use stage tracing (from the Tracing tab of the Job Run Options dialog) to capture statistics about the Transformer stage(s). Since you have provided no information about the job design there's not really much advice we can sensibly give. One solution that may be suggested by the collected statistics is to use more than one Transformer stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
martind
Participant
Posts: 12
Joined: Thu Dec 18, 2003 4:41 am
Location: Sydney

Post by martind »

Can you replicate the problem on the development server?

Could you make a copy of the job in your production environment and change the Oracle output into a sequential file output and see how fast it runs?

Do you have any indexes, constraints, triggers on your target table?
"I drink to make other people interesting"

George Jean Nathan
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

deepak_b73 wrote:Hi,

Thanks to all of you for your replies. The job design is pretty simple. It has a Sequential File stage, from which data is read into a Transformer. The Transformer has 3 lookups, each with no more than 200 rows of data.
From the Transformer, data is loaded into an Oracle table using the Oracle bulk load stage. The source data is 30 million records. The Transformer just has a Trim on all columns, one NullToZero function on a numeric column, and one simple constraint that filters rows where one column's value is greater than 0. This job takes one hour, and even after increasing the processing capacity, performance is the same.

Regards,
Deepak
Are those lookups going to the database, or have you downloaded the reference data to hashed files? (I can't remember whether you can control if Server lookups go to the DB every time or cache the contents in internal storage...) You could always try moving them to hashed files, assuming you haven't already.

Does every row have to perform all three lookups, or only certain rows? Put constraints on the lookups if possible, since you are currently doing 90 million lookups.
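To show the arithmetic behind the hashed-file suggestion, sketched outside DataStage (all names here are made up for illustration): with only ~200 reference rows per lookup, loading each reference set into memory once and probing it per source row is vastly cheaper than 90 million round trips to Oracle.

```python
# Hypothetical illustration: cache each small reference set once, then
# probe it in memory for every source row -- the same effect a hashed
# file gives a lookup in a DataStage server job.

def build_cache(reference_rows):
    # One pass over the ~200-row reference set.
    return {key: value for key, value in reference_rows}

def transform(source_rows, caches):
    # 30M source rows x 3 lookups = 90M probes, all in memory.
    for row in source_rows:
        enriched = tuple(cache.get(row[0], "unknown") for cache in caches)
        yield row + enriched

ref = [(i, f"desc{i}") for i in range(200)]
caches = [build_cache(ref)] * 3
out = list(transform([(7, "x"), (999, "y")], caches))
print(out)  # key 999 misses every lookup and falls back to "unknown"
```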
Post Reply