Performance with Datastage 8

bollinenik
Participant
Posts: 111
Joined: Thu Jun 01, 2006 5:12 am
Location: Detroit

Performance with Datastage 8

Post by bollinenik »

Hi,
We are now using WebSphere DataStage v8, and I am seeing very poor performance compared with DataStage 7.5.1A and the 7.5.x series.
The difference is that we previously had a grid environment, but we do not have one with DataStage 8. Could that really be causing the problem?
Even a straight load does not perform as it used to. I think it is because we do not have a grid, so is there any other way to improve performance,

for example by increasing the number of nodes or by changing some other option?

Can anyone please shed some light on this?
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

Sorry, but that is really hard to understand. From what I can glean from your post, it sounds like you aren't comparing the same thing. It seems you have done more than migrate from v7.5/7.5.1 to v8 and have also changed your environment considerably. As such, I don't see how you can expect to get the same performance as before. It's like swapping a Porsche for a Daewoo and expecting to get the same performance.
Mark Winter
Nothing appeases a troubled mind more than good music
bollinenik
Participant
Posts: 111
Joined: Thu Jun 01, 2006 5:12 am
Location: Detroit

Post by bollinenik »

The old versions were not on my current project; I used the grid on an earlier one.
The current project uses WebSphere DataStage v8, and we don't have a grid setup.
Performance is not good. Even a straight mapping takes a lot of time:
for example, 4 million rows take 20 minutes. Is there any setting I can change to improve performance?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How many processing nodes are you using with your jobs?
-craig

"You can never have too many knives" -- Logan Nine Fingers
jcthornton
Premium Member
Posts: 79
Joined: Thu Mar 22, 2007 4:58 pm
Location: USA

Post by jcthornton »

What differences besides DS version are there between your environments?

Hardware? Software? Network? Etc?

Are you running the same job in both environments for comparison? What is the difference? Are you talking 5 minutes vs 20 minutes? or 19 minutes vs 20 minutes?

What's your target and source? Local files? Local databases? Remote databases?

What you can try changing is going to be highly dependent upon what you are doing and where the bottleneck is. If you are talking about identical server hardware, going local file to local file with no transform, that is a different set of things to look at than going from and to two different remote database servers.
Jack Thornton
----------------
Spectacular achievement is always preceded by spectacular preparation - Robert H. Schuller
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Their posts indicate quite a difference in environments, the biggest being the fact that the 7.x environment was a grid environment while the 8.x one is not grid enabled.
-craig

"You can never have too many knives" -- Logan Nine Fingers
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Bollinenik,
Forget about how good your job's performance was in your 7.5.1A grid environment. Your non-grid 8.0 processing power is simply not the same as your 7.5.1A grid. Please give us more detail about your non-grid 8.0 environment, such as CPU speed, size of RAM, disk space, etc. How many rows (50-100 million?) need to be processed or sorted? Or maybe give an example of your job design. Then we can go from there to see whether the performance of your job can be improved.
bollinenik
Participant
Posts: 111
Joined: Thu Jun 01, 2006 5:12 am
Location: Detroit

Post by bollinenik »

Yup,
what chulett and lstsaur said is right. I am not trying to compare code between environments; the current environment is simply not grid enabled.

In the current V8 environment, which doesn't have a grid, we have enough memory.
We have a 2-node configuration, and the space dedicated to DataStage is around 450 GB, which is big enough for processing 10 million records.

I don't think job flow tuning suggestions will help, because the job is pretty straightforward.

My question is:
Is there any way we can change options on the server side or in the configuration that will replicate grid performance?
Is there any server-side setting that will improve node processing?
Jobs are not processing more than 2000 records per second.
If you have overcome performance issues in a non-grid environment before, please shed some light on that area.

That would be a great help.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

bollinenik wrote: Is there any way we can change options on the server side or in the configuration that will replicate grid performance?
No.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bollinenik
Participant
Posts: 111
Joined: Thu Jun 01, 2006 5:12 am
Location: Detroit

Post by bollinenik »

Hi Ray,
thanks for your response.
Do you have any idea why it takes so long even for a straight mapping?
Not having a grid environment should not mean processing fewer than 2000 rows/sec, right?
Can you suggest how to resolve this problem? There is a difference between grid and non-grid, but
processing fewer than 2000 records/sec just because we lack a grid is not acceptable.

Please help me solve this problem.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Forget about using rows/sec as a metric of "performance". It's largely meaningless.
Then answer some of the questions about your environment posed by others earlier.
Also, how many nodes were you running on? What was the startup time and run time (reported in the log)? What does the job actually do?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vinodn
Charter Member
Posts: 93
Joined: Tue Dec 13, 2005 11:00 am

Post by vinodn »

We are using a 4-node configuration file.
main_program: Startup time, 0:29; production run time, 39:47.

This job reads data from a sequential file and loads it into an Oracle table. In between, date and timestamp transformations happen in a Transformer, and one Lookup checks referential integrity.

It runs in upsert mode, but there are only inserts, no updates at all.
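
For anyone who wants to compare runs numerically, here is a minimal Python sketch that pulls the two timings out of a summary line like the one quoted above. The function names and the regular expression are illustrative, not part of DataStage, and the handling of an hours field (H:MM:SS) is an assumption about how longer runs would be reported.

```python
import re

# Pattern for the summary line as it appears in the post above.
# Assumption: times are colon-separated (M:SS, possibly H:MM:SS).
SUMMARY_RE = re.compile(
    r"Startup time, (?P<startup>[\d:]+); production run time, (?P<run>[\d:]+)"
)


def to_seconds(clock: str) -> int:
    """Convert a colon-separated time such as '39:47' to seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_run_summary(line: str) -> tuple[int, int]:
    """Return (startup_seconds, run_seconds) from a summary log line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a run summary")
    return to_seconds(match["startup"]), to_seconds(match["run"])


if __name__ == "__main__":
    line = "main_program: Startup time, 0:29; production run time, 39:47."
    startup, run = parse_run_summary(line)
    share = startup / (startup + run)
    print(f"startup={startup}s run={run}s startup share={share:.1%}")
```

With startup under half a minute and a run time near 40 minutes, almost all of the elapsed time is spent in the run itself, so the startup phase is unlikely to be the problem in this job.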
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Get the individual stage statistics from the job log to find out where the bottleneck is.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Version 8 comes with a lot of performance reporting. Switch that on and see what you can find out. Since you are running multiple insert streams into Oracle, do some performance checking on that database to make sure you are not getting locks. Go back to your old DataStage job, look at the Oracle stage properties and settings, and see whether any specific Oracle partitioning is set.
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan

Post by Ultramundane »

bollinenik wrote: The old versions were not on my current project; I used the grid on an earlier one.
The current project uses WebSphere DataStage v8, and we don't have a grid setup.
Performance is not good. Even a straight mapping takes a lot of time:
for example, 4 million rows take 20 minutes. Is there any setting I can change to improve performance?
4,000,000 / 20 / 60 = 3333 rows/second. I assume that these are all generated insert statements that must be parsed and executed by the database engine. If so, I think that seems okay. Have you tried loading the records using the bulk loader instead of flooding your database with so many insert statements?
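
The arithmetic above can be reproduced with a small Python sketch. The helper names are made up for illustration; the inputs are the figures quoted in this thread (4 million rows in 20 minutes, and the 2000 records/sec mentioned earlier).

```python
def rows_per_second(rows: int, minutes: float) -> float:
    """Average throughput for a load of `rows` rows taking `minutes` minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return rows / (minutes * 60)


def minutes_needed(rows: int, rate: float) -> float:
    """Elapsed minutes to load `rows` rows at `rate` rows per second."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return rows / rate / 60


if __name__ == "__main__":
    # 4,000,000 rows in 20 minutes, as quoted above.
    print(round(rows_per_second(4_000_000, 20)))
    # The same 4,000,000 rows at the 2000 rows/sec reported earlier.
    print(round(minutes_needed(4_000_000, 2000), 1))
```

Note that this is only an average over the whole run. It says nothing about which stage is the bottleneck, which is why the replies above ask for per-stage statistics instead.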

Also, since you are reading from a sequential file, DataStage may not only fail to use multiple nodes for the read but may also combine the operators into one process, unless you tune these settings yourself.

Thanks.