Performance with Datastage 8

Posted: Thu Aug 06, 2009 11:43 pm
by bollinenik
Hi,
We are now using WebSphere DataStage v8, and I am seeing very low performance compared to DataStage 7.5.1A and the 7.5.x series.
The difference is that earlier we had a grid environment, but now with DataStage 8 we do not. Is that really what is causing the problem?
Even a straight load is not performing the way it used to. I think it's because we don't have a grid, so is there any other way to improve performance,

for example by increasing the number of nodes or by changing some other option?

Can anyone please shed some light on this?

Posted: Fri Aug 07, 2009 2:59 am
by miwinter
Sorry but that is really hard to understand. It sounds like you aren't comparing the same thing though, from what I can glean from your post. It seems you have done more than migrate from v7.5/7.5.1 to v8 and have actually also changed your environment considerably. As such, I don't see how you can expect to get the same performance as once before? :? It's like swapping a Porsche for a Daewoo and expecting to get the same performance.

Posted: Fri Aug 07, 2009 1:47 pm
by bollinenik
The old versions were not in the current project; that was where I used the grid.
The current project is WebSphere v8 and we don't have a grid setup.
Performance is not good; even a straight map takes a lot of time.
For example, 4 million rows take 20 minutes. Is there any setting I can change that will improve performance?

Posted: Fri Aug 07, 2009 1:58 pm
by chulett
How many processing nodes are you using with your jobs? :?

Posted: Fri Aug 07, 2009 8:27 pm
by jcthornton
What differences besides DS version are there between your environments?

Hardware? Software? Network? Etc?

Are you running the same job in both environments for comparison? What is the difference? Are you talking 5 minutes vs 20 minutes? or 19 minutes vs 20 minutes?

What's your target and source? Local files? Local databases? Remote databases?

What you can try changing is going to be highly dependent upon what you are doing and where the bottleneck is. If you are talking identical (same) server hardware, going local file to local file with no transform, that is going to be a different set of things to look at than going from and to two different remote database servers.

Posted: Fri Aug 07, 2009 10:19 pm
by chulett
Their posts indicate quite a difference in environments, the biggest being the fact that the 7.x environment was a grid environment while the 8.x one is not grid enabled.

Posted: Sat Aug 08, 2009 12:43 am
by lstsaur
Bollinenik,
Forget about how good your job's performance was in your 7.5.1A grid environment. Your non-grid 8.0 processing power is simply not the same as your 7.5.1A grid. Please provide us with more detail about your non-grid 8.0 environment, such as CPU speed, size of RAM, disk space, etc. How many rows (50-100 million?) need to be processed or sorted? Or maybe give an example of your job design. Then we can go from there to see whether the performance of your job can be improved.

Posted: Mon Aug 10, 2009 2:43 pm
by bollinenik
Yup,
What Chulett and lstsaur said is right. I am not trying to compare code between environments; the current environment is not grid enabled.

In the current environment (V8, without grid) we have enough memory space.
We have a 2-node configuration, and the space dedicated to DataStage is around 450 GB, which is big enough for processing 10 million records.

I don't think job flow tuning suggestions will help, because the job is pretty straightforward.

My question is:
Is there any way we can change options on the server side or the configuration side to replicate grid performance?
Can I change some setting on the server side that will improve node processing?
Is there any setting like that on the server side?
Jobs are not processing more than 2000 records per second.
If you have overcome performance issues in a non-grid environment before, please shed some light on that area.

That would be a great help.

Posted: Mon Aug 10, 2009 5:21 pm
by ray.wurlod
bollinenik wrote: Is there any way we can change options on the server side or the configuration side to replicate grid performance?
No.

Posted: Mon Aug 10, 2009 8:04 pm
by bollinenik
Hi Ray,
Thanks for your response.
Do you have any idea why it takes so much time even for a straight mapping?
Not having a grid environment should not mean processing fewer than 2000 rows/sec, right?
Can you suggest how to resolve this problem? I understand there is a difference between grid and non-grid, but
processing fewer than 2000 records/sec just because we have no grid is not acceptable.

Please help me solve this problem.

Posted: Mon Aug 10, 2009 11:25 pm
by ray.wurlod
Forget about using rows/sec as a metric of "performance". It's largely meaningless.
Then answer some of the questions about your environment posed by others earlier.
Also, how many nodes were you running on? What was the startup time and run time (reported in the log)? What does the job actually do?

Posted: Wed Aug 12, 2009 12:46 pm
by vinodn
We are using a 4-node configuration file.
main_program: Startup time, 0:29; production run time, 39:47.

This job reads data from a sequential file and loads it into an Oracle table. In between, date and timestamp transformations happen in a Transformer, and one Lookup checks referential integrity.

It's running in upsert mode, but there are only inserts, no updates at all.

Posted: Wed Aug 12, 2009 4:40 pm
by ray.wurlod
Get the individual stage statistics from the job log to find out where the bottleneck is.

Posted: Wed Aug 12, 2009 7:38 pm
by vmcburney
Version 8 comes with a lot of performance reporting. Switch that on and see what you can find out. Since you are running multiple insert streams into Oracle, do some performance checking on that database to make sure you are not getting locks. Go back to your old DataStage job, look at the Oracle stage properties and settings, and see if there is any specific Oracle partitioning set.

Posted: Wed Aug 12, 2009 8:53 pm
by Ultramundane
bollinenik wrote: The old versions were not in the current project; that was where I used the grid.
The current project is WebSphere v8 and we don't have a grid setup.
Performance is not good; even a straight map takes a lot of time.
For example, 4 million rows take 20 minutes. Is there any setting I can change that will improve performance?
4,000,000 / 20 / 60 = 3333 rows/second. I assume that these are all generated insert statements that must be parsed and executed by the database engine. If so, I think that seems okay. Have you tried loading the records using the bulk loader instead of flooding your database with so many insert statements?
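
The arithmetic above can be reproduced with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and it assumes the average rate over the whole run is a fair summary, which ignores startup time and any uneven phases in the job. The 10 million figure in the last line is the record volume mentioned earlier in the thread, used here purely as a what-if.

```python
def rows_per_second(rows: int, minutes: float) -> float:
    """Average throughput over the whole run (hypothetical helper)."""
    return rows / (minutes * 60)


def minutes_to_load(rows: int, rate: float) -> float:
    """Estimated run time in minutes at a constant rate (hypothetical helper)."""
    return rows / rate / 60


# Figures quoted in the thread: 4 million rows in 20 minutes.
rate = rows_per_second(4_000_000, 20)
print(f"{rate:.0f} rows/second")  # prints "3333 rows/second"

# What-if: 10 million records at the same average rate, in minutes.
print(round(minutes_to_load(10_000_000, rate), 1))
```

An average like this says nothing about where the time goes, so it does not replace the per-stage statistics from the job log.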

Also, since you are reading from a sequential file, DataStage may not use multiple nodes for the read, and it may also combine the operators into one process. That is, unless you tune these settings yourself.

Thanks.