Poor Performance after New DataStage Installation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

tehavele
Premium Member
Posts: 79
Joined: Wed Nov 12, 2008 5:41 am
Location: Melbourne

Poor Performance after New DataStage Installation

Post by tehavele »

Hi Experts,

We are in a big fix. We have installed InfoSphere DataStage v8.1 FP1.
DataStage is up and fully functional. We have executed our complete application on this new installation successfully. All is fine except that performance is very bad. We ran and compared the same jobs on the two environments, and the results clearly indicate that jobs on the new server run at one-tenth the speed of those on the old server. Both servers are the same in all respects.

Please let me know if there are any particular settings to make after the installation to tune it.

Thanks in advance experts.
Tejas
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I suspect your assertion that the servers are the same in all respects will turn out to be incorrect. First off, how are you measuring "performance"? What actual measures are you using to assert "10 times worse"?

What IS different? For example do both servers have the same NICs, are exactly the same versions of ODBC drivers and database client software installed? How about the I/O channels?

Without understanding what is meant by "performance" it's impossible to offer much more cogent advice than that you need to isolate what is actually different between the two servers. "Nothing" is clearly not the right answer.
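
One way to start that comparison is to capture the same hardware and software inventory on both HP-UX servers and diff it. The commands below are standard HP-UX utilities; the output file names, and the assumption that the branded ODBC drivers are configured via .odbc.ini under $DSHOME, are illustrative only.

    # Run on each server, then diff the two files (file names are placeholders).
    uname -a              >  /tmp/inventory_$(hostname).txt   # OS release and kernel
    machinfo              >> /tmp/inventory_$(hostname).txt   # CPU type, speed, memory
    swlist -l product     >> /tmp/inventory_$(hostname).txt   # installed products, incl. DB clients
    ioscan -fnC disk      >> /tmp/inventory_$(hostname).txt   # disk and I/O channel inventory
    cat $DSHOME/.odbc.ini >> /tmp/inventory_$(hostname).txt   # ODBC driver/DSN definitions, if used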
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If both systems were identical, they would produce identical results with identical programs and data. There is a difference, so, ipso facto, they must be different.

Finding and fixing the error isn't always easy. You first need to identify one simple (to test) aspect which is different and then start making that test smaller and smaller until it is small enough to identify a difference between the two machines.

Take a simple job which doesn't use too much CPU but does a lot of I/O. Is that slower on the new machine? Probably. Does it read and/or write to the file system using datasets, or does it use a database? If a database, is the database on the same machine or remote? If the job uses datasets, is the dataset file directory (as specified in the $APT_CONFIG_FILE configuration file) local, on a SAN, or on a remotely mounted disk?

The questions above are meant to be an example of a line of questions you need to ask yourself in order to narrow down your problem.
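
For reference, a minimal parallel configuration file (the file pointed to by $APT_CONFIG_FILE) looks roughly like the sketch below; the host name and paths are placeholders, not your real values. The "resource disk" entries are where datasets are written and "resource scratchdisk" is where sorts and buffering spill.

    {
        node "node1"
        {
            fastname "newserver"
            pools ""
            resource disk "/data/datasets" {pools ""}
            resource scratchdisk "/data/scratch" {pools ""}
        }
        node "node2"
        {
            fastname "newserver"
            pools ""
            resource disk "/data/datasets" {pools ""}
            resource scratchdisk "/data/scratch" {pools ""}
        }
    }

If those directories resolve to a slower or remote volume on the new server, that alone can explain a dataset-heavy job running slower.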
tehavele
Premium Member
Posts: 79
Joined: Wed Nov 12, 2008 5:41 am
Location: Melbourne

Post by tehavele »

Hi, thanks for your quick response.
Answers to your questions are below:

1) The Engine, metadata DB and WebSphere are installed on the same physical HP-UX machine. The database in use is on another HP-UX server in the same datacentre.

2) I meant that the jobs and the data that we use to test on both servers are exactly the same.
3) Yes, the servers are different, but the new server has better resources:
i.e. 4 CPUs as compared to 2 CPUs on the old server,
24 GB RAM as compared to 16 GB RAM on the old server,
and more scratch space and file space.
In spite of it being a better server, performance is lower.
4) I measured Performance in the following manner:

a) Ran a simple job [Seq File --> Dataset]
Old server speed is 25k rows/sec; new server speed is 19k rows/sec.
b) Ran a slightly more complex job with Lookup stages
Lookup link speed --> old server 5554 rows/sec, new server 641 rows/sec.

5) The network team's analysis says that there is no problem with the connection.
The data transfer rate between the DataStage and DB servers is very good.

6) I still need to check the very valid points raised by Ray, i.e. "do both servers have the same NICs, are exactly the same versions of ODBC drivers and database client software installed?". I will give these details in a while.

Please let me know if I need to check anything else.
Tejas
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Example (a) tells me that the new disk target for datasets is slower. Is it a SAN, or the same SAN and the same virtual disk? Note that this test is only useful for comparison purposes if it runs for at least a minute; longer is better.

Example (b) doesn't necessarily identify any specific bottleneck. It could be the read speed of the sources to the lookup, or it could be CPU. Are your $APT_CONFIG_FILE settings the same in terms of the number of processing nodes? Are the 2 vs. 4 CPUs a real comparison of physical CPUs of the same type and frequency, or are these virtualized CPUs?
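
A quick way to compare the node counts is to check which configuration file each engine is actually using; a rough sketch, assuming $APT_CONFIG_FILE is set in your session (otherwise take the path from the job log):

    echo $APT_CONFIG_FILE              # config file the jobs run with
    grep -c fastname $APT_CONFIG_FILE  # one fastname entry per processing node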
Last edited by ArndW on Mon Aug 06, 2012 1:26 am, edited 1 time in total.
tehavele
Premium Member
Posts: 79
Joined: Wed Nov 12, 2008 5:41 am
Location: Melbourne

Post by tehavele »

Thanks ArndW, this is very good information.
a) The new disk target for datasets is slower. It should be a SAN, and a different one compared to the faster server. I will confirm this and get back to you soon. I will also run the job for more than a minute this time and confirm; last time I ran it for just 2 seconds.

b) 2 vs 4 CPUs corresponds to physical CPUs. For the first run the $APT_CONFIG_FILE settings were the same. Later I increased the number of nodes on the new server from 4 to 6 and then 8 nodes. There was a slight improvement in speed, but it is still very slow compared to the other server.

Attaching a section of the Machinfo output below:

Fast server CPU info:
-----------------------------------
2 Intel(R) Itanium 2 9100 series processors (1.59 GHz, 12 MB)
266 MHz bus, CPU version A1

Memory: 16378 MB (15.99 GB)

Slow server CPU info:
-----------------------------------
4 Intel(R) Itanium(R) Processor 9350s (1.73 GHz, 6 MB)
2.39 GT/s QPI, CPU version E0

Memory: 24570 MB (23.99 GB)
Tejas
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I'd stick with analyzing your example (b), since that shows a big unexplained difference in speeds.

What are the sources to the lookup in that job? Database connections? If so, do a simple dummy job DB -> Peek stage with a large volume on both machines and compare the throughput.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Where are your logs being stored?

xmeta or UniVerse?
tehavele
Premium Member
Posts: 79
Joined: Wed Nov 12, 2008 5:41 am
Location: Melbourne

Post by tehavele »

Hi Paul, Logs are stored in Universe ... RTLogging=1, ORLogging=0
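
(For reference, these flags live in the project's DSParams file; a sketch of how they can be checked, assuming a default install root and a placeholder project name:

    cd /opt/IBM/InformationServer/Server/Projects/MYPROJECT
    grep -E 'RTLogging|ORLogging' DSParams
    # RTLogging=1 with ORLogging=0 means logging goes to the local UniVerse repository rather than xmeta.)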
Tejas
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Spin up a performance job that does not touch disk.

RowGen (about 100M records) --> Transform (col+1) --> Peek

Run it in both environments.

That will at least tell you if the environment is slow, or if your file system is slow.

Keep in mind that your logs are still writing to disk while the job runs, but less so since it's a small job.

Once that job runs, replace the peek with a sequential file stage, or dataset (if your datasets are on a different device).
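
If you want to time that comparison from the command line rather than the Director, dsjob can run the job and report timings and link counts; the project and job names below are placeholders, and the dsenv path assumes a default install.

    . /opt/IBM/InformationServer/Server/DSEngine/dsenv          # source the engine environment
    $DSHOME/bin/dsjob -run -jobstatus MYPROJECT perf_test_job   # run and wait for completion
    $DSHOME/bin/dsjob -report MYPROJECT perf_test_job DETAIL    # start/end times and per-link row counts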
tehavele
Premium Member
Posts: 79
Joined: Wed Nov 12, 2008 5:41 am
Location: Melbourne

Post by tehavele »

New observations and additional points:

1) Job 1
[SeqFile --> Dataset]
Ran this job with more data, for more than 2 minutes as compared to 2 seconds last time. The statistics differed and the new TEST server proved to be faster.
2) Job 2
[ColGen --> Transformer --> Peek]
This job with no I/O ran faster on the new TEST server. Good result. This job ran for around 5 minutes.
3) Jobs with the Lookup stage are very slow on the new server.
4) I suspect that performance is going down mainly for the jobs that have the Lookup stage.

See if you are able to make something out of this.
Tejas
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I would recommend that you set $APT_DUMP_SCORE on your job; you will most likely see that DataStage has inserted a repartitioning operator and/or a sort somewhere in your job, and that this "hidden" stage is causing the slowdown.
For the Lookup stage: how many rows, and approximately how many MB, is your reference data, and where is it coming from (text file? dataset? database?)? Also check to make sure that you are not doing a sparse lookup in one job and a normal one in the other.
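
A sketch of one way to do that without editing the design, assuming $APT_DUMP_SCORE has been added to the job (or project) as an environment-variable parameter and using placeholder project/job names:

    $DSHOME/bin/dsjob -run -jobstatus -param '$APT_DUMP_SCORE=1' MYPROJECT lookup_test_job
    # Then look in the job log for the score entry (it starts "main_program: This step has ...")
    # and check for tsort or repartitioning operators that are not in the job design.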
tehavele
Premium Member
Posts: 79
Joined: Wed Nov 12, 2008 5:41 am
Location: Melbourne

Post by tehavele »

Hi Arndw,

1) As I said earlier, I am importing the same job into both environments and running them. Hence there is absolutely no design change.
Is it possible that DataStage has inserted a repartitioning operator and/or a sort somewhere in our job without us doing anything?

2) The lookups in both jobs are normal lookups, and the reference is coming from a database. The input links are fast but the reference lookups are slow. The reference data is quite OK.
Tejas
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

1) Yes, it is possible that one installation inserts sorts/repartitions and another doesn't, but that would depend on your configuration settings. If you didn't explicitly change $APT_NO_PART_INSERTION or $APT_NO_SORT_INSERTION, then both installations should execute the same score for the same jobs at runtime.

2) Make the reference lookup a dataset instead of your database in both environments and compare the speeds. That will take the DB out of the equation, and you shouldn't get very significant time differences - on the order of seconds for a job running more than 2 minutes.
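
If you want to rule out a difference in those insertion settings, compare what each engine and project actually define; a rough sketch, assuming a default install root:

    cd /opt/IBM/InformationServer/Server
    grep -E 'APT_NO_PART_INSERTION|APT_NO_SORT_INSERTION' DSEngine/dsenv Projects/*/DSParams
    # No matches on either server means neither flag is set, so both engines
    # should generate the same score for the same job design.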
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Sequential files, lookup stages and dataset stages all have the same performance limitation of disk I/O. A large lookup will write data to lookup filesets across each partition - as you add partitions it may get no faster as it still needs to write all lookup data out to each partition. You will see this as the bottleneck if you turn on resource monitoring on the server.
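
On HP-UX a simple way to capture that while the lookup job runs is sar and vmstat on both servers; the intervals below are arbitrary.

    # Sample disk, CPU and memory activity every 5 seconds for 2 minutes during the run.
    sar -d 5 24 > /tmp/disk_$(hostname).out &   # per-device busy %, queue, service time
    sar -u 5 24 > /tmp/cpu_$(hostname).out &    # user/system/wio/idle CPU split
    vmstat 5 24 > /tmp/vm_$(hostname).out &     # memory and paging
    wait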

You may need to look at your scratch and disk space in terms of local versus network versus solid-state disk storage to find the fastest I/O. RAID 0 is preferred, as this is temp data where data loss is irrelevant.

Row generator through a transformer and filter to a copy stage tests raw processing performance. Database to database is another good test, as it takes disk I/O out of the equation.

Not sure why you are still on 8.1 - this is way worse than 8.7.