Page 1 of 1

Performance Issue Stored Procedure Vs Datastage

Posted: Tue Apr 09, 2013 11:03 am
by synsog
Hi,

DS 8.7 on Windows Server 2008 R2 (Dual Core, 64 GB RAM)
Src and Tgt Database - SQL Server 2008

Currently we have a SP that inserts about 4.5 Million records everyday..It drops target table first, creates and inserts the data from a 3 table join query, then runs 3 different updates on this data (Inner joins)
then recreates the Primary Key on this table and send an email on completion..All this takes about 45 secs

I replicated the same process in Datastage(Verified the results, they are exactly same) but it takes about 9 mins to do this.

Using ODBC Connector stage..I only have inserts and No Updates(the 3 updates are implemented using 3 different join stages as reference data is big and hence not using LookUps) and finally a Transformer to derive certain fields based on join values
as done in the SP

What could be the reasons for such a drastic difference in performance?..
It is getting very difficult to justify the use of Datastage in this case

Thanks in advance

Posted: Tue Apr 09, 2013 2:11 pm
by prasson_ibm
Hi,
You can always improve the performance of the job.how many jobs you have designed to complete the entire process?

Posted: Tue Apr 09, 2013 2:31 pm
by synsog
I already tried breaking the procesess in multilpe parts..multiple jobs..but no effect as the the very first Select Query (join across 3 tables) that fetches the 4.5 M records takes about 6.5 Mins and everything else about 1.5 mins..

I created a simple job to fetch this data in a Copy stage at it takes 6.5 mins proving what I said above

Any other solution/options

Posted: Tue Apr 09, 2013 2:45 pm
by FranklinE
I believe you have an example proving the advice to keep database functions on the database, rather than do them in DataStage. I don't see how you can improve your DS performance to match what happens in the SP.

Re: Performance Issue Stored Procedure Vs Datastage

Posted: Tue Apr 09, 2013 11:00 pm
by SURA
The loading approach what Datastage is using for SQL Server is pretty old! Start the load and ask your SQL Server DAB to trace, you will come to know. I faced a lot and then changed the approach. Load the Data into a file and use BCP (SQL called inside the DS) to load, will give you unbeatable performance .

Posted: Tue Apr 09, 2013 11:58 pm
by prasannakumarkk
the the very first Select Query (join across 3 tables) that fetches the 4.5 M records takes about 6.5 Mins
Imagine what is happening here. An ODBC connection is created. The query is submitted. For array size that you have mentioned, it will get those records from the database. Database server and DS server may be in different place.

Have a job like this
ODBC -> Copy -> Dataset.
Check the time taken

Posted: Wed Apr 10, 2013 7:21 am
by chulett
To me this is an "apples and oranges" comparison and something that would benefit greatly from PDO usage.