Hi,
DS 8.7 on Windows Server 2008 R2 (Dual Core, 64 GB RAM)
Src and Tgt Database - SQL Server 2008
Currently we have a SP that inserts about 4.5 Million records everyday..It drops target table first, creates and inserts the data from a 3 table join query, then runs 3 different updates on this data (Inner joins)
then recreates the Primary Key on this table and send an email on completion..All this takes about 45 secs
I replicated the same process in Datastage(Verified the results, they are exactly same) but it takes about 9 mins to do this.
Using ODBC Connector stage..I only have inserts and No Updates(the 3 updates are implemented using 3 different join stages as reference data is big and hence not using LookUps) and finally a Transformer to derive certain fields based on join values
as done in the SP
What could be the reasons for such a drastic difference in performance?..
It is getting very difficult to justify the use of Datastage in this case
Thanks in advance
Performance Issue Stored Procedure Vs Datastage
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 536
- Joined: Thu Oct 11, 2007 1:48 am
- Location: Bangalore
Hi,
You can always improve the performance of the job.how many jobs you have designed to complete the entire process?
You can always improve the performance of the job.how many jobs you have designed to complete the entire process?
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
I already tried breaking the procesess in multilpe parts..multiple jobs..but no effect as the the very first Select Query (join across 3 tables) that fetches the 4.5 M records takes about 6.5 Mins and everything else about 1.5 mins..
I created a simple job to fetch this data in a Copy stage at it takes 6.5 mins proving what I said above
Any other solution/options
I created a simple job to fetch this data in a Copy stage at it takes 6.5 mins proving what I said above
Any other solution/options
I believe you have an example proving the advice to keep database functions on the database, rather than do them in DataStage. I don't see how you can improve your DS performance to match what happens in the SP.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
Re: Performance Issue Stored Procedure Vs Datastage
The loading approach what Datastage is using for SQL Server is pretty old! Start the load and ask your SQL Server DAB to trace, you will come to know. I faced a lot and then changed the approach. Load the Data into a file and use BCP (SQL called inside the DS) to load, will give you unbeatable performance .
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
-
- Participant
- Posts: 117
- Joined: Wed Feb 06, 2013 9:24 am
- Location: Chennai,TN, India
Imagine what is happening here. An ODBC connection is created. The query is submitted. For array size that you have mentioned, it will get those records from the database. Database server and DS server may be in different place.the the very first Select Query (join across 3 tables) that fetches the 4.5 M records takes about 6.5 Mins
Have a job like this
ODBC -> Copy -> Dataset.
Check the time taken
Thanks,
Prasanna
Prasanna