ETL Job v/s Stored Procedure

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Raamc
Premium Member
Premium Member
Posts: 87
Joined: Mon Aug 20, 2007 9:08 am

ETL Job v/s Stored Procedure

Post by Raamc »

Hi All,

I have a query regarding the tuning a datastage job.

My requirement is to read 10 millions of data from a table and do a lookup (Left outer join) on some other table which is on same database, then insert the data into some other table.

My ETL job for above requirement is taking more than 10 mins time to process it.


My Query is
As all the tables (Source, Lookup and Target) are on same database, If I write a stored procedure for doing the same join in SQL query and insert into a table through stored procedure, is it increase the performance?

As we are using the Database concepts for loading and join, is it benefit anything in performance?

Can any one clarify?
Thanks,
Raamc
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: ETL Job v/s Stored Procedure

Post by SURA »

If you ask others, then you need to make more clear like whether all the tables are properly indexed? On which columns, did the table used any partitioning etc

If you want to compare etl vs sql, you can write a procedure and test it. I suggest, that would be the better idea.

DS User
Raamc
Premium Member
Premium Member
Posts: 87
Joined: Mon Aug 20, 2007 9:08 am

Post by Raamc »

Thanks Sura for the Suggestion.

I started writing the PROC, before that I just want to know whether any others come across the same situation and their ideas about this?

And regarding the Indexes, Yes the tables are having the indexes on the columns which I am using in joins.
Thanks,
Raamc
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Then it's almost certain that performing the join in the database - whether using SQL from DataStage or using a stored procedure - will be faster than performing the join in DataStage itself. If the SQL from DataStage is exactly the same as the SELECT statement executed by the stored procedure, then the overheads of invoking the stored procedure are likely to impact total retrieval time negatively.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply