User defined sql - performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
balajimadhav
Participant
Posts: 15
Joined: Thu Jul 06, 2006 8:59 am

User defined sql - performance

Post by balajimadhav »

Need a help to understand the performance of user defined queries against the job design for the same logic. I have two oracle tables one with 400K records and other with 200K records and i join these two tables on a key (hash partitioned) and the resulting records out of join is moved to a oracle connector for deleting those records.The job executes for 30 mins for deleting 200K records (tried various combination of array size and transaction) to achieve this 30 mins as the optimum timing. Also the same join and delete logic has been implemented using a user defined query and the deletions happens in few seconds.As i understand that using a user defined query adds overhead to Database but could see a huge difference between these two approaches on this specific scenario.Missed to mention that the columns in where clause in delete has been indexed at table level and also gather stats is part of the execution process. Happy to know your views/suggestions on the above mentioned scenario and when should the user defined queries are better than ETL job design (using DS stages). I have searched topics in the forum but couldnt get the details and hence posted a new one
Balaji.
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Datastage is going to have to do a lot more work than a well configured database to join data. Data transfer (twice as much, at least, as it needs the left and right side not the results), hashing or sorting type things, and more. The database is set up alreaady to join data efficiently .. indexes and tuning tricks make it very good at this work. Do it on the database... there is no way to make datastage outperform the database for tasks like this.

You use the DS join stage when you need to join across databases... federated type things or even file to database or handcrafted data (result of datastage processing) to live data or handcrafted to handcrafted ... that is when you need a DS join.
Post Reply