Performance Improvement Needed

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
nagarjuna
Premium Member
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Performance Improvement Needed

Post by nagarjuna »

Hi Everyone ,

I have an issue with loading of one of the table . Here are the details

a) Incoming file needs to be loaded into staging

b) From staging into Warehouse

c) Daily load : 20 million records ( delta )

d) Needs to do type1 updates .

c)Unloading the data from warehouse based on the records present in staging .This unload is needed to identify if a record is update or insert .I am separating inserts and updates .

d) Unload query :

select col1 , col2 , col3 , col4 from warehouse table where
( key1 , key2 , key3 ) in ( select col1 , col2 , col3 from staging table )

e) Have unique index on key1 ,key2 , key3

d) After unloading , the unload dataset will be joined with staging table data and insert file , update file will be produced .

e) Requirement is such a way that we CAN ignore updates .

The Unload part of the above process is taking lot of time ( 45-60 min) . I have tried sparse lookup also but its taking more time . Are there any other options to acheive this results in a better time .

Currentli there are 300 million records in warehouse .Incoming data is 20 million per day .

Thanks in advance .
Nag
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Join the two tables directly in the 'unload' query, that giant 'in' list is about the worst way to handle volumes of that size.
-craig

"You can never have too many knives" -- Logan Nine Fingers
nagarjuna
Premium Member
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Craig ,

Thanks for the response . i will try that option but I dont have indexes on that table in staging though i have in Warehouse table . Will it not impact performance ?

Thanks
Nag
Nag
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Try, let us know. What database?
-craig

"You can never have too many knives" -- Logan Nine Fingers
nagarjuna
Premium Member
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Oracle Database .
Nag
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Matching in Database will be best approach than extracting to DataStage.

Main reason is that
1.) it contains a unique index to prevent a full table scan
2.) avoids unwanted network to pull data into DS server
3.) avoids any sorting prior to join

Did you gather the statistics recently for your warehouse table ?

Did you try including hints in the SQL ?

What is the propotion of data you expect to go in the update stream ? If that is small, you can set the update as a dummy SQL and let that be handled in the target.
ds_teg
Premium Member
Premium Member
Posts: 51
Joined: Tue Aug 11, 2009 6:53 am
Location: Chicago

Post by ds_teg »

Sainath.Srinivasan wrote:Matching in Database will be best approach than extracting to DataStage.

Main reason is that
1.) it contains a unique index to prevent a full table scan
2.) avoids unwanted network to pull data into DS server
3.) avoids any sorting prior to join

Did you gather the statistics recently for your warehouse table ?

Did you try including hints in the SQL ?

What is the propotion of data you expect to go in the update stream ? If that is small, you can set the update as a dummy SQL and let that be handled in the target.

Yes , I have gathered the table stats after every load ( every day )
I have not tried hints.The portion of updates that are going is nearly 7-10% of the incoming records .
Are you suggesting to use upsert and reject the updates in the reject file ?
Could you please be more elaborate ?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You have yourself a plain old ordinary sql query optimization task which really doesn't have much to do with DataStage until it starts returning records, at which time your Array Size setting with take affect. Other than that, it's all about explain plans, indexing, hash joins versus nested loops, sql changes and perhaps hints to make the select as efficient as possible. Work with your DBA if you haven't worked all that much with explain plans before.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply