Hi everyone,
I have an issue with loading one of the tables. Here are the details:
a) The incoming file needs to be loaded into staging.
b) From staging, the data is loaded into the warehouse.
c) Daily load: 20 million records (delta).
d) Type 1 updates need to be applied.
e) Data is unloaded from the warehouse based on the records present in staging. This unload is needed to identify whether a record is an update or an insert; I am separating inserts and updates.
f) Unload query:
select col1, col2, col3, col4 from warehouse table
where (key1, key2, key3) in (select col1, col2, col3 from staging table)
g) There is a unique index on key1, key2, key3.
h) After unloading, the unload dataset is joined with the staging table data, and an insert file and an update file are produced.
i) The requirement is such that we CAN ignore updates.
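To make the split in steps h) and i) concrete, here is a minimal in-memory sketch (not the actual DataStage job): rows whose business key already exists in the warehouse go to the update stream, the rest to the insert stream. All column names and sample data are hypothetical.

```python
# Sketch of the insert/update split: a staging row whose (key1, key2, key3)
# business key is already in the warehouse is a type 1 update; otherwise it
# is an insert. Column names are assumptions for illustration only.

def split_inserts_updates(staging_rows, warehouse_keys):
    """staging_rows: iterable of dicts carrying col1..col3 as the business key.
    warehouse_keys: set of (key1, key2, key3) tuples unloaded from the warehouse."""
    inserts, updates = [], []
    for row in staging_rows:
        key = (row["col1"], row["col2"], row["col3"])
        if key in warehouse_keys:
            updates.append(row)   # key already present -> type 1 update
        else:
            inserts.append(row)   # new key -> insert
    return inserts, updates

staging = [
    {"col1": 1, "col2": "A", "col3": 10, "col4": "x"},
    {"col1": 2, "col2": "B", "col3": 20, "col4": "y"},
]
existing = {(1, "A", 10)}  # keys already present in the warehouse
ins, upd = split_inserts_updates(staging, existing)
print(len(ins), len(upd))  # prints: 1 1
```

Since the requirement allows updates to be ignored, the expensive part is really only identifying the insert stream.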
The unload part of the above process is taking a lot of time (45-60 min). I have also tried a sparse lookup, but it takes even longer. Are there any other options to achieve this result in better time?
Currently there are 300 million records in the warehouse. Incoming data is 20 million records per day.
Thanks in advance.
Performance Improvement Needed
Moderators: chulett, rschirm, roy
Matching in the database will be a better approach than extracting to DataStage.
The main reasons are that it:
1.) uses the unique index to prevent a full table scan
2.) avoids unwanted network traffic to pull data into the DS server
3.) avoids any sorting prior to the join
Did you gather statistics recently for your warehouse table?
Did you try including hints in the SQL?
What proportion of the data do you expect to go into the update stream? If that is small, you can set the update as a dummy SQL and let it be handled in the target.
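The in-database matching idea can be sketched as a single pass that tags each staging row as insert or update, so only the tagged result ever crosses the network. This uses SQLite purely for illustration (the real warehouse database and its table/column names will differ):

```python
# Sketch, with SQLite standing in for the warehouse DB: a single LEFT JOIN
# on the unique key tags every staging row as 'insert' or 'update' inside
# the database, instead of unloading warehouse rows and joining in DataStage.
# Table and column names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE warehouse (key1, key2, key3, col4);
    CREATE UNIQUE INDEX wh_uk ON warehouse (key1, key2, key3);
    CREATE TABLE staging (col1, col2, col3, col4);
    INSERT INTO warehouse VALUES (1, 'A', 10, 'old');
    INSERT INTO staging VALUES (1, 'A', 10, 'new'), (2, 'B', 20, 'brand-new');
""")
rows = con.execute("""
    SELECT s.col1, s.col2, s.col3, s.col4,
           CASE WHEN w.key1 IS NULL THEN 'insert' ELSE 'update' END AS action
    FROM staging s
    LEFT JOIN warehouse w
      ON w.key1 = s.col1 AND w.key2 = s.col2 AND w.key3 = s.col3
""").fetchall()
for r in rows:
    print(r)  # each staging row tagged 'insert' or 'update'
```

Because the match happens next to the unique index, the database can drive it as indexed lookups rather than shipping 20 million probe rows to the server for an external join.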
Sainath.Srinivasan wrote: Matching in the database will be a better approach than extracting to DataStage.
The main reasons are that it:
1.) uses the unique index to prevent a full table scan
2.) avoids unwanted network traffic to pull data into the DS server
3.) avoids any sorting prior to the join
Did you gather statistics recently for your warehouse table?
Did you try including hints in the SQL?
What proportion of the data do you expect to go into the update stream? If that is small, you can set the update as a dummy SQL and let it be handled in the target.
Yes, I have gathered the table stats after every load (every day).
I have not tried hints. The portion of updates is nearly 7-10% of the incoming records.
Are you suggesting using an upsert and rejecting the updates into the reject file?
Could you please elaborate?
You have yourself a plain old ordinary SQL query optimization task which really doesn't have much to do with DataStage until it starts returning records, at which time your Array Size setting will take effect. Other than that, it's all about explain plans, indexing, hash joins versus nested loops, SQL changes and perhaps hints to make the select as efficient as possible. Work with your DBA if you haven't worked all that much with explain plans before.
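As a toy illustration of working through an explain plan, here is the original IN-subquery form of the unload run through SQLite's EXPLAIN QUERY PLAN. Oracle, DB2, etc. have their own EXPLAIN PLAN syntax and cost models, so this only shows the shape of the exercise, not what the real warehouse optimizer will choose:

```python
# Toy explain-plan walkthrough using SQLite (real warehouse DBs differ):
# ask the optimizer how it would execute the row-value IN form of the
# unload query, so you can see whether it picks the unique index or a scan.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE warehouse (key1, key2, key3, col4);
    CREATE UNIQUE INDEX wh_uk ON warehouse (key1, key2, key3);
    CREATE TABLE staging (col1, col2, col3);
""")
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT key1, key2, key3, col4 FROM warehouse
    WHERE (key1, key2, key3) IN (SELECT col1, col2, col3 FROM staging)
""").fetchall()
for step in plan:
    print(step[3])  # the human-readable description of each plan step
```

Comparing the plan for this IN-subquery form against an equivalent join form (and after refreshing statistics) is exactly the kind of before/after check to do with your DBA.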
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers