Performance Improvement Needed
Posted: Sun Jun 13, 2010 8:36 pm
Hi Everyone ,
I have an issue with loading of one of the table . Here are the details
a) Incoming file needs to be loaded into staging
b) From staging into Warehouse
c) Daily load : 20 million records ( delta )
d) Needs to do type1 updates .
c)Unloading the data from warehouse based on the records present in staging .This unload is needed to identify if a record is update or insert .I am separating inserts and updates .
d) Unload query :
select col1 , col2 , col3 , col4 from warehouse table where
( key1 , key2 , key3 ) in ( select col1 , col2 , col3 from staging table )
e) Have unique index on key1 ,key2 , key3
d) After unloading , the unload dataset will be joined with staging table data and insert file , update file will be produced .
e) Requirement is such a way that we CAN ignore updates .
The Unload part of the above process is taking lot of time ( 45-60 min) . I have tried sparse lookup also but its taking more time . Are there any other options to acheive this results in a better time .
Currentli there are 300 million records in warehouse .Incoming data is 20 million per day .
Thanks in advance .
I have an issue with loading of one of the table . Here are the details
a) Incoming file needs to be loaded into staging
b) From staging into Warehouse
c) Daily load : 20 million records ( delta )
d) Needs to do type1 updates .
c)Unloading the data from warehouse based on the records present in staging .This unload is needed to identify if a record is update or insert .I am separating inserts and updates .
d) Unload query :
select col1 , col2 , col3 , col4 from warehouse table where
( key1 , key2 , key3 ) in ( select col1 , col2 , col3 from staging table )
e) Have unique index on key1 ,key2 , key3
d) After unloading , the unload dataset will be joined with staging table data and insert file , update file will be produced .
e) Requirement is such a way that we CAN ignore updates .
The Unload part of the above process is taking lot of time ( 45-60 min) . I have tried sparse lookup also but its taking more time . Are there any other options to acheive this results in a better time .
Currentli there are 300 million records in warehouse .Incoming data is 20 million per day .
Thanks in advance .