Updating Dimension table taking too Long

ksmurthys · Post by **ksmurthys** » Thu Feb 03, 2005 2:15 pm

Hi ALL,

My product dimension is 600000 rows n tried to update that table taking too long( My product no is CHAR field ). Update action is updating existing or insert new rows.
Job design is

ODBC staging table ----> Transformer -----> ODBC dimesion table
Update action is updating existing or insert new rows n table is only six columns.

Any suggestions how to improve the performance ??

kcbland · Post by **kcbland** » Thu Feb 03, 2005 2:53 pm

1. Analyze source system server load and make sure that machine is not overwhelmed right now.
2. Analyze target system server load and make sure that machine is not overwhelmed right now.
3. Extract your data to file to minimize delays imposed on source system because of target system drag. ODBC --> XFM --> SEQ
4. Separate inserts from updates and allow INSERTS to use INSERT only SQL, as well as allow updates to do UPDATE statements only.
5. Transform your data into sequential load files so that you can see the true time it takes to just insert data, and also see how long just pure updates take.

Historically, updating data takes longer than inserting data. Maybe today you have a lot of updates, whereas yesterday you had more inserts. You really need to do fundamental elementary analysis before we can help you. "My job is slow" usually doesn't help us out much. Good luck.

KeithM · Post by **KeithM** » Thu Feb 03, 2005 3:39 pm

Based on your job design it appears that you are not doing any change data detection and are updating all 600,000 rows. If that is true I would question whether it is necessary to update every row every time your job runs. If only a small percentage of the records are changing on an incremental basis, you have a lot of overhead going on that is not necessary. Search the forum for 'CRC32' which will show you posts on creating jobs that determine when a record has changed. Also search for 'SCD' or 'slowly changing dimensions' for example of splitting inserts and updates into separate data flows. This will also help improve performance. I hope this helps.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu Feb 03, 2005 4:11 pm

Some methods are:
1.) Use surrogate key values (integer) instead of char
2.) Use direct-call / wired stage instead of ODBC
3.) Use IPC stage (if you have multiple processors)
4.) Use partition in your DB and load data split by partitions
5.) Use options to merge table (such as Oracle Merge command)
6.) Remove columns you do not intend to populate

ksmurthys · Post by **ksmurthys** » Fri Feb 04, 2005 3:10 pm

Sainath.Srinivasan wrote:Some methods are:
1.) Use surrogate key values (integer) instead of char

COuld u pls explain more in detail??

Sainath.Srinivasan wrote: 2.) Use direct-call / wired stage instead of ODBC

My source system is AS/400 which i can connect through ODBC driver becose its located in Some other place.

Sainath.Srinivasan wrote: 3.) Use IPC stage (if you have multiple processors)

We are not using multiple processors

Sainath.Srinivasan wrote: 4.) Use partition in your DB and load data split by partitions

I don't know how to implement partitions in SQL server.Any tips?

Sainath.Srinivasan wrote: 5.) Use options to merge table (such as Oracle Merge command)

No idea about this stage my target is SQLSERVER database

Sainath.Srinivasan wrote: 6.) Remove columns you do not intend to populate

Here i am populating only five columns