Best ways for updating a table

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Best ways for updating a table

Post by kirankota79 »

Hi,

I am updating a table that contains 10 million records. I am updating 4 columns, with 7 key columns that make each row unique.

It took 10 hours to complete 2 million records, and at this pace it will take 50 hours. Can someone let me know the best approach to make it faster?

I am updating an Oracle table from a dataset.

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

"7 key columns that make unique for each row" only helps if there is an index over those columns, or at least an index where the leading columns include as many of these as possible.

Is that the case? Otherwise, each update results in a full table scan.
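For example, an index along these lines (the table and column names are just placeholders for illustration) would let Oracle find each row by an index lookup instead of a full scan:

-- composite index covering the columns used in the update's WHERE clause
CREATE INDEX target_tbl_upd_ix
    ON target_tbl (key_col1, key_col2, key_col3, key_col4,
                   key_col5, key_col6, key_col7);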
-craig

"You can never have too many knives" -- Logan Nine Fingers
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

On Oracle I would never update all records of a table. Even if you have 100 columns and only need to update 4, it will be faster to create a new table and load it with the full records (changing the 4 columns you need to update while loading, of course). Once this is done, rename the old table (to _archive) or drop it, and rename the new table to the old name.
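In rough SQL terms that amounts to something like the sketch below (all object names are made up, and the join would supply whatever derives the 4 new values):

-- build the replacement table in one set-based, direct-path operation
CREATE TABLE target_tbl_new NOLOGGING AS
SELECT t.key_col1, t.key_col2, t.key_col3, t.key_col4,
       t.key_col5, t.key_col6, t.key_col7,
       s.new_col_a, s.new_col_b, s.new_col_c, s.new_col_d,  -- the 4 changed columns
       t.other_col1, t.other_col2                           -- everything else carried over unchanged
  FROM target_tbl t
  JOIN change_stage s
    ON  s.key_col1 = t.key_col1 AND s.key_col2 = t.key_col2
    AND s.key_col3 = t.key_col3 AND s.key_col4 = t.key_col4
    AND s.key_col5 = t.key_col5 AND s.key_col6 = t.key_col6
    AND s.key_col7 = t.key_col7;

The rename or drop of the old table is then a separate step once the load has been verified.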
mikegohl
Premium Member
Posts: 97
Joined: Fri Jun 13, 2003 12:50 pm
Location: Chicago

Post by mikegohl »

Are you updating every row in the table? You may want to create a new table each time.
Michael Gohl
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Yes, I am updating those 4 columns in every row.
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

jasper wrote:On Oracle I would never update all records of a table. Even if you have 100 columns and only need to update 4, it will be faster to create a new table and load it with the full records (changing the 4 columns you need to update while loading, of course). Once this is done, rename the old table (to _archive) or drop it, and rename the new table to the old name.
Can you please explain this in more detail?

Thanks
mikegohl
Premium Member
Posts: 97
Joined: Fri Jun 13, 2003 12:50 pm
Location: Chicago

Post by mikegohl »

I posted at about the same time as Jasper, and I agree with him.
Michael Gohl
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

I don't know your job, so I'll make some assumptions.

I suppose you have a second source that contains the 7 key columns plus the 4 columns to be updated. If you select all records from your table with all columns except those 4, and join this with the second source, you get the full new record. Insert this into a new table and then rename it so that the new table takes the place of the old one.

The rename is not something you can do in standard DataStage, but you can do it as an after-job routine or similar (the target stage's Close command is also possible).
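The statements that would go into that after-job step or Close command are just plain DDL, roughly like this (names are illustrative):

-- run after the new table has been loaded and verified
ALTER TABLE target_tbl RENAME TO target_tbl_archive;
ALTER TABLE target_tbl_new RENAME TO target_tbl;
-- or, if the old copy is no longer needed:
-- DROP TABLE target_tbl_archive;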
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

jasper wrote:I don't know your job, so I'll make some assumptions.

I suppose you have a second source that contains the 7 key columns plus the 4 columns to be updated. If you select all records from your table with all columns except those 4, and join this with the second source, you get the full new record. Insert this into a new table and then rename it so that the new table takes the place of the old one.

The rename is not something you can do in standard DataStage, but you can do it as an after-job routine or similar (the target stage's Close command is also possible).
As per my understanding, I would be doing the following; please correct me if I am wrong:

Dataset (all records excluding the 4 columns) ---\
                                                  +---> Merge stage ---> Dataset (full table) ---> Oracle Enterprise stage (create new table) ---> rename the new table / drop the old table
Dataset (the 4 changed columns) -----------------/
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Take that advice with a grain of salt. It can be easier said than done to just 'build a new table and swap it around for the old one' when there are constraints / foreign keys / RI involved. That should at least have been mentioned.

And unless we're talking about a work table, not everyone has the grants they'd need to create a new table or rename existing ones, let alone the outage that could potentially cause.
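For instance, before any swap you'd at least want to know which foreign keys point at the table. A quick check against the standard Oracle dictionary views might look like this (the table name is illustrative):

-- list foreign keys on other tables that reference TARGET_TBL;
-- each of these would block or break a drop/rename of the table
SELECT c.owner, c.table_name, c.constraint_name
  FROM all_constraints c
  JOIN all_constraints p
    ON p.owner = c.r_owner
   AND p.constraint_name = c.r_constraint_name
 WHERE c.constraint_type = 'R'
   AND p.table_name = 'TARGET_TBL';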

Food for thought.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Hi chulett,

What would be the best approach then, if I can't get grants such as create and drop table?

For example, does it help to split the 10 million records into multiple update statements and run each one separately?

Thanks
Azzuri
Premium Member
Posts: 122
Joined: Tue May 13, 2008 11:42 am

Post by Azzuri »

Try sorting the input by the clustered key before updating.
sreddy
Participant
Posts: 144
Joined: Sun Oct 21, 2007 9:13 am

Re: Best ways for updating a table

Post by sreddy »

Hi kirankota79,

Can you mention the database names (source and target) and explain a bit more about your design?

I am asking because I faced the same problem earlier. At that time the table had over 100 million rows, on an AIX DevBCU platform with DB2 9.1.

I changed my design by using DB2EE / NODE for performance.
My job finished within 45 minutes (it had previously taken 14 hours).

It may be helpful to all. :idea:


kirankota79 wrote:Hi,

I am updating a table that contains 10 million records. I am updating 4 columns, with 7 key columns that make each row unique.

It took 10 hours to complete 2 million records, and at this pace it will take 50 hours. Can someone let me know the best approach to make it faster?

I am updating an Oracle table from a dataset.

Thanks
SReddy
dwpractices@gmail.com
Analyzing Performance
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Re: Best ways for updating a table

Post by kirankota79 »

Hi Sreddy,

I have two jobs. The first job writes the key columns and the columns I need to update (here the 7 key columns plus the 4 columns to be transformed and updated) to a dataset.
Design: Oracle Enterprise stage -----> Transformer -----> Dataset

The second job reads the dataset from the first job and updates the 4 columns.
Design: Dataset -----> Transformer -----> Oracle Enterprise stage
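For each input row, the Oracle Enterprise stage in that second job is effectively issuing an update along these lines (column names are illustrative), which is why the index on the 7 key columns matters so much:

-- one of these executes per input row; without an index on the 7 key
-- columns, every execution is a full table scan
UPDATE target_tbl
   SET col_a = :new_a, col_b = :new_b, col_c = :new_c, col_d = :new_d
 WHERE key_col1 = :k1 AND key_col2 = :k2 AND key_col3 = :k3
   AND key_col4 = :k4 AND key_col5 = :k5 AND key_col6 = :k6
   AND key_col7 = :k7;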

Our server is a 4-node configuration and the OS is Windows 2003 Server. DataStage is 7.5x2.
The job has just finished and it took 60 hours to update 10 million records.

Please let me know if I need to change any settings to make it faster.

Thanks

sreddy wrote:Hi kirankota79,

Can you mention the database names (source and target) and explain a bit more about your design?

I am asking because I faced the same problem earlier. At that time the table had over 100 million rows, on an AIX DevBCU platform with DB2 9.1.

I changed my design by using DB2EE / NODE for performance.
My job finished within 45 minutes (it had previously taken 14 hours).

It may be helpful to all. :idea:
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

How long does Job 1 take? Combining these two jobs and removing the DataSet will certainly speed up processing.