Best ways for updating a table

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Best ways for updating a table

Post by kirankota79 »

Hi,

I am updating a table that contains 10 million records. I am updating 4 columns, with 7 key columns that make each row unique.

It took 10 hours to complete 2 million records, and at this pace it will take 50 hours. Can someone let me know the best approach to make it faster?

I am updating an Oracle table from a dataset.

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

"7 key columns that make unique for each row" only helps if there is an index over those columns, or at least an index where the leading columns include as many of these as possible.

Is that the case? Otherwise, each update results in a full table scan.
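For example, an index along these lines (the table and column names are just placeholders for illustration) would let Oracle find each row by an index lookup instead of a full scan:

-- composite index covering the columns used in the update's WHERE clause
CREATE INDEX target_tbl_upd_ix
    ON target_tbl (key_col1, key_col2, key_col3, key_col4,
                   key_col5, key_col6, key_col7);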
-craig

"You can never have too many knives" -- Logan Nine Fingers
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

On Oracle I would never update all records of a table. Even if you have 100 columns and only need to update 4, it will be faster to create a new table and load it with the full records (changing the 4 columns you need to update while loading, of course). Once this is done, rename the old table (to _archive) or drop it, and rename the new table to the old name.
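In rough SQL terms that amounts to something like the sketch below (all object names are made up, and the join would supply whatever derives the 4 new values):

-- build the replacement table in one set-based, direct-path operation
CREATE TABLE target_tbl_new NOLOGGING AS
SELECT t.key_col1, t.key_col2, t.key_col3, t.key_col4,
       t.key_col5, t.key_col6, t.key_col7,
       s.new_col_a, s.new_col_b, s.new_col_c, s.new_col_d,  -- the 4 changed columns
       t.other_col1, t.other_col2                           -- everything else carried over unchanged
  FROM target_tbl t
  JOIN change_stage s
    ON  s.key_col1 = t.key_col1 AND s.key_col2 = t.key_col2
    AND s.key_col3 = t.key_col3 AND s.key_col4 = t.key_col4
    AND s.key_col5 = t.key_col5 AND s.key_col6 = t.key_col6
    AND s.key_col7 = t.key_col7;

The rename or drop of the old table is then a separate step once the load has been verified.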
mikegohl
Premium Member
Posts: 97
Joined: Fri Jun 13, 2003 12:50 pm
Location: Chicago

Post by mikegohl »

Are you updating every row in the table? You may want to create a new table each time.
Michael Gohl
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Yes, I am updating those 4 columns in every row.
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

jasper wrote:On Oracle I would never update all records of a table. Even if you have 100 columns and only need to update 4, it will be faster to create a new table and load it with the full records (changing the 4 columns you need to update while loading, of course). Once this is done, rename the old table (to _archive) or drop it, and rename the new table to the old name.
Can you please explain this in more detail?

Thanks
mikegohl
Premium Member
Posts: 97
Joined: Fri Jun 13, 2003 12:50 pm
Location: Chicago

Post by mikegohl »

I posted at about the same time as Jasper, and I agree with him.
Michael Gohl
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

I don't know your job, so I'll make some assumptions.

I suppose you have a second source that contains the 7 key columns plus the 4 columns to be updated. If you select all records from your table with all columns except those 4, and join this with the second source, you get the full new record. Insert this into a new table and then rename it so that the new table takes the place of the old one.

The rename is not something you can do in standard DataStage, but you can do it as an after-job routine or similar (the target stage's Close command is also possible).
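The statements that would go into that after-job step or Close command are just plain DDL, roughly like this (names are illustrative):

-- run after the new table has been loaded and verified
ALTER TABLE target_tbl RENAME TO target_tbl_archive;
ALTER TABLE target_tbl_new RENAME TO target_tbl;
-- or, if the old copy is no longer needed:
-- DROP TABLE target_tbl_archive;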
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

jasper wrote:I don't know your job, so I'll make some assumptions.

I suppose you have a second source that contains the 7 key columns plus the 4 columns to be updated. If you select all records from your table with all columns except those 4, and join this with the second source, you get the full new record. Insert this into a new table and then rename it so that the new table takes the place of the old one.

The rename is not something you can do in standard DataStage, but you can do it as an after-job routine or similar (the target stage's Close command is also possible).
As per my understanding, I would be doing the following; please correct me if I am wrong:

Dataset (all records excluding the 4 columns) ---\
                                                  +---> Merge stage ---> Dataset (full table) ---> Oracle Enterprise stage (create new table) ---> rename the new table / drop the old table
Dataset (the 4 changed columns) -----------------/
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Take that advice with a grain of salt. It can be easier said than done to just 'build a new table and swap it around for the old one' when there are constraints / foreign keys / RI involved. That should at least have been mentioned.

And unless we're talking about a work table, not everyone has the grants they'd need to create a new table or rename existing ones, let alone the outage that could potentially cause.
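For instance, before any swap you'd at least want to know which foreign keys point at the table. A quick check against the standard Oracle dictionary views might look like this (the table name is illustrative):

-- list foreign keys on other tables that reference TARGET_TBL;
-- each of these would block or break a drop/rename of the table
SELECT c.owner, c.table_name, c.constraint_name
  FROM all_constraints c
  JOIN all_constraints p
    ON p.owner = c.r_owner
   AND p.constraint_name = c.r_constraint_name
 WHERE c.constraint_type = 'R'
   AND p.table_name = 'TARGET_TBL';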

Food for thought.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Hi chulett,

What would be the best approach then, if I can't get grants such as create and drop table?

For example, does it help to split the 10 million records into multiple update statements and run each one separately?

Thanks
Azzuri
Premium Member
Posts: 122
Joined: Tue May 13, 2008 11:42 am

Post by Azzuri »

Try sorting the input by the clustered key before updating.
sreddy
Participant
Posts: 144
Joined: Sun Oct 21, 2007 9:13 am

Re: Best ways for updating a table

Post by sreddy »

Hi kirankota79,

Can you mention the database names (source and target) and explain a bit more about your design?

I am asking because I faced the same problem earlier. At that time the table had over 100 million rows, on an AIX DevBCU platform with DB2 9.1.

I changed my design by using DB2EE / NODE for performance.
My job finished within 45 minutes (it had previously taken 14 hours).

It may be helpful to all. :idea:


kirankota79 wrote:Hi,

I am updating a table that contains 10 million records. I am updating 4 columns, with 7 key columns that make each row unique.

It took 10 hours to complete 2 million records, and at this pace it will take 50 hours. Can someone let me know the best approach to make it faster?

I am updating an Oracle table from a dataset.

Thanks
SReddy
dwpractices@gmail.com
Analyzing Performance
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Re: Best ways for updating a table

Post by kirankota79 »

Hi Sreddy,

I have two jobs. The first job writes the key columns and the columns I need to update (here the 7 key columns plus the 4 columns to be transformed and updated) to a dataset.
Design: Oracle Enterprise stage -----> Transformer -----> Dataset

The second job reads the dataset from the first job and updates the 4 columns.
Design: Dataset -----> Transformer -----> Oracle Enterprise stage
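For each input row, the Oracle Enterprise stage in that second job is effectively issuing an update along these lines (column names are illustrative), which is why the index on the 7 key columns matters so much:

-- one of these executes per input row; without an index on the 7 key
-- columns, every execution is a full table scan
UPDATE target_tbl
   SET col_a = :new_a, col_b = :new_b, col_c = :new_c, col_d = :new_d
 WHERE key_col1 = :k1 AND key_col2 = :k2 AND key_col3 = :k3
   AND key_col4 = :k4 AND key_col5 = :k5 AND key_col6 = :k6
   AND key_col7 = :k7;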

Our server is a 4-node configuration and the OS is Windows 2003 Server. DataStage is 7.5x2.
The job has just finished and it took 60 hours to update 10 million records.

Please let me know if I need to change any settings to make it faster.

Thanks

sreddy wrote:Hi kirankota79,

Can you mention the database names (source and target) and explain a bit more about your design?

I am asking because I faced the same problem earlier. At that time the table had over 100 million rows, on an AIX DevBCU platform with DB2 9.1.

I changed my design by using DB2EE / NODE for performance.
My job finished within 45 minutes (it had previously taken 14 hours).

It may be helpful to all. :idea:
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

How long does Job 1 take? Combining these two jobs and removing the DataSet will certainly speed up processing.