Best ways to update a table

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Job 1 took 1 hr 20 mins to complete. The source and destination are the same;
that is the reason I am writing to a dataset, for fear of deadlocks.

Thanks

ArndW wrote:How long does Job 1 take? Combining these two jobs and removing the DataSet will certainly speed up processing.
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Just to let you know, in case this matters: the DS server is in Indiana and the database is in South Carolina.
sreddy
Participant
Posts: 144
Joined: Sun Oct 21, 2007 9:13 am

Post by sreddy »

Kiran

The source and destination being the same means the Dataset, right?
A Dataset is for temporary storage; you should not keep the data in it for a long period.

One way to improve performance is to split your job into two jobs:
extract the data and load it into a Dataset,
then have the next job read the data from the Dataset and load it into your target database table.
Another option is to use more nodes.

As far as I know, a deadlock may happen if we run two transactions at a time (a delete and an update) against the same target table with millions of records.

This is my idea; check this forum for any related information.



kirankota79 wrote:Job 1 took 1 hr 20 mins to complete. The source and destination are the same;
that is the reason I am writing to a dataset, for fear of deadlocks.

Thanks

ArndW wrote:How long does Job 1 take? Combining these two jobs and removing the DataSet will certainly speed up processing.
SReddy
dwpractices@gmail.com
Analyzing Performance
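
(A side note on the deadlock point in the post above: the sketch below is plain Python, not DataStage, and the lock names, sleep, and timeout are illustrative assumptions only. It shows the pattern sreddy describes: two sessions touching the same table's rows in opposite order, each waiting on the other.)

    import threading
    import time

    # Stand-ins for row locks on two sets of rows in the same target table;
    # a real database takes these locks implicitly during DELETE and UPDATE.
    rows_a = threading.Lock()
    rows_b = threading.Lock()

    def session(first, second, name):
        with first:
            time.sleep(0.5)                  # give the other session time to lock its rows
            got = second.acquire(timeout=2)  # a real database would pick a deadlock victim
            print(name, "ok" if got else "deadlock detected, would roll back")
            if got:
                second.release()

    # The "update" session locks rows A then B; the "delete" session locks B then A.
    t1 = threading.Thread(target=session, args=(rows_a, rows_b, "update"))
    t2 = threading.Thread(target=session, args=(rows_b, rows_a, "delete"))
    t1.start(); t2.start()
    t1.join(); t2.join()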
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

No. The Oracle table is both the source and the destination. In the first job I select columns from the table and write them to a dataset across the 4 nodes.

In the second job I use the dataset and update back to the same table that I selected from.

I am already doing the same thing you explained.

Let me know how you reduced the update time from 14 hrs to 45 mins.

That would be helpful for me!

sreddy wrote:Kiran

The source and destination being the same means the Dataset, right?
A Dataset is for temporary storage; you should not keep the data in it for a long period.

One way to improve performance is to split your job into two jobs:
extract the data and load it into a Dataset,
then have the next job read the data from the Dataset and load it into your target database table.
Another option is to use more nodes.

As far as I know, a deadlock may happen if we run two transactions at a time (a delete and an update) against the same target table with millions of records.

This is my idea; check this forum for any related information.



kirankota79 wrote:Job 1 took 1 hr 20 mins to complete. The source and destination are the same;
that is the reason I am writing to a dataset, for fear of deadlocks.

Thanks
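
(For reference, a minimal sketch of the two-pass pattern being discussed here, in plain Python with SQLite and a CSV file standing in for Oracle and the DataSet; the table name, columns, and the 10% uplift are placeholders, not the actual job. The point is that the extract and the update never run against the table in the same transaction.)

    import csv
    import sqlite3

    conn = sqlite3.connect("demo.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amt REAL)")
    conn.executemany("INSERT OR IGNORE INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
    conn.commit()

    # Pass 1 ("first job"): extract the rows into a staging file, the stand-in dataset.
    with open("staging.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for row in conn.execute("SELECT id, amt FROM orders"):
            writer.writerow(row)

    # Pass 2 ("second job"): read the staging file and update the same table.
    with open("staging.csv", newline="") as f:
        updates = [(float(amt) * 1.1, int(id_)) for id_, amt in csv.reader(f)]
    conn.executemany("UPDATE orders SET amt = ? WHERE id = ?", updates)
    conn.commit()
    conn.close()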
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Normally the data is written to 4 nodes, through "Auto" partitioning.

Would it help to do a hash partition on the output dataset in the first job, and then

use the dataset in the second job and, in the transformer stage (next to the input dataset), set the partitioning to hash again, on the same keys as in the first job?

As far as I know, we cannot set any partitioning properties on the input dataset stage, so we need to set this up in the transformer stage.

Does this make any sense?
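
(On the hash-partitioning question: a minimal sketch in plain Python, not DataStage, of what key-based hash partitioning does; the key name CUSTOMER_ID and the 4-partition count are illustrative assumptions. Rows with the same key always hash to the same partition, so if both jobs hash on the same keys the second job receives the data already grouped the way it needs.)

    NUM_NODES = 4  # matches the 4-node configuration mentioned above

    def hash_partition(row, key="CUSTOMER_ID"):
        # Route a row to a partition based on a hash of its partitioning key.
        return hash(row[key]) % NUM_NODES

    rows = [
        {"CUSTOMER_ID": 101, "AMT": 5},
        {"CUSTOMER_ID": 202, "AMT": 9},
        {"CUSTOMER_ID": 101, "AMT": 7},
    ]

    partitions = {n: [] for n in range(NUM_NODES)}
    for row in rows:
        partitions[hash_partition(row)].append(row)

    # Both rows for CUSTOMER_ID 101 land in the same partition,
    # so no repartitioning is needed downstream.
    print(partitions)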