Best ways to update a table

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Job 1 took 1 hr 20 mins to complete. The source and destination are the same;
that is the reason I am writing to a dataset, for fear of deadlocks.

Thanks

ArndW wrote:How long does Job 1 take? Combining these two jobs and removing the DataSet will certainly speed up processing.
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Just to let you know, in case this matters: the DS server is in Indiana and the database is in South Carolina.
sreddy
Participant
Posts: 144
Joined: Sun Oct 21, 2007 9:13 am

Post by sreddy »

Kiran

The source and destination being the same means the Dataset, right?
A Dataset is for temporary storage; you should not keep the data in it for a long period.

One way to improve performance is to split your job into two jobs:
extract the data and load it into a Dataset,
then have the next job read the data from the Dataset and load it into your target database table.
Another option is to use more nodes.

As far as I know, a deadlock may happen if we run two transactions at a time (a delete and an update) against the same target table with millions of records.

This is my idea; check this forum for any related information.



kirankota79 wrote:Job 1 took 1 hr 20 mins to complete. The source and destination are the same;
that is the reason I am writing to a dataset, for fear of deadlocks.

Thanks

ArndW wrote:How long does Job 1 take? Combining these two jobs and removing the DataSet will certainly speed up processing.
SReddy
dwpractices@gmail.com
Analyzing Performance
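
(A side note on the deadlock point in the post above: the sketch below is plain Python, not DataStage, and the lock names, sleep, and timeout are illustrative assumptions only. It shows the pattern sreddy describes: two sessions touching the same table's rows in opposite order, each waiting on the other.)

    import threading
    import time

    # Stand-ins for row locks on two sets of rows in the same target table;
    # a real database takes these locks implicitly during DELETE and UPDATE.
    rows_a = threading.Lock()
    rows_b = threading.Lock()

    def session(first, second, name):
        with first:
            time.sleep(0.5)                  # give the other session time to lock its rows
            got = second.acquire(timeout=2)  # a real database would pick a deadlock victim
            print(name, "ok" if got else "deadlock detected, would roll back")
            if got:
                second.release()

    # The "update" session locks rows A then B; the "delete" session locks B then A.
    t1 = threading.Thread(target=session, args=(rows_a, rows_b, "update"))
    t2 = threading.Thread(target=session, args=(rows_b, rows_a, "delete"))
    t1.start(); t2.start()
    t1.join(); t2.join()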
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

No. The Oracle table is both the source and the destination. In the first job I select columns from the table and write them to a dataset across the 4 nodes.

In the second job I use the dataset and update back to the same table that I selected from.

I am already doing the same thing you explained.

Let me know how you reduced the update time from 14 hrs to 45 mins.

That would be helpful for me!

sreddy wrote:Kiran

The source and destination being the same means the Dataset, right?
A Dataset is for temporary storage; you should not keep the data in it for a long period.

One way to improve performance is to split your job into two jobs:
extract the data and load it into a Dataset,
then have the next job read the data from the Dataset and load it into your target database table.
Another option is to use more nodes.

As far as I know, a deadlock may happen if we run two transactions at a time (a delete and an update) against the same target table with millions of records.

This is my idea; check this forum for any related information.



kirankota79 wrote:Job 1 took 1 hr 20 mins to complete. The source and destination are the same;
that is the reason I am writing to a dataset, for fear of deadlocks.

Thanks
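
(For reference, a minimal sketch of the two-pass pattern being discussed here, in plain Python with SQLite and a CSV file standing in for Oracle and the DataSet; the table name, columns, and the 10% uplift are placeholders, not the actual job. The point is that the extract and the update never run against the table in the same transaction.)

    import csv
    import sqlite3

    conn = sqlite3.connect("demo.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amt REAL)")
    conn.executemany("INSERT OR IGNORE INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
    conn.commit()

    # Pass 1 ("first job"): extract the rows into a staging file, the stand-in dataset.
    with open("staging.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for row in conn.execute("SELECT id, amt FROM orders"):
            writer.writerow(row)

    # Pass 2 ("second job"): read the staging file and update the same table.
    with open("staging.csv", newline="") as f:
        updates = [(float(amt) * 1.1, int(id_)) for id_, amt in csv.reader(f)]
    conn.executemany("UPDATE orders SET amt = ? WHERE id = ?", updates)
    conn.commit()
    conn.close()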
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Normally the data is written to 4 nodes, through "Auto" partitioning.

Would it help to do a hash partition on the output dataset in the first job, and then

use the dataset in the second job and, in the transformer stage (next to the input dataset), set the partitioning to hash again, on the same keys as in the first job?

As far as I know, we cannot set any partitioning properties on the input dataset stage, so we need to set this up in the transformer stage.

Does this make any sense?
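
(On the hash-partitioning question: a minimal sketch in plain Python, not DataStage, of what key-based hash partitioning does; the key name CUSTOMER_ID and the 4-partition count are illustrative assumptions. Rows with the same key always hash to the same partition, so if both jobs hash on the same keys the second job receives the data already grouped the way it needs.)

    NUM_NODES = 4  # matches the 4-node configuration mentioned above

    def hash_partition(row, key="CUSTOMER_ID"):
        # Route a row to a partition based on a hash of its partitioning key.
        return hash(row[key]) % NUM_NODES

    rows = [
        {"CUSTOMER_ID": 101, "AMT": 5},
        {"CUSTOMER_ID": 202, "AMT": 9},
        {"CUSTOMER_ID": 101, "AMT": 7},
    ]

    partitions = {n: [] for n in range(NUM_NODES)}
    for row in rows:
        partitions[hash_partition(row)].append(row)

    # Both rows for CUSTOMER_ID 101 land in the same partition,
    # so no repartitioning is needed downstream.
    print(partitions)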