ORAOCI8 stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

ORAOCI8 stage

Post by luca »

Hi!
I have a job with an ORAOCI8 stage that has two input links updating the same table (update action: Update existing rows or insert new rows).
I thought it would be good to control the execution of this update action so that the two links run in sequence (one after the other should give better performance than both running at the same time).
I don't want to do it using a sequencer because my job is a simple extract/load job, without a sequential file stage.
What would be a nice way to do this in a single job?
Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The usual solution to this is to pre-load a hashed file with the primary key values from the target table, then perform a reference lookup against the hashed file.
If the lookup succeeds, then the row exists and you Update.
If the lookup fails, then the row does not exist and you Insert.
And, yes, it's substantially faster.
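To illustrate (target_table, pk_col, col1 and col2 are made-up names), with that design each link carries a single-action update strategy, and the SQL generated on the two links would look roughly like this:

Code: Select all

-- Update link (reference lookup succeeded): "Update existing rows only"
UPDATE target_table
SET    col1 = :1,
       col2 = :2
WHERE  pk_col = :3;

-- Insert link (reference lookup failed): "Insert rows without clearing"
INSERT INTO target_table (pk_col, col1, col2)
VALUES (:3, :1, :2);

Each statement runs unconditionally on its own link, so neither one has to probe the table first to find out whether the row exists.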


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

Post by luca »

Hi Ray!
Yes, I know that what you suggest is the best-performing solution. But this job doesn't load many records, which is why we thought of simply using this update action (Update existing rows or insert new rows).
My question was: is it possible to sequence the execution of these two links? That would make sure there are no locks on the table reducing the insert/update performance.
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Luca,

You can certainly use the update action "Update existing rows or insert new rows" (beware of the opposite one!) and your job will work.

If you don't have anything in the file or in a lookup table that tells you which action to perform, use the above action.
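
If it helps to see why the order matters: as I understand the stage, "Update existing rows or insert new rows" attempts the UPDATE first and only inserts when no row was updated, while the opposite action attempts the INSERT first and falls back to an UPDATE on a duplicate-key error. A rough sketch (target_table, pk_col and col1 are made-up names):

Code: Select all

-- "Update existing rows or insert new rows": try the UPDATE first
UPDATE target_table SET col1 = :1 WHERE pk_col = :2;
-- only if no row was updated:
INSERT INTO target_table (pk_col, col1) VALUES (:2, :1);

-- "Insert new rows or update existing rows": try the INSERT first
INSERT INTO target_table (pk_col, col1) VALUES (:2, :1);
-- only if the insert fails with a duplicate-key error:
UPDATE target_table SET col1 = :1 WHERE pk_col = :2;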

Wolfgang
luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

Post by luca »

I forgot to mention that in this job, 90% of the records will have to be updated and only 10% inserted. In that case, using a hashed file is probably less efficient than simply doing an "Update existing rows or insert new rows" in an OCI stage.
What do you think ?
Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Depending on how big the Oracle table is and where it is located, I'd guess that the hashed file method is generally still going to be more efficient.
The load of the hashed file will be based on SELECT pk FROM table, which should be very speedy (it should get the values from the PK index). It can be done in the same job as the rest of the processing, because a passive stage type cannot open its output links until its input links are closed.
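For example (target_table and pk_col are placeholder names), the load link of the hashed file would use a keys-only query along these lines:

Code: Select all

-- Selecting only the key column lets Oracle satisfy this
-- from the primary key index rather than the table itself.
SELECT pk_col
FROM   target_table;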
Why not set up both scenarios, and report your results?
luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

Post by luca »

This job doesn't process a huge number of records (50,000), so it's not the best candidate for this kind of performance test.
But in the near future we are going to optimize our jobs, and we'll run this kind of performance test on the jobs that process a lot of data. I'll let you know the results then.
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

luca,

For the optimal version of any job, only rules of thumb can be given. You are right that you have to take the time for filling the hashed file into account.

You also have several possibilities with the hashed file. One is to fill it completely within the job; another is to keep it permanently on your file system and update it with the same process you use for filling your Oracle table. But if you don't have enough space to keep it permanent, it may be better not to use the hashed file.

The final decision on how to build your job is up to you. If you have time, you can compare the different methods.

Wolfgang
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I think that, even with as few as 50,000 records, you will be able to measure a difference in elapsed time between the two scenarios.