ORAOCI8 stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

ORAOCI8 stage

Post by luca »

Hi!
I have a job with an ORAOCI8 stage that has two input links updating the same table (update action: Update existing rows or insert new rows).
I thought it would be good to control the execution of this update action so that the two links run in sequence (one after the other should give better performance than both running at the same time).
I don't want to do it using a sequencer because my job is a simple extract/load job, without a sequential file stage.
What would be a nice way to do this in a single job?
Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The usual solution to this is to pre-load a hashed file with the primary key values from the target table, then perform a reference lookup against the hashed file.
If the lookup succeeds, then the row exists and you Update.
If the lookup fails, then the row does not exist and you Insert.
And, yes, it's substantially faster.
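To illustrate (target_table, pk_col, col1 and col2 are made-up names), with that design each link carries a single-action update strategy, and the SQL generated on the two links would look roughly like this:

Code: Select all

-- Update link (reference lookup succeeded): "Update existing rows only"
UPDATE target_table
SET    col1 = :1,
       col2 = :2
WHERE  pk_col = :3;

-- Insert link (reference lookup failed): "Insert rows without clearing"
INSERT INTO target_table (pk_col, col1, col2)
VALUES (:3, :1, :2);

Each statement runs unconditionally on its own link, so neither one has to probe the table first to find out whether the row exists.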


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

Post by luca »

Hi Ray!
Yes, I know that what you suggest is the best-performing solution. But this job doesn't load many records, which is why we thought of simply using this update action (Update existing rows or insert new rows).
My question was: is it possible to sequence the execution of these two links? That would make sure there are no locks on the table reducing the insert/update performance.
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Luca,

You can certainly use the update action "Update existing rows or insert new rows" (beware of the opposite one!) and your job will work.

If you don't have anything in the file or in a lookup table that tells you which action to perform, use the above action.
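
If it helps to see why the order matters: as I understand the stage, "Update existing rows or insert new rows" attempts the UPDATE first and only inserts when no row was updated, while the opposite action attempts the INSERT first and falls back to an UPDATE on a duplicate-key error. A rough sketch (target_table, pk_col and col1 are made-up names):

Code: Select all

-- "Update existing rows or insert new rows": try the UPDATE first
UPDATE target_table SET col1 = :1 WHERE pk_col = :2;
-- only if no row was updated:
INSERT INTO target_table (pk_col, col1) VALUES (:2, :1);

-- "Insert new rows or update existing rows": try the INSERT first
INSERT INTO target_table (pk_col, col1) VALUES (:2, :1);
-- only if the insert fails with a duplicate-key error:
UPDATE target_table SET col1 = :1 WHERE pk_col = :2;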

Wolfgang
luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

Post by luca »

I forgot to mention that in this job, 90% of the records will have to be updated and only 10% inserted. In that case, using a hashed file is probably less efficient than simply doing an "Update existing rows or insert new rows" in an OCI stage.
What do you think ?
Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Depending on how big the Oracle table is and where it is located, I'd guess that the hashed file method is generally still going to be more efficient.
The load of the hashed file will be based on SELECT pk FROM table, which should be very speedy (it should get the values from the PK index). It can be done in the same job as the rest of the processing, because a passive stage type cannot open its output links until its input links are closed.
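For example (target_table and pk_col are placeholder names), the load link of the hashed file would use a keys-only query along these lines:

Code: Select all

-- Selecting only the key column lets Oracle satisfy this
-- from the primary key index rather than the table itself.
SELECT pk_col
FROM   target_table;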
Why not set up both scenarios, and report your results?
luca
Participant
Posts: 81
Joined: Wed May 14, 2003 11:34 pm

Post by luca »

This job doesn't process a huge number of records (50,000), so it's not the best candidate for this kind of performance test.
But in the near future we are going to optimize our jobs, and we'll run this kind of performance test on the jobs that process a lot of data. I'll let you know the results then.
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

luca,

For the optimal version of any job, only rules of thumb can be given. You are right that you have to take the time for filling the hashed file into account.

You also have several possibilities with the hashed file. One is to fill it completely within the job; another is to keep it permanently on your file system and update it with the same process you use for filling your Oracle table. But if you don't have enough space to keep it permanent, it may be better not to use the hashed file.

The final decision on how to build your job is up to you. If you have time, you can compare the different methods.

Wolfgang
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I think that, even with as few as 50,000 records, you will be able to measure a difference in elapsed time between the two scenarios.