lookup condition

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
parvathi
Participant
Posts: 103
Joined: Wed Jul 05, 2006 4:48 am
Contact:

lookup condition

Post by parvathi »

Hi all,

I have a design as follows

oraclestage--->sequentialfile--->transformer---> output oracle stage
output oracle stage has two links one for update and insert

hashed file look up.... with only surroagate key column

what I have to do here is that I have to insert or update

i have done the following in transformer constraints
with surrogate key as primary key
i have mentioned the following conditions
for ex
for insert ::::IsNull(product_id_sk)
FOR UPDATE::: NOT(IsNull(product_id_sk))

But the performance was not good
so i have used the look up hashed file...

Instead of using the hashed look up is there any way tp procced.. for updating or inserting

waiting for your valuable ideas.....
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What is the point of the Sequential File stage in your job design? It blocks all rows until they are all selected from Oracle. But the clock keeps running so your elapsed time is longer than it needs to be. The hashed file lookup is the most efficient possible; that is not where your problem lies.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
parvathi
Participant
Posts: 103
Joined: Wed Jul 05, 2006 4:48 am
Contact:

Post by parvathi »

the sequential file was a part of mine design process....
the whole idea behind that was

the sequential file stage has to wait until it recieves all the records from the oracle stage and then it performs look up with hashed file where this hashed file takes records ffrom the oracle stage target

without the sequential file the oracle stage has to hit the database for each and every record it retrieves and perform the look up...


which option would be better... ???

Or instead of using the lookup can i check the condition in the source view that i am using in the input oracle stage giving some specific conditions

Or instead of using the look up hashed file,
in the target oracle stage under the update action can we put some other options like insert and update rows or anything that suits better


can you suugest me regarding me this...
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

without the sequential file the oracle stage has to hit the database for each and every record it retrieves and perform the look up...
What you believe to be helpful is actually an unnecessary bottleneck. In addition to retrieving each and every record from Oracle and doing a lookup for each record, you have added the additional overhead of writing each record to disk and reading it back from disk. As Ray pointed out, the sequential file is "blocking" your throughput because all of the records have to be written to disk before they can start to be read from the disk.

1) Replace the sequential file stage with an IPC stage. This will explicitly split the process that is reading from Oracle and the process that is doing the lookup (transformer) into 2 processes on separate cpu's. The blocking sequential file will be replaced by some pipeline parallelism.

2) If you are not already doing so, enable the preload to memory option for the hash file.

3) Depending on the volume of inserts, you may want to consider writing the inserts to a sequential file and then bulk loading them with sql*loader.

Mike
parvathi
Participant
Posts: 103
Joined: Wed Jul 05, 2006 4:48 am
Contact:

Post by parvathi »

Mike,
Thanks for your valuable suggestion... i am trying with out the sequential file

with out using the look up hashed file can we do anything in the transformer for inserting or updating the reords target table
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You need something against which to lookup to determine whether to insert or update. Whether it's a hashed file or the target table directly is up to you, but the hashed file will be faster.

Code: Select all

                   TargetTable1 --->  HashedFile
                                          |
                                          |                     
                                          V       insert
             SourceTable   ----->  Transformer  -------->  TargetTable2
                                                -------->
                                                  update
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply