
Double Resource Consumption - Insert/Update Job

Posted: Wed Jun 30, 2004 7:28 am
by arunverma
We have developed a Billing Delta application which extracts incremental data from the source system based on last_modified_date and loads it into the DSS server. In the data loading job we have chosen the option "Update Existing Rows or Insert New Row", and two SQL statements are generated in the SQL tab - one for the insert and one for the update. Our requirement is: if the data already exists, update it; otherwise insert it. But our DBA and system administrator analysed the server log and the Oracle database log during the data loading job and found that if we get 100 records from the source system,
the job consumes resources executing the insert for 100 records and the update for 100 records as well.
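For illustration, the two generated statements are shaped roughly like this (the table and column names here are simplified placeholders, not the real ones):

[code]
-- Hypothetical shape of the two statements shown in the SQL tab
INSERT INTO billing_target (account_id, amount, last_modified_date)
VALUES (:account_id, :amount, :last_modified_date);

UPDATE billing_target
   SET amount = :amount,
       last_modified_date = :last_modified_date
 WHERE account_id = :account_id;
[/code]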

So for each record it consumes double the resources and executes twice.

Please help me; because of this my server is running very slowly.

Thanks and Regards

Arun Verma.

Re: Double Resource Consumption - Insert/Update Job

Posted: Wed Jun 30, 2004 9:30 am
by ogmios
Are you using insert/update or update/insert?

Check whether there's a primary key or a unique constraint on the table you're loading to.

Check whether the key fields are set correctly in the stage writing to the database.

Ogmios

Re: Double Resource Consumption - Insert/Update Job

Posted: Wed Jun 30, 2004 9:33 am
by ogmios
Assuming you use update/insert, this is what probably happens (for one row):

DataStage tries to update the row.
This fails because all of your rows are new rows.
It then inserts the row.

So it executes "twice". If you have insert/update it will happen the other way around: the insert will be tried first, and if that fails the update will be tried.
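In rough SQL terms (the table and column names below are only placeholders), update/insert behaves per row something like this:

[code]
-- Try the update first.
UPDATE billing_target
   SET amount = :amount,
       last_modified_date = :last_modified_date
 WHERE account_id = :account_id;

-- If the UPDATE affected zero rows (the key is not there yet),
-- the stage then runs the insert instead:
INSERT INTO billing_target (account_id, amount, last_modified_date)
VALUES (:account_id, :amount, :last_modified_date);
[/code]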

Ogmios

Posted: Wed Jun 30, 2004 9:45 am
by denzilsyb
Are you using ODBC to update/insert? My experience has taught me to avoid ODBC update/insert precisely because of this slow processing.

Is it possible for you to extract the existing data from Oracle and then:

1. In a transform, determine which records are new (to insert) and which are to be updated, by comparing what is in the DSS db to what you are pulling from the source (a SQL sketch of this split follows the list).
2. Have the two streams write to different targets - one to insert (in the current job) and the other to update, possibly in the next job.
3. You could even use a bulk load stage for the insert and an ODBC stage for the update.
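Something like this, with made-up table and key names (source_delta, dss_billing, account_id):

[code]
-- New rows: no matching key in the DSS table, route these to the insert target.
SELECT s.*
  FROM source_delta s
  LEFT OUTER JOIN dss_billing t ON t.account_id = s.account_id
 WHERE t.account_id IS NULL;

-- Existing rows: key already present, route these to the update target.
SELECT s.*
  FROM source_delta s
  JOIN dss_billing t ON t.account_id = s.account_id;
[/code]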

Perhaps the Oracle users had better confirm what I am suggesting.

Then you might consider Sybase IQ for your DSS db :twisted:

Posted: Wed Jun 30, 2004 4:15 pm
by ray.wurlod
You need to re-think the design.

Have one link doing insert only, and another link doing update only.
Make the decision by reference lookup against the target table or - ideally - a copy of the relevant rows and columns from it in a hashed file.
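For example (assuming a single key column; the names here are made up), the hashed file only needs the keys from the target:

[code]
-- Key-only extract from the target to populate the hashed file
SELECT account_id
  FROM billing_target;
-- Lookup hit  -> send the row down the update-only link.
-- Lookup miss -> send the row down the insert-only link.
[/code]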

You will be amazed at the difference this makes!

Posted: Wed Jun 30, 2004 11:25 pm
by arunverma
Hi All ,

Our source is an Oracle database and so is the destination. Each table has a primary key, and at the time of data loading we mark that field as the key.

As suggested by Mr. Ray and denzilsyb, we need to redesign our job:
create a hashed file for each table, first look up the source file against this hashed file, then filter it into two files, one for insert and one for update. This will solve my problem.

But my question is: why can DataStage not identify both kinds of record, when the option "Update Existing Rows or Insert New Row" is available? :roll:


Thanks

Arun

Posted: Thu Jul 01, 2004 1:44 am
by ray.wurlod
The only way that DataStage (or you, or anything else for that matter) could do it is to make reference to the target Oracle table, to determine whether the key exists already.

If it does, do an update. If it doesn't, do an insert.

Given that there is a key, this is all that you need to load into the hashed file to make that existence decision. So your hashed file is like a key-only table, and extremely fast to access.
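In other words, per row the decision is just something like this (same made-up names as earlier in the thread):

[code]
-- Does this key already exist in the target?
SELECT 1
  FROM billing_target
 WHERE account_id = :account_id;
-- Row returned    -> update
-- No row returned -> insert
[/code]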

The "update existing row or insert new row" method performs its "reference" against the target table by attempting an update, and checking on whether the update failed because the row does not exist. It's a much more laborious process than a key-only lookup to a hashed file.