DB2 stage performance issue

dh_Madhu · Post by **dh_Madhu** » Mon Oct 22, 2007 2:23 am

Hi,
A job uses db2 EE stage to insert new records with a sequence number generated in db2 database. The performance drops drastically to a hundred rows per second from as high as seventy thousand rows/sec until it reaches the target db2 stage. Prior to writing the data to the table, a lookup on the same table is done using another db2 stage. The db2 is actually in 2 different servers,one for admin purposes and the other to write data to. The apt file has both the db2 server nodes mentioned in it apart from the Datastage server.
Now, could this be a environment issue or a job design issue? Is there any other way to handle this?
Any suggestions please.

ArndW · Post by **ArndW** » Mon Oct 22, 2007 2:38 am

Is your DB2 databse partitioned? Also, try replacing the target DB2 enterprise stage with a DataSet and see what your throughput is - don't bother with the numbers during the run, just use your total records divided by total elapsed time upon completion to get a baseline figure. Is it much faster than before? If yes, you can concentrate on the load to DB2, otherwise you should look elsewhere for bottlenecks.

dh_Madhu · Post by **dh_Madhu** » Mon Oct 22, 2007 2:55 am

Yes, when loaded to a dataset it is pretty faster.
So, what options are available to tune this db2 stage?
earlier, when the partion type was set to db2 the job got aborted throwing a fatal error which read "could not find the <Seq_Num> field. The insert statement calls the seq num generator table.

ArndW · Post by **ArndW** » Mon Oct 22, 2007 3:26 am

If you use DB2 partitioning then you will automatically have PX loading each partition in parallel and that will get you the best performance. Could you post your exact error message and also insert statement when using DB2 partitioning?

balajisr · Post by **balajisr** » Mon Oct 22, 2007 3:34 am

Can you try switching off generation of sequence number using db2 and find out if there is any improvement in performance.

dh_Madhu · Post by **dh_Madhu** » Mon Oct 22, 2007 3:45 am

Well, the error it throws is
"main_program: Missing field: event_id in the input dataset."
db2EVTXREFups: When preparing operator: When binding partitioner interface: Could not find input field "event_id".
main_program: One or more operators failed to prepare OK
main_program: Creation of a step finished with status = FAILED.

The insert statement is
INSERT INTO
#$HEW_TGT_COUNTRY_SCHEMA#.EVTXREF
(
EVENT_ID,
OLD_EVENT_ID,
EVTIDKEY,
START_DT,
SRCE_SYS_CDE,
UPDT_DT_TM
)
VALUES
(
NEXTVAL FOR #$HEW_TGT_REGIONAL_SCHEMA#.HEW_EVENT_SKSEQ,
NULL,
ORCHESTRATE.EVENT_SRCE_KEY,
ORCHESTRATE.EVENT_DT,
ORCHESTRATE.SRCE_SYS_CDE,
ORCHESTRATE.UPDT_DT_TM
)

dh_Madhu · Post by **dh_Madhu** » Mon Oct 22, 2007 4:21 am

droping the sequence number generator and placing a surrogate key generator improved the performance to very marginal levels....around 470 rows/sec only

stefanfrost1 · Post by **stefanfrost1** » Mon Oct 22, 2007 4:44 am

You cant do parrallel insert or load to a source table if your source table has the partition key as a sequnce number which you generate at insert... Datastage will try to partition on the same column its missing (hence your error message). So it can only run in sequncial. If your partitiong key is not a sequnce number and in your driving data then you are able to use the partitioning feature.

If you are still suffering from performance problem it can relate to database features such as clustered indexes/data and insert features such as append off or such. Try removing any clustered indexes or sort data before inserting and se if that helps. Unique constraints and to many indexes can also affect preformance...

dh_Madhu · Post by **dh_Madhu** » Mon Oct 22, 2007 5:27 am

Well, a sequential load and a collector type of sort merge did raise the performance from 147 rows/sec to 3700 rows/sec.
wonder if this could still be increased?
A new insert to the table actually showed a 55000 + rows/sec but the database seems to insert at a very low rate.
Has anybody got anything up their sleeves to share with?

gbusson · Post by **gbusson** » Mon Oct 22, 2007 7:48 am

what are the options of the load?

Is it unrecoverable?

dh_Madhu · Post by **dh_Madhu** » Mon Oct 22, 2007 9:05 pm

Well, its a incremental load and the real bottle neck seems to be the sequencer number generated in the database which does not allow the parallel processing in db2. Earlier while generating numbers using a surrogate key stage, the job took 30 secs to run a million records aparently using the luxury of running in parallel. Cant we maintain the performance whilst using the db generated sequence? Any suggestions please. Thanks!

ray.wurlod · Post by **ray.wurlod** » Mon Oct 22, 2007 10:17 pm

The sequence generated within the database is necessarily sequential. Think about it. Therefore the answer to your question is "no".

vmcburney · Post by **vmcburney** » Tue Oct 23, 2007 2:27 am

You could cheat. Retrieve the sequence number via a script running the DB2 command line language from a sequence job and pass it to the parallel job as a starter value in the surrogate key stage. Load all the rows using ETL generated numbers. At the end run another script that loads the highest key value back into the DB2 sequencer.

dh_Madhu · Post by **dh_Madhu** » Tue Oct 23, 2007 4:38 am

yes! This is exactly what I intend to do.
Thanks.