DB2 stage performance issue

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dh_Madhu
Premium Member
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

DB2 stage performance issue

Post by dh_Madhu »

Hi,
A job uses db2 EE stage to insert new records with a sequence number generated in db2 database. The performance drops drastically to a hundred rows per second from as high as seventy thousand rows/sec until it reaches the target db2 stage. Prior to writing the data to the table, a lookup on the same table is done using another db2 stage. The db2 is actually in 2 different servers,one for admin purposes and the other to write data to. The apt file has both the db2 server nodes mentioned in it apart from the Datastage server.
Now, could this be a environment issue or a job design issue? Is there any other way to handle this?
Any suggestions please.
Regards,
Madhu Dharmapuri
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Is your DB2 databse partitioned? Also, try replacing the target DB2 enterprise stage with a DataSet and see what your throughput is - don't bother with the numbers during the run, just use your total records divided by total elapsed time upon completion to get a baseline figure. Is it much faster than before? If yes, you can concentrate on the load to DB2, otherwise you should look elsewhere for bottlenecks.
dh_Madhu
Premium Member
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

Post by dh_Madhu »

Yes, when loaded to a dataset it is pretty faster.
So, what options are available to tune this db2 stage?
earlier, when the partion type was set to db2 the job got aborted throwing a fatal error which read "could not find the <Seq_Num> field. The insert statement calls the seq num generator table.
Regards,
Madhu Dharmapuri
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you use DB2 partitioning then you will automatically have PX loading each partition in parallel and that will get you the best performance. Could you post your exact error message and also insert statement when using DB2 partitioning?
balajisr
Charter Member
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

Can you try switching off generation of sequence number using db2 and find out if there is any improvement in performance.
dh_Madhu
Premium Member
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

Post by dh_Madhu »

Well, the error it throws is
"main_program: Missing field: event_id in the input dataset."
db2EVTXREFups: When preparing operator: When binding partitioner interface: Could not find input field "event_id".
main_program: One or more operators failed to prepare OK
main_program: Creation of a step finished with status = FAILED.

The insert statement is
INSERT INTO
#$HEW_TGT_COUNTRY_SCHEMA#.EVTXREF
(
EVENT_ID,
OLD_EVENT_ID,
EVTIDKEY,
START_DT,
SRCE_SYS_CDE,
UPDT_DT_TM
)
VALUES
(
NEXTVAL FOR #$HEW_TGT_REGIONAL_SCHEMA#.HEW_EVENT_SKSEQ,
NULL,
ORCHESTRATE.EVENT_SRCE_KEY,
ORCHESTRATE.EVENT_DT,
ORCHESTRATE.SRCE_SYS_CDE,
ORCHESTRATE.UPDT_DT_TM
)
Regards,
Madhu Dharmapuri
dh_Madhu
Premium Member
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

Post by dh_Madhu »

droping the sequence number generator and placing a surrogate key generator improved the performance to very marginal levels....around 470 rows/sec only
Regards,
Madhu Dharmapuri
stefanfrost1
Premium Member
Premium Member
Posts: 99
Joined: Mon Sep 03, 2007 7:49 am
Location: Stockholm, Sweden

Post by stefanfrost1 »

You cant do parrallel insert or load to a source table if your source table has the partition key as a sequnce number which you generate at insert... Datastage will try to partition on the same column its missing (hence your error message). So it can only run in sequncial. If your partitiong key is not a sequnce number and in your driving data then you are able to use the partitioning feature.

If you are still suffering from performance problem it can relate to database features such as clustered indexes/data and insert features such as append off or such. Try removing any clustered indexes or sort data before inserting and se if that helps. Unique constraints and to many indexes can also affect preformance...
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
dh_Madhu
Premium Member
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

Post by dh_Madhu »

Well, a sequential load and a collector type of sort merge did raise the performance from 147 rows/sec to 3700 rows/sec.
wonder if this could still be increased?
A new insert to the table actually showed a 55000 + rows/sec but the database seems to insert at a very low rate.
Has anybody got anything up their sleeves to share with?
Regards,
Madhu Dharmapuri
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

what are the options of the load?

Is it unrecoverable?
dh_Madhu
Premium Member
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

Post by dh_Madhu »

Well, its a incremental load and the real bottle neck seems to be the sequencer number generated in the database which does not allow the parallel processing in db2. Earlier while generating numbers using a surrogate key stage, the job took 30 secs to run a million records aparently using the luxury of running in parallel. Cant we maintain the performance whilst using the db generated sequence? Any suggestions please. Thanks!
Regards,
Madhu Dharmapuri
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The sequence generated within the database is necessarily sequential. Think about it. Therefore the answer to your question is "no".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

You could cheat. Retrieve the sequence number via a script running the DB2 command line language from a sequence job and pass it to the parallel job as a starter value in the surrogate key stage. Load all the rows using ETL generated numbers. At the end run another script that loads the highest key value back into the DB2 sequencer.
dh_Madhu
Premium Member
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

Post by dh_Madhu »

yes! This is exactly what I intend to do.
Thanks.
Regards,
Madhu Dharmapuri
Post Reply