DB2 stage performance issue
Moderators: chulett, rschirm, roy
Hi,
A job uses the DB2 Enterprise stage to insert new records, with a sequence number generated in the DB2 database. Throughput drops drastically, from as high as seventy thousand rows/sec down to about a hundred rows/sec, once the data reaches the target DB2 stage. Before writing the data to the table, a lookup on the same table is done using another DB2 stage. DB2 actually runs on two different servers: one for admin purposes and one that the data is written to. The APT configuration file lists both DB2 server nodes in addition to the DataStage server.
Now, could this be an environment issue or a job design issue? Is there another way to handle this?
Any suggestions, please.
Regards,
Madhu Dharmapuri
Is your DB2 database partitioned? Also, try replacing the target DB2 Enterprise stage with a Data Set stage and see what your throughput is. Don't bother with the numbers during the run; just divide total records by total elapsed time upon completion to get a baseline figure. Is it much faster than before? If yes, you can concentrate on the load to DB2; otherwise you should look elsewhere for bottlenecks.
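As a sketch of that baseline calculation (plain Python, nothing DataStage-specific assumed; the numbers in the example are made up):

```python
def baseline_throughput(total_rows, elapsed_seconds):
    """Overall rows/sec from the job totals, which is more reliable than
    the fluctuating per-stage figures shown while the job is running."""
    return total_rows / elapsed_seconds

# Compare this figure with the DB2 target vs. the Data Set target:
# e.g. 1,000,000 rows in 250 seconds gives 4000.0 rows/sec.
rate = baseline_throughput(1_000_000, 250)
```

If the Data Set run is dramatically faster, the bottleneck is the DB2 load itself rather than the upstream job design.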
Yes, when loaded to a data set it is much faster.
So, what options are available to tune this DB2 stage?
Earlier, when the partition type was set to DB2, the job aborted with a fatal error that read "Could not find the <Seq_Num> field". The insert statement calls the sequence number generator.
Regards,
Madhu Dharmapuri
If you use DB2 partitioning, then PX will automatically load each partition in parallel, and that will get you the best performance. Could you post your exact error message, and also the insert statement used with DB2 partitioning?
Well, the errors it throws are:
"main_program: Missing field: event_id in the input dataset."
db2EVTXREFups: When preparing operator: When binding partitioner interface: Could not find input field "event_id".
main_program: One or more operators failed to prepare OK
main_program: Creation of a step finished with status = FAILED.
The insert statement is
INSERT INTO
#$HEW_TGT_COUNTRY_SCHEMA#.EVTXREF
(
EVENT_ID,
OLD_EVENT_ID,
EVTIDKEY,
START_DT,
SRCE_SYS_CDE,
UPDT_DT_TM
)
VALUES
(
NEXTVAL FOR #$HEW_TGT_REGIONAL_SCHEMA#.HEW_EVENT_SKSEQ,
NULL,
ORCHESTRATE.EVENT_SRCE_KEY,
ORCHESTRATE.EVENT_DT,
ORCHESTRATE.SRCE_SYS_CDE,
ORCHESTRATE.UPDT_DT_TM
)
"main_program: Missing field: event_id in the input dataset."
db2EVTXREFups: When preparing operator: When binding partitioner interface: Could not find input field "event_id".
main_program: One or more operators failed to prepare OK
main_program: Creation of a step finished with status = FAILED.
The insert statement is
INSERT INTO
#$HEW_TGT_COUNTRY_SCHEMA#.EVTXREF
(
EVENT_ID,
OLD_EVENT_ID,
EVTIDKEY,
START_DT,
SRCE_SYS_CDE,
UPDT_DT_TM
)
VALUES
(
NEXTVAL FOR #$HEW_TGT_REGIONAL_SCHEMA#.HEW_EVENT_SKSEQ,
NULL,
ORCHESTRATE.EVENT_SRCE_KEY,
ORCHESTRATE.EVENT_DT,
ORCHESTRATE.SRCE_SYS_CDE,
ORCHESTRATE.UPDT_DT_TM
)
Regards,
Madhu Dharmapuri
You can't do a parallel insert or load to a target table if the table's partitioning key is a sequence number that you generate at insert time. DataStage will try to partition on the very column that is missing from the input (hence your error message), so the load can only run sequentially. If your partitioning key is not a sequence number and is present in your driving data, then you can use the partitioning feature.
If you are still suffering from performance problems, they can relate to database features such as clustered indexes and insert options such as append mode being off. Try removing any clustered indexes, or sort the data before inserting, and see if that helps. Unique constraints and too many indexes can also affect performance.
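The failure mode described above can be sketched as a simple check (a hypothetical illustration, not DataStage internals; the field names are taken from the ORCHESTRATE. references in the insert statement earlier in the thread):

```python
def db2_partition_key_present(partition_key, input_fields):
    """DB2 partitioning hashes each row on the table's partitioning key,
    so that column must be present in the job's input schema."""
    return partition_key.lower() in {f.lower() for f in input_fields}

# Fields actually flowing into the stage; EVENT_ID is not among them
# because it is generated by NEXTVAL inside the database.
input_fields = ["EVENT_SRCE_KEY", "EVENT_DT", "SRCE_SYS_CDE", "UPDT_DT_TM"]
db2_partition_key_present("EVENT_ID", input_fields)  # False -> "Could not find input field"
```

Which is exactly why the partitioner reports `Could not find input field "event_id"` and the load falls back to running sequentially.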
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
Well, a sequential load with a Sort Merge collector did raise the performance from 147 rows/sec to 3,700 rows/sec.
I wonder if this could still be increased?
A fresh insert into the table actually showed 55,000+ rows/sec, but the database seems to insert at a very low rate.
Does anybody have anything up their sleeve to share?
Regards,
Madhu Dharmapuri
Well, it's an incremental load, and the real bottleneck seems to be the sequence number generated in the database, which prevents parallel processing in DB2. Earlier, when generating the numbers with a Surrogate Key stage, the job took 30 seconds to run a million records, apparently thanks to the luxury of running in parallel. Can't we maintain that performance while using the database-generated sequence? Any suggestions, please. Thanks!
Regards,
Madhu Dharmapuri
You could cheat. Retrieve the sequence number via a script running the DB2 command line processor from a sequence job and pass it to the parallel job as the starting value in the Surrogate Key stage. Load all the rows using ETL-generated numbers. At the end, run another script that loads the highest key value back into the DB2 sequence.
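The hand-off described above boils down to simple key arithmetic (a minimal sketch; the function name and the default increment of 1 are assumptions, not a DataStage or DB2 API): fetch a starting value from the DB2 sequence before the load, let the job generate the keys itself, then restart the sequence just past the highest key used.

```python
def reserve_key_range(start_value, row_count, increment=1):
    """Given the value fetched from the DB2 sequence before the load,
    return the first and last ETL-generated keys and the value the
    sequence should be restarted with afterwards
    (e.g. via ALTER SEQUENCE ... RESTART WITH <restart>)."""
    first = start_value
    last = start_value + (row_count - 1) * increment
    restart = last + increment
    return first, last, restart

# Reserving keys for a 1,000,000-row load when the sequence returns 5000:
first, last, restart = reserve_key_range(5000, 1_000_000)
```

The starting value goes into the Surrogate Key stage as its initial value, and the `restart` figure is what the closing script writes back to DB2, so nothing else draws an overlapping key in the meantime.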
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn