DB2 API vs DB2 UDB enterprise

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
srikanthkilaru
Participant
Posts: 14
Joined: Wed Aug 31, 2005 3:47 pm

DB2 API vs DB2 UDB enterprise

Post by srikanthkilaru »

Hi, wll there be any performance diffrence between DB2 API and DB2 UDB ENTERPRISE. cause recently i changed DB2 UDB ENTERPRISE to DB2 API in some jobs. before when the stage was DB2 UDB ENTERPRISE that jab used to take 4 hours but now when i changed to DB2 API its taking like 6 to 7 hours. can u help me in this issue.....appreciate ur help
srikanth
etl developer
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

My understanding is that the DB2 Enterprise stage is more efficient, although to be honest I don't remember the reasoning. I'll leave that to the gurus.

What is your reasoning for switching to the API version?

4-6 hours runtime seems awfully long. How big is your job (i.e. how many stages within one job)? How complex is the SQL within your DB2 stages? It seems like you should be getting better runtimes... but I would assume you have already come to that conclusion as well :)

Brad.
srikanthkilaru
Participant
Posts: 14
Joined: Wed Aug 31, 2005 3:47 pm

Post by srikanthkilaru »

thanks brad for your promt reply..... the reason we switched to API is when we are using DB2 ENTERPRISE its not letting to connect to another server, by API we can connect to any server. And this jobs deals with 6 million records. I guess due to joins we are using in the query its taking long time, but when i changed this enterprise to API its taking more time thats what my concern is.....reallt appreciate if u can give good advise for the better performance.....
srikanth
etl developer
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

How are you using the DB2 stage? For reads, inserts, loads?

When we have a slow running job, we check the performance of each query involved. Make sure you have the SQL joins as efficient as possible - joining on indexed keys at a minimum. Ideally, you want to join on the same partitioning key.

If you are joining between tables that are partitioned differently on the database, you may want to consider having 2 individual reads and then join the data in the DataStage job. We have improved some processes that way - sometimes DS can repartition the data faster than the database.

Are you using the partitioning table option your reads? That can improve you performance as well - it ensures that you are pulling data from the database in parallel instead of a single stream.

If you are doing loads, consider loading into table(s) that don't have any indexes. We usually load into empty tables, so indexes can be added after the load itself. Having indexes on the table while you are loading can slow it down - it has to catalog every record as you load.

Another thing to consider with your DB2 output is to use the load option instead of write. The load option uses DB2's autoloader utility and is much faster. However, I would highly recommend using this on empty tables. We have seen situations where the table has to be dropped and recreated due to a bad load - not a good thing if there is pre-existing data.

Hope this helps.

Brad.
Mat01
Premium Member
Premium Member
Posts: 50
Joined: Wed Jun 02, 2004 11:12 am
Location: Montreal, Canada

Post by Mat01 »

DB2 UDB stage usefulness relies on its ability to execute DB2 treatment in parallel. Hence, the speed of execution will be greatly reduced if you switch to a API stage on a multiprocessor server. There are ways to connect to another server with the DB2 UDB:

The first and IBM recommended method is to install DataStage on the remote server and treat both the local and remote server as a cluster of machines. See the following document on IBM support:

http://developernet.ascential.com/forum ... .php?id=66

( DataStageEE_Remote_DB2_Configuration.zip )

If, like us, you feel that buying a license for DataStage if you are only going to access DB2 on a server is a little too much, there is a work around, for this, you need to name both your local and remote instance with the same name and then map the remote server in the DB2 local client with the option trusted server on. It works well for us.

However, I have been told by IBM that the upcoming DS version will introduce a new kind of connection to DB2 which could change the way local-remote DB2 connection works

Regards,

Mat
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

The DB2 Enterprise stage is the way to go if you can work out how to configure those remote connections. Much faster in reading and writing. If you have effective DB2 table partitioning you can get some huge performance improvements over the API stage.
Post Reply