Performance problem with DB2

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

auvray.muriel
Participant
Posts: 43
Joined: Wed Feb 19, 2003 7:17 am
Location: France - Paris

Performance problem with DB2

Post by auvray.muriel »

Hello,

I have a performance problem with a job that processes a flat file into DB2 (approximately 100 rows per second). :cry:

In my job, I read the file (1,300,000 rows), do one or two lookups, and then write to the table (update or insert).

I am forced to use update-or-insert, because I have data for which I must insert a date according to a criterion, so there is no possibility of using the loader.

Does anybody have any suggestions?

Thank you very much

Muriel
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: Performance problem with DB2

Post by ogmios »

100 rows per second seems about right when you use a couple of lookups. As a test (on a test database), remove the lookups and check the speed; depending on your system, you should be able to get between 1000 and 2000 rows per second.

The trick we usually use is to load the data into a staging table in DB2 without any lookups, and then, in another job, do the lookups in one single big SQL query: a select on the staging table with two left outer joins to bring in the lookup data (i.e. one stage with one big query). This way you can achieve speeds between 200 and 800 rows per second.
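
To make that concrete, here is a minimal sketch of the second job's query, assuming a staging table STG_INPUT and two lookup tables LKP_CUSTOMER and LKP_PRODUCT (all table and column names here are made up for illustration):

[code]
-- One DB2 stage with a single big query. The left outer joins keep
-- every staged row and return NULLs where a lookup does not match.
SELECT  s.ORDER_ID,
        s.ORDER_AMT,
        c.CUST_NAME,   -- NULL when no matching customer
        p.PROD_DESC    -- NULL when no matching product
FROM    STG_INPUT s
LEFT OUTER JOIN LKP_CUSTOMER c ON c.CUST_ID = s.CUST_ID
LEFT OUTER JOIN LKP_PRODUCT  p ON p.PROD_ID = s.PROD_ID
[/code]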

Hint: you probably can't right now because of the updates, but never, ever use the DataStage DB2 bulk loader. That stage has some very big disadvantages.

Ogmios
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Two lookups dropping you from over 1000 rows per second down to 100 seems curious to me, unless you are not preloading the hashed files into memory.
IMHO, you can do one of the following:
1. D&C: try to split the process into several parallel streams.
2. Try to figure out why your performance is so poor and improve it.
For the latter, we need more info in order to give you ideas on what might be wrong.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting :)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
auvray.muriel
Participant
Posts: 43
Joined: Wed Feb 19, 2003 7:17 am
Location: France - Paris

Post by auvray.muriel »

Thank you, all.

I would like to know what the very big disadvantages of the DB2 bulk loader are.

I carried out a test like this (update or insert, commit = 1000):

Sequential File --------> Transformer ----------> DB2 Plugin

The performance is the same (200 rows per second).
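
For what it's worth, with "Update existing rows or insert new rows" the stage effectively issues two statements per row, along the lines of this sketch (TARGET_TBL and its columns are placeholders, not your real names):

[code]
-- The update is attempted first; if it touches no row, the insert
-- is issued. Two statements and a round trip per row is the main
-- reason this path is so much slower than a straight insert.
UPDATE TARGET_TBL
SET    DATA_COL  = ?,
       LOAD_DATE = ?
WHERE  KEY_COL   = ?;

INSERT INTO TARGET_TBL (KEY_COL, DATA_COL, LOAD_DATE)
VALUES (?, ?, ?);
[/code]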

There are no complex transformations in the Transformer.

Is there any particular parameter to set on DB2? (As a reminder, the whole environment, both DataStage and DB2, is on NT.)

Thanks

Muriel
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

Disadvantages of the DB2 bulk loader stage:
* It writes a control file that shows the password in clear text.
* It has problems with non-local DB2 systems (it tries to write to directory structures that it fetches from the local system, which fails when the local and remote systems do not have the same layout).
* It does not expose all the possible switches of db2load, which can really affect performance.
* It has the unusual habit of not aborting the job when the DB2 load fails; it just issues a warning. I have no idea why it was built this way.

Ogmios
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

For the performance issues, this is where the black magic starts.

What I "usually" do: load a file with small width (little columns) and do a straight load in DB2 (without transformations or constraints). This will give you a good idea of the maximum throughput possible.

If that does not differ much from your 200 rows per second, you will have to go searching elsewhere, perhaps involving your DBAs.

And be careful with transformations, as even seemingly harmless little transformations can be a big bottleneck. Wide rows can also cause performance to degrade drastically (the more data DataStage has to process, the slower it is).

For the lookups, are you using ODBC lookups or hashed files?

Ogmios
auvray.muriel
Participant
Posts: 43
Joined: Wed Feb 19, 2003 7:17 am
Location: France - Paris

Post by auvray.muriel »

I will test the load without constraints or transformations.

For the lookups, I use hashed files, and there I have no performance problems.

The only performance problem is when writing to the database...


Muriel
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

ogmios wrote:* It has the unusual habit of not aborting the job when the DB2 load fails; it just issues a warning. I have no idea why it was built this way.
It is one of my personal standards that nothing ever aborts. Nothing even generates a warning unless something needs to be warned about, in which case the condition is picked up in the controlling process and handled in an appropriate fashion - which might include not doing any further processing, but will always include notifying someone that there has been a problem.
There is never any need to reset jobs, unless DataStage has detected a condition that even it can't handle.
In the case of bulk loaders, the controlling job checks the bulk loader's log and/or error file, even if there have been no warnings. It's usually enough to grep for a keyword like ERROR.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.