Hello,
I have a performance problem when processing a flat file into DB2 (roughly 100 rows per second).
My job reads the file (1,300,000 rows), performs one or two lookups, and then writes to the table (update or insert).
I have to use update-or-insert, because some rows must have a date inserted according to a criterion, so I cannot use the loader.
Does anybody have any information?
Thank you very much
Muriel
Performance problem with DB2
-
- Participant
- Posts: 43
- Joined: Wed Feb 19, 2003 7:17 am
- Location: France - Paris
- Contact:
Re: Performance problem with DB2
100 rows per second seems about right when you use a couple of lookups. As a test (on a test database), remove the lookups and check the speed; depending on your system you should be able to get between 1,000 and 2,000 rows per second.
The trick we usually use is to load the data into a staging table in DB2 without any lookups, and then, in a second job, do the lookups in one big SQL query: a select on the staged table with two left outer joins to bring in the lookup data (so one stage with a big query). This way you can achieve speeds between 200 and 800 rows per second.
Hint: you probably can't now because of the updates, but never, ever use the DataStage DB2 bulk loader. That stage has some very big disadvantages.
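The shape of that staged-load-plus-joins query can be sketched as below. This is only an illustration: SQLite stands in for DB2, and the table and column names are made up, since the thread does not give the real schema.

```python
import sqlite3

# In-memory SQLite stands in for DB2; all names here are hypothetical.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Stage 1: straight load into a staging table, no lookups.
cur.execute("CREATE TABLE staging (id INTEGER, amount REAL, cust_code TEXT, prod_code TEXT)")
cur.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)",
                [(1, 10.0, "C1", "P1"), (2, 20.0, "C2", "P9")])

# The two lookup tables that would otherwise be probed row by row.
cur.execute("CREATE TABLE customers (code TEXT, name TEXT)")
cur.execute("INSERT INTO customers VALUES ('C1', 'Acme'), ('C2', 'Globex')")
cur.execute("CREATE TABLE products (code TEXT, label TEXT)")
cur.execute("INSERT INTO products VALUES ('P1', 'Widget')")

# Stage 2: one big query with two LEFT OUTER JOINs replaces both lookups;
# rows with no match keep NULL in the joined columns instead of being dropped.
rows = cur.execute("""
    SELECT s.id, s.amount, c.name, p.label
    FROM staging s
    LEFT OUTER JOIN customers c ON c.code = s.cust_code
    LEFT OUTER JOIN products  p ON p.code = s.prod_code
    ORDER BY s.id
""").fetchall()
print(rows)  # row 2 has no product match, so its label is None
```

The left outer joins matter: an inner join would silently drop rows whose lookup key has no match, which a DataStage lookup with a NULL result would not.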
Ogmios
Hi,
Two lookups dropping from over 1,000 rows per second to 100 seems curious to me, unless you don't pre-load the hash files into memory.
IMHO, you can do one of the following:
1. Divide and conquer: try to split the process into several parallel ones.
2. Try to figure out why your performance is so poor and improve it.
For the latter we need more information to give you ideas on what might be wrong.
IHTH,
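The divide-and-conquer suggestion can be sketched as follows: split the input rows into N chunks so that N copies of the job can each process one chunk in parallel. The chunk count of 4 is just an example.

```python
# Split input rows into N near-equal chunks for parallel processing.
def split_file(lines, n_chunks):
    """Return n_chunks lists of lines of near-equal size."""
    chunks = [[] for _ in range(n_chunks)]
    for i, line in enumerate(lines):
        chunks[i % n_chunks].append(line)  # round-robin keeps sizes balanced
    return chunks

# With the 1,300,000 input rows from the thread and 4 chunks,
# each parallel job gets exactly 325,000 rows.
lines = [f"row {i}" for i in range(1_300_000)]
chunks = split_file(lines, 4)
print([len(c) for c in chunks])  # -> [325000, 325000, 325000, 325000]
```

In practice the split would be done on the flat file itself (e.g. with an OS-level split utility) before starting the parallel job instances.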
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Thank you all.
I would like to know the very big disadvantages of the DB2 bulk loader.
I ran a test like this (update or insert, commit every 1,000 rows):
Sequential File --------> Transformer ----------> DB2 plug-in
The performance is the same (200 rows per second).
There are no complex transformations in the Transformer.
Is there a particular setting to tune on DB2 (as a reminder, the whole environment, DataStage and DB2, is on NT)?
Thanks
Muriel
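The update-or-insert logic with a commit size of 1,000 that this test describes follows the classic DB-API pattern below: try the UPDATE first, and only INSERT when no row was touched. SQLite again stands in for DB2, and the table is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")

COMMIT_EVERY = 1000  # matches the "commit = 1000" setting in the test above

def upsert(rows):
    """Try UPDATE first; if no row was touched, INSERT. Commit in batches."""
    for n, (key, val) in enumerate(rows, start=1):
        cur.execute("UPDATE target SET val = ? WHERE id = ?", (val, key))
        if cur.rowcount == 0:          # key not present yet
            cur.execute("INSERT INTO target (id, val) VALUES (?, ?)", (key, val))
        if n % COMMIT_EVERY == 0:      # batch commits instead of one per row
            con.commit()
    con.commit()

upsert([(i, "first") for i in range(2500)])   # all rows are inserts
upsert([(i, "second") for i in range(1000)])  # all rows are updates
count = cur.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # -> 2500
```

Note that every row costs at least one statement, and misses cost two, which is part of why update-or-insert is so much slower than a straight load.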
Disadvantages of the DB2 bulk loader stage:
* It creates a control file that shows the passwords in clear text.
* It has problems with non-local DB2 systems (it tries to write to directory structures that it fetches from the local system, which fails when the local and remote systems do not have the same layout).
* It does not expose all possible switches of db2load, which can really affect performance.
* It has the unusual habit of not aborting the job when the DB2 load fails, just issuing a warning. I have no idea why they built it this way.
Ogmios
For the performance issues, this is where the black magic starts.
What I "usually" do: load a file with a small row width (few columns) and do a straight load into DB2 (without transformations or constraints). This will give you a good idea of the maximum throughput possible.
If it does not differ much from your 200 rows per second, you will have to search elsewhere, possibly involving your DBAs.
And be careful with transformations: even seemingly harmless little transformations can be a big bottleneck. Wide rows can also cause performance to degrade drastically (the more data DataStage has to process, the slower it is).
For the lookups, are you using ODBC lookups or hash files?
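The baseline throughput measurement described above, i.e. timing a straight load of narrow rows to find the ceiling before adding lookups or transformations, can be sketched like this (SQLite stands in for DB2; the row count and schema are illustrative):

```python
import sqlite3
import time

# Hypothetical baseline: time a straight load of narrow rows to estimate
# the maximum throughput, before adding any lookups or transformations.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE narrow (id INTEGER, code TEXT)")

rows = [(i, "X") for i in range(100_000)]
start = time.perf_counter()
cur.executemany("INSERT INTO narrow VALUES (?, ?)", rows)
con.commit()
elapsed = time.perf_counter() - start

rate = len(rows) / elapsed
print(f"{rate:,.0f} rows/second")  # compare this ceiling with the job's 200 rows/s
```

If even this bare load runs near 200 rows per second, the bottleneck is the database side, not the transformations, and the DBAs should get involved.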
Ogmios
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
ogmios wrote: * Has the unusual habit of not aborting the job when the db2 load fails, just a warning. I have no idea why they built it this way.
It is one of my personal standards that nothing ever aborts. Nothing even generates a warning unless something needs to be warned about, in which case it is picked up in the controlling process and handled in an appropriate fashion - which might include not doing any further processing, but will include notifying someone that there has been a problem.
No need ever to reset jobs, unless DataStage has detected a condition that even it can't handle.
In the case of bulk loaders, the controlling job checks the bulk loader's log and/or error file, even if there have been no warnings. It's usually enough to grep for a keyword like ERROR.
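That controlling-job check can be sketched as below: scan the bulk loader's log for a keyword like ERROR, even when the load itself reported no warnings. The log path and the message text are made up for illustration.

```python
import os
import tempfile

def load_succeeded(log_path, keyword="ERROR"):
    """Return False if the keyword appears anywhere in the loader log."""
    with open(log_path) as log:
        return not any(keyword in line for line in log)

# Demo with a throwaway log file standing in for the real loader log.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("1000 rows read\nERROR during load of table TARGET\n")
    path = f.name

ok = load_succeeded(path)
print(ok)  # -> False: the controlling job should now notify someone
os.remove(path)
```

The point is that the check lives in the controlling process, which then decides whether to stop further processing and who to notify, rather than letting the job abort or, worse, continue silently.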
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.