DB2 UDB Bulk Stage - Performance Issue

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

I will do the same and let you know.

How do I load through the command prompt?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Yes. Your DBA will be able to increase the page size for you.
As for invoking the utility via the command prompt, do this:
  1. Go inside your load stage and set 'Load Immediate' and 'Remove Intermediate Datafile' to No.
  2. Run the job.
  3. Go to the location you specified in the property 'Directory for Data and Command Files'. You will see three files there: INPUT.DAT (your input data file), CMD.CLP (the file that contains the connect, load and quit commands) and ULOAD.BAT (the batch file that fires the command file). All you need to do there is fire the ULOAD.BAT file.
  4. Make sure your message file is empty, or specify a new message file, so that you only have messages from the current run.
  5. Note the time it takes to load from there.
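For reference, CMD.CLP is just a plain DB2 CLP script. A sketch of what it typically contains (the database, user, and table names below are illustrative placeholders, not values from this job):

```
CONNECT TO MYDB USER LOADUSER USING xxxxxxxx;
LOAD FROM INPUT.DAT OF DEL MESSAGES LOAD.MSG REPLACE INTO MYSCHEMA.MYTABLE;
TERMINATE;
```

ULOAD.BAT then simply runs the script with something like `db2 -tvf CMD.CLP`.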
I hope that helps.
Regards,
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

I will check that and let you know.

Meanwhile, I would like to tell you about the load timings:
The director says job started at 3:36:01 PM and ended at 3:57:34 PM

The highlight of the log file:
SQL3500W The utility is beginning the "LOAD" phase at time "06-19-2006
15:32:15.081442".
SQL3116W The field value in row "F0-106933" and column "22" is missing, but the target column is not nullable.
SQL3515W The utility has finished the "LOAD" phase at time "06-19-2006
15:35:36.845726".
SQL3500W The utility is beginning the "BUILD" phase at time "06-19-2006
15:35:36.956451".
SQL3515W The utility has finished the "BUILD" phase at time "06-19-2006
15:35:38.937428".

Why does the director show start time after the log says it has completed the build phase?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Krazykoolrohit wrote: SQL3116W The field value in row "F0-106933" and column "22" is missing, but the target column is not nullable.
OK, that's your bottleneck. You need to build a file that will load without any warnings; warnings like these slow down the process.
Your load start and end times are impressive, and even the build of the indices is very decent: it takes you about 3 minutes to load your data and about 2 seconds to rebuild the indices. Fix your data and you will be even happier.
As for the discrepancy between the timings, that you need to investigate. For some odd reason there is a huge difference. Is there any after-job routine in that job, or anything else you might be executing after the job? A few seconds' difference is OK, but your difference is huge; that shouldn't be the case. Clear the status file and purge the logs. My first guess would be slow server response time, but if that were the case the server would be sitting on a turtle. That cannot be it; there's something else going on in the job.
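Since every SQL3116W row is rejected and logged individually, it is usually cheaper to strip those rows out before the load. A minimal sketch with awk, assuming a comma-delimited INPUT.DAT where column 22 is the non-nullable one (the file names and the delimiter here are assumptions, not values taken from this job; the first two lines just build demo data):

```shell
# Demo data: two rows of 22 comma-delimited fields; in the second row, field 22 is empty.
printf '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22\n'  > INPUT.DAT
printf '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,\n'   >> INPUT.DAT

# Keep only rows whose 22nd field is non-empty; rejected rows go to a side file for inspection.
awk -F',' '$22 != "" { print > "INPUT.CLEAN.DAT"; next }
                     { print > "INPUT.REJECT.DAT" }' INPUT.DAT
```

Pointing the load at the cleaned file should make the warning (and the per-row rejection cost) disappear.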
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

I have an after-job routine that writes to a status file. Basically it records the job name and time.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Remove the after-job subroutine and run the job; it should finish in time. Also, you really need to purge the log files more frequently. If the log file has a lot of entries, the routine will take a long time to find the correct time and status; that's where a lot of your time is being eaten up.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

The status file never has more than 2-3 entries; right now it has 3 entries.

Will having all jobs write to different local message files help? Currently all jobs are writing to the same file. I have tried that but did not find any considerable difference.
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

One more query.

Do we need to clear the table before a bulk load? Do you think it will help?

Please suggest a good way to clear the data in the tables before a bulk load.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Tried what exactly:
not having an after-job subroutine, or clearing the log file?
If this is an initial load then yes, you need to clear the table.
Do a load replace from /dev/null.
Do a search on how to truncate a DB2 table; you will get the syntax.
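On UNIX, the usual CLP commands for the /dev/null trick look like this (MYDB and MYSCHEMA.MYTABLE are placeholders for your own database and table names):

```
db2 "CONNECT TO MYDB"
db2 "LOAD FROM /dev/null OF DEL REPLACE INTO MYSCHEMA.MYTABLE"
db2 "TERMINATE"
```

LOAD ... REPLACE empties the table without logging each deleted row, which is why it is so much faster than a DELETE.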
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

I have tried it both by removing the job routine and by clearing the log file.
The load is marked as replace.

Still no improvement in load time. It's still stuck at 200 rows per second.
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

I have the table stats with me. Will this be of any help to you?

Table Name     D Size (MB)   I Size (MB)   Avg   Max    Min   Records
lt-condition   684.6         38.6          259   4126   89    2,763,934

Data-New Area Name   # of Reads     # of Creates
32                   101,621,722-   379,187-

I also have the following message repeating itself in some job logs:
SQL3039W The memory available to LOAD for DATA BUFFER prohibits full LOAD
parallelism. Load parallelism of "1" will be used
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Krazykoolrohit wrote: SQL3039W The memory available to LOAD for DATA BUFFER prohibits full LOAD
parallelism. Load parallelism of "1" will be used
Do a Google search on the SQL code. Here's what I've found on IBM's site:
  - Ignore this message, and the LOAD will complete normally using the smaller value for LOAD parallelism. However, LOAD performance may be less than optimal.
  - Specify a smaller value for LOAD parallelism when invoking the utility.
  - Increase the size of the utility heap.
  - Increase the size of the data buffer parameter, or leave the parameter blank and let the LOAD utility determine a default based on the free space in the utility heap.
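If you want to try the utility heap remedy, a sketch of the CLP command follows (the database name and size are placeholder values; check with your DBA before changing them):

```
db2 "UPDATE DB CFG FOR MYDB USING UTIL_HEAP_SZ 50000"
```

UTIL_HEAP_SZ is measured in 4 KB pages, so 50000 is roughly 200 MB.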


As for the performance: does it start off at 200 rows/sec, or does it drop to that over the run?
Try this test now.
First truncate the table by connecting to DB2 via the command prompt and running the following command:

Code: Select all


load from /dev/null of del replace into <table name>
Then go into the properties of your load stage and choose the load method INSERT instead of REPLACE. Try it then.
If all else fails, just use the UDB load stage to build the batch files (those three files I mentioned earlier) and use an after-job routine to fire ULOAD.BAT.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

The performance starts and ends at 200. It varies a lot from job to job: in some jobs it goes up to 1700 while in others it's 90.

ULOAD.BAT is not running; it says DB not found.

INSERT takes pretty much the same time as UPSERT.