When to use bulk Load?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
shamshad
Premium Member
Premium Member
Posts: 147
Joined: Wed Aug 25, 2004 1:39 pm
Location: Detroit,MI

When to use bulk Load?

Post by shamshad »

Our datamart ETL jobs processes about 300,000 to 900,000 rows everyday that comes as source in text file. The first job reads the text file and populate a staging DB2 UDB table and from that staging table all the dimensions are updated and facts are populated. There are about 8-9 dimensions and 3 facts tables. It takes about 3-4 hours for the whole job to complete.

I am wondering if we should use bulk loading when populating some of the tables that take long time to load. Currently, there are no bulk load but only inserts.

My question is when is the best scenario to do bulk loading? I mean, if we have to load 3-400,000 rows in fact table, should we go for bulk loading?

Do you guys have any standard limit that use bulk load only when the row count is 1 million and above....etc... I would greatly appreciate your feedback
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Anything over about 1000 rows is likely to benefit.

It may be possible to get even better speed, by not using the "bulk load" stage at all but, rather, using a Sequential File stage to write the data file(s), and invoking the bulk loader (using a pre-written, tuned, control file) either in an after-job subroutine or in an Execute Command activity in the controlling job sequence.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ascen
Premium Member
Premium Member
Posts: 67
Joined: Wed Apr 12, 2006 6:47 am
Contact:

Post by ascen »

ray.wurlod wrote:and invoking the bulk loader (using a pre-written, tuned, control file) either in an after-job subroutine or in an Execute Command activity in the controlling job sequence.
Would DS log any error occured during such command?
Or does it just send some file to rdbms and forget about it?

Regards,
Piotrek
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That would depend on you and how you wrote the invoking routine.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The bulk loader works on its own. It has a message file that it writes to. And it writes every step, even its commits. That file is very usefull if any problem occurs. The DS log doesnt help much has the only work it is doing is creating the command files to invoke DB2's load utility.

As per the OP's question goes when a bulk loader should be used, well it depends upon your requirements and your processing window.
If you want to capture rejects then you wont be able to use the bulk load utility. You can identify the longest running jobs, and add logic to it to pre-identify the rejects and then build the load ready file which will be blazed into the database using the bulk load.

Regards,
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ascen
Premium Member
Premium Member
Posts: 67
Joined: Wed Apr 12, 2006 6:47 am
Contact:

Post by ascen »

This is the same as I think :)
Post Reply