Our datamart ETL jobs process about 300,000 to 900,000 rows every day, arriving as a source text file. The first job reads the text file and populates a staging DB2 UDB table; from that staging table all the dimensions are updated and the facts are populated. There are about 8-9 dimension tables and 3 fact tables, and the whole job takes about 3-4 hours to complete.
I am wondering if we should use bulk loading when populating some of the tables that take a long time to load. Currently there is no bulk load, only inserts.
My question is: when is the best scenario for bulk loading? If we have to load 300,000-400,000 rows into a fact table, should we go for bulk loading?
Do you have any standard threshold, e.g. use bulk load only when the row count is 1 million and above? I would greatly appreciate your feedback.
When to use bulk Load?
Anything over about 1000 rows is likely to benefit.
It may be possible to get even better speed, by not using the "bulk load" stage at all but, rather, using a Sequential File stage to write the data file(s), and invoking the bulk loader (using a pre-written, tuned, control file) either in an after-job subroutine or in an Execute Command activity in the controlling job sequence.
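The approach above can be sketched as a small after-job script. This is a sketch only: the database, schema, table, and file names are hypothetical, and the LOAD options shown are the minimal delimited-file form rather than a pre-written, tuned control file. The helper builds the LOAD statement so it can be inspected (or logged) before being handed to the `db2` command-line processor.

```shell
# build_load_cmd: assemble a DB2 LOAD statement for a delimited data file
# written by a Sequential File stage. All names here are illustrative.
build_load_cmd() {
  datafile=$1
  msgfile=$2
  table=$3
  echo "LOAD FROM $datafile OF DEL MESSAGES $msgfile INSERT INTO $table"
}

# In an after-job subroutine or Execute Command activity you would run, e.g.:
#   db2 connect to DATAMART
#   db2 "$(build_load_cmd /data/stage/fact.del /data/stage/fact.msg DM.FACT_SALES)"
#   db2 connect reset
build_load_cmd /data/stage/fact.del /data/stage/fact.msg DM.FACT_SALES
```

Because the loader bypasses row-by-row insert processing, this tends to beat thousands of individual INSERTs even before any tuning of the control file.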
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: and invoking the bulk loader (using a pre-written, tuned, control file) either in an after-job subroutine or in an Execute Command activity in the controlling job sequence.

Would DS log any error that occurred during such a command? Or does it just send some file to the RDBMS and forget about it?
Regards,
Piotrek
The bulk loader works on its own. It writes every step, including its commits, to a message file, and that file is very useful if any problem occurs. The DS log doesn't help much, as the only work DataStage is doing is creating the command files that invoke DB2's load utility.
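Since the DS log won't show load problems, a script can inspect the message file itself. The sketch below assumes the standard LOAD summary line "Number of rows rejected = N" appears in that file; the file names and calling context are illustrative.

```shell
# rejected_rows: pull the reject count out of a DB2 LOAD message file,
# assuming the standard summary line "Number of rows rejected = N".
rejected_rows() {
  awk -F= '/Number of rows rejected/ { gsub(/ /, "", $2); print $2 }' "$1"
}

# Typical use after invoking the loader in an after-job subroutine:
#   if [ "$(rejected_rows /data/stage/fact.msg)" != "0" ]; then
#     echo "bulk load rejected rows; see /data/stage/fact.msg" >&2
#     exit 1
#   fi
```

Failing the job on a non-zero reject count keeps the sequence from silently moving on when the loader dropped rows.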
As for the OP's question about when a bulk loader should be used: it depends upon your requirements and your processing window.
If you want to capture rejects, you won't be able to use the bulk load utility directly. You can identify the longest-running jobs, add logic to pre-identify the rejects, and then build a load-ready file that will be blazed into the database using the bulk load.
Regards,
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.