Our datamart ETL jobs process about 300,000 to 900,000 rows every day, arriving as a source text file. The first job reads the text file and populates a staging DB2 UDB table; from that staging table all the dimensions are updated and the facts are populated. There are about 8-9 dimension tables and 3 fact tables, and the whole job takes about 3-4 hours to complete.
I am wondering if we should use bulk loading when populating some of the tables that take a long time to load. Currently there is no bulk load, only inserts.
My question is: when is the best scenario for bulk loading? If we have to load 300,000-400,000 rows into a fact table, should we go for bulk loading?
Do you have any standard threshold, e.g. use bulk load only when the row count is 1 million and above? I would greatly appreciate your feedback.
When to use bulk Load?
Anything over about 1000 rows is likely to benefit.
It may be possible to get even better speed, by not using the "bulk load" stage at all but, rather, using a Sequential File stage to write the data file(s), and invoking the bulk loader (using a pre-written, tuned, control file) either in an after-job subroutine or in an Execute Command activity in the controlling job sequence.
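The approach above can be sketched as a small after-job script. This is a sketch only: the database, schema, table, and file names are hypothetical, and the LOAD options shown are the minimal delimited-file form rather than a pre-written, tuned control file. The helper builds the LOAD statement so it can be inspected (or logged) before being handed to the `db2` command-line processor.

```shell
# build_load_cmd: assemble a DB2 LOAD statement for a delimited data file
# written by a Sequential File stage. All names here are illustrative.
build_load_cmd() {
  datafile=$1
  msgfile=$2
  table=$3
  echo "LOAD FROM $datafile OF DEL MESSAGES $msgfile INSERT INTO $table"
}

# In an after-job subroutine or Execute Command activity you would run, e.g.:
#   db2 connect to DATAMART
#   db2 "$(build_load_cmd /data/stage/fact.del /data/stage/fact.msg DM.FACT_SALES)"
#   db2 connect reset
build_load_cmd /data/stage/fact.del /data/stage/fact.msg DM.FACT_SALES
```

Because the loader bypasses row-by-row insert processing, this tends to beat thousands of individual INSERTs even before any tuning of the control file.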
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: and invoking the bulk loader (using a pre-written, tuned, control file) either in an after-job subroutine or in an Execute Command activity in the controlling job sequence.

Would DS log any error that occurred during such a command? Or does it just send some file to the RDBMS and forget about it?
Regards,
Piotrek
The bulk loader works on its own. It writes every step, including its commits, to a message file, and that file is very useful if any problem occurs. The DS log doesn't help much, as the only work DataStage is doing is creating the command files that invoke DB2's load utility.
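Since the DS log won't show load problems, a script can inspect the message file itself. The sketch below assumes the standard LOAD summary line "Number of rows rejected = N" appears in that file; the file names and calling context are illustrative.

```shell
# rejected_rows: pull the reject count out of a DB2 LOAD message file,
# assuming the standard summary line "Number of rows rejected = N".
rejected_rows() {
  awk -F= '/Number of rows rejected/ { gsub(/ /, "", $2); print $2 }' "$1"
}

# Typical use after invoking the loader in an after-job subroutine:
#   if [ "$(rejected_rows /data/stage/fact.msg)" != "0" ]; then
#     echo "bulk load rejected rows; see /data/stage/fact.msg" >&2
#     exit 1
#   fi
```

Failing the job on a non-zero reject count keeps the sequence from silently moving on when the loader dropped rows.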
As for the OP's question about when a bulk loader should be used: it depends upon your requirements and your processing window.
If you want to capture rejects, you won't be able to use the bulk load utility directly. You can identify the longest-running jobs, add logic to pre-identify the rejects, and then build a load-ready file that will be blazed into the database using the bulk load.
Regards,
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.