Redbrick Bulk Loader

Posted: Sat Jan 15, 2005 12:11 am
by rasi
I am using the Red Brick bulk loader (the TMU utility) to update 900,000 records. The source is a sequential file with three columns: one is the primary key of the target table, and the other two are the columns that need to be updated.

The existing table has 4,500,000 records, and one of the columns being updated has a B-tree index on it. The problem I am facing is that this job takes forever to run: it has been running for more than 24 hours and nothing has happened. The TMU log shows the job is still running, and in the database I can see the LOAD DATA command in dst_commands. When I looked at the locks, I saw locks on all the other tables referenced by this table.

As a workaround we tried running the TMU command manually and split the 900,000 records into 200,000-record chunks. We started the first chunk of 200,000 records last night, and it finished after 12 hours.
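For what it's worth, the manual splitting step can be scripted rather than done by hand. A minimal sketch (the chunk size and file-naming scheme here are arbitrary choices, not anything TMU requires):

```python
def split_file(path, rows_per_chunk=200_000, prefix="chunk"):
    """Split a sequential (flat) file into fixed-size row chunks.

    Writes files named <prefix>_000, <prefix>_001, ... in the current
    directory and returns the list of names written.
    """
    out = None
    count = 0
    chunk_no = 0
    names = []
    with open(path) as src:
        for line in src:
            if count % rows_per_chunk == 0:
                if out:
                    out.close()
                name = f"{prefix}_{chunk_no:03d}"
                names.append(name)
                out = open(name, "w")
                chunk_no += 1
            out.write(line)
            count += 1
    if out:
        out.close()
    return names
```

Each chunk file can then be fed to its own TMU control file and run in sequence.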

Has anyone faced this type of problem? Does something need to be tuned?

Posted: Sat Jan 15, 2005 12:41 am
by ray.wurlod
I always prefer to separate the DataStage component from the Red Brick component. This means that you can apportion blame accurately - is it DataStage or Red Brick?

Disable automatic execution of rb_tmu in the Red Brick stage.

Use ExecSH as an after-job subroutine to execute the TMU script instead, and use rb_ptmu rather than rb_tmu - this will use parallel processes to update the indexes on your Red Brick table.

Is the target table segmented? Does each segment use a number of PSUs, or only one? (More is generally better in terms of ease of management.)

Clearly it is the referential integrity that is slowing things here. Is this a one-off initial load, or a regular requirement? Checking RI is - necessarily - a serial (single) process, and you don't have the luxury of OVERRIDE REFCHECK with TMU. If it's a one-off load and you are confident of the referential integrity of the data, and want raw speed, disable (drop) the foreign key constraints and re-enable them afterwards.
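If you go the drop-and-re-enable route, it helps to generate the DDL for both sides up front so nothing gets forgotten. A sketch of that idea - the constraint names, columns, and tables below are placeholders, and the generic ALTER TABLE syntax shown may differ from Red Brick's exact dialect, so check the SQL reference for your version:

```python
def fk_toggle_statements(child_table, fks):
    """Generate DDL to drop foreign keys before a one-off load and
    re-add them afterwards.

    `fks` maps constraint name -> (column, parent_table, parent_column).
    """
    drops = [f"ALTER TABLE {child_table} DROP CONSTRAINT {name};"
             for name in fks]
    adds = [
        f"ALTER TABLE {child_table} ADD CONSTRAINT {name} "
        f"FOREIGN KEY ({col}) REFERENCES {ptab} ({pcol});"
        for name, (col, ptab, pcol) in fks.items()
    ]
    return drops, adds
```

Run the drops before the load, the adds after, and keep both scripts under version control so the table can always be restored to its declared state.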

Are you using the -i option with rb_(p)tmu to report progress every N rows?
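If you do run the TMU from a script rather than from the stage, wrapping the invocation makes it easy to keep the progress interval consistent across runs. A minimal sketch - the binary name and argument order here are assumptions based on the options mentioned above, so verify them against your Red Brick documentation:

```python
import subprocess

def build_ptmu_command(control_file, rows_interval=100_000,
                       tmu_binary="rb_ptmu"):
    # -i N asks TMU to report progress every N rows, per the thread above.
    return [tmu_binary, "-i", str(rows_interval), control_file]

def run_ptmu(control_file, **kwargs):
    """Run the parallel TMU and capture its log output."""
    cmd = build_ptmu_command(control_file, **kwargs)
    return subprocess.run(cmd, capture_output=True, text=True)
```

With progress reporting on, a stalled load (e.g. blocked on a lock) shows up quickly as a silent log, instead of 24 hours of guessing.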

You should be able to load 4.5 million rows well within an hour with only one or two foreign keys - do you have an abnormally large number of foreign keys in the table in question?

There is no way this job should be taking this long. Even 4.5 million UPDATE statements could be executed in an hour or two! Something is wrong somewhere. Are there any blocking locks on one of the referenced tables?

Finally, have you sought help from IBM? Yeah, I know, Saturday.

Posted: Mon Jan 17, 2005 9:56 am
by gpatton
What indexes are on the table (number and type)?

Sometimes it is faster to drop the indexes, load information into the table and then recreate the indexes.
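As with the foreign keys, scripting the drop/recreate pair keeps the two in sync. A sketch - the index and table names are placeholders, and Red Brick's exact CREATE INDEX syntax may differ from the generic DDL shown, so check your SQL reference:

```python
def index_rebuild_plan(table, indexes):
    """Generate drop / recreate statements for a bulk load.

    `indexes` maps index name -> list of indexed columns.
    """
    drops = [f"DROP INDEX {name};" for name in indexes]
    creates = [
        f"CREATE INDEX {name} ON {table} ({', '.join(cols)});"
        for name, cols in indexes.items()
    ]
    return drops, creates
```

Rebuilding an index once over the full table is often cheaper than maintaining it row by row through 900,000 updates.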

Posted: Mon Jan 17, 2005 3:18 pm
by ray.wurlod
The point about parallel TMU (rb_ptmu) is that the indexes are updated by separate processes, thereby not slowing down the loading of rows into the main table.

The main bottleneck is usually foreign key constraints (referential integrity). These must be checked by the process loading the main table. If you have squillions of foreign key constraints, this substantially slows the load.