Redbrick Bulk Loader

rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Redbrick Bulk Loader

Post by rasi »

I am using the Red Brick bulk load (TMU utility) to update 900,000 records. The source is a sequential file with three columns: one is the primary key of the target table and the other two are the columns that need to be updated.

The existing table has 4,500,000 records, and one of the columns being updated has a B-tree index on it. The problem I am facing is that this job takes forever to run. It has been running for more than 24 hours and nothing has happened. The TMU log shows the job is still running, and in the database I can see the LOAD DATA command in dst_commands. When I looked at the locks, I could see locks on all the other tables referenced by this table.

As a workaround we tried running the TMU command manually and split the 900,000 records into 200,000-record chunks. We started the first set of 200,000 records last night and it finished after 12 hours.
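A simple way to do that kind of split, for anyone trying the same workaround (the file name is just an example, not our actual path):

# Break the 900,000-row extract into 200,000-row chunks
# (produces cust_updates_chunk_aa, _ab, ... which can then be loaded one at a time).
split -l 200000 cust_updates.dat cust_updates_chunk_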

Has anyone faced this type of problem? Does something need to be tuned?
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I always prefer to separate the DataStage component from the Red Brick component. This means that you can apportion blame accurately - is it DataStage or Red Brick?

Disable automatic execution of rb_tmu in the Red Brick stage.

Use ExecSH as an after-job subroutine to execute the TMU script instead, and use rb_ptmu rather than rb_tmu - this will use parallel processes to update the indexes on your Red Brick table.
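A minimal sketch of what that after-job script might look like - the paths, control file name and exact rb_ptmu argument list here are illustrative only, so check the TMU reference for your Red Brick release:

#!/bin/sh
# Called from DataStage via the ExecSH after-job subroutine.
# Control file name, log location and argument order are placeholders;
# rb_ptmu also needs your site's connection details, which are omitted here.
# -i asks TMU to report progress every N rows (verify the exact option syntax).
rb_ptmu -i 100000 cust_update.tmu > /tmp/cust_update_tmu.log 2>&1
# Return the TMU exit status so the DataStage job can report failure.
exit $?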

Is the target table segmented? Does each segment use a number of PSUs, or only one? (More is generally better in terms of ease of management.)

Clearly it is the referential integrity that is slowing things here. Is this a one-off initial load, or a regular requirement? Checking RI is - necessarily - a serial (single) process, and you don't have the luxury of OVERRIDE REFCHECK with TMU. If it's a one-off load and you are confident of the referential integrity of the data, and want raw speed, disable (drop) the foreign key constraints and re-enable them afterwards.
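Something along these lines - the table, column and constraint names are invented, and you should confirm the exact ALTER TABLE syntax against your Red Brick SQL reference:

-- Before the load: drop the foreign key constraint (names are examples only).
ALTER TABLE sales_fact DROP CONSTRAINT fk_salesfact_customer;

-- ... run the TMU load here ...

-- After the load: put the constraint back, which re-validates RI in one pass.
ALTER TABLE sales_fact ADD CONSTRAINT fk_salesfact_customer
    FOREIGN KEY (cust_id) REFERENCES customer (cust_id);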

Are you using the -i option with rb_(p)tmu to report progress every N rows?

You should be able to load 4.5 million rows well within an hour with only one or two foreign keys - do you have an abnormally large number of foreign keys in the table in question?

There is no way this job should be taking this long. Even 4.5 million UPDATE statements could be executed in an hour or two! Something is wrong somewhere. Are there any blocking locks on one of the referenced tables?
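You mention seeing LOAD DATA in dst_commands; the versions I have used also expose lock information through a similar dynamic statistics table (dst_locks - verify the name and columns in your release), so a quick check is simply:

-- List current locks and see whether another session holds one
-- on the main table or on any of the referenced tables.
-- (dst_locks is assumed by analogy with the dst_commands table mentioned above.)
SELECT * FROM dst_locks;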

Finally, have you sought help from IBM? Yeah, I know, Saturday.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gpatton
Premium Member
Posts: 47
Joined: Mon Jan 05, 2004 8:21 am

Post by gpatton »

What indexes are on the table (number and type)?

Sometimes it is faster to drop the indexes, load the information into the table, and then recreate the indexes.
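Roughly like this - the index, table and column names are only examples, so adjust them to your schema and check the exact Red Brick syntax:

-- Drop the B-tree index on the updated column before the load
-- (names below are illustrative).
DROP INDEX cust_status_idx;

-- ... run the TMU load ...

-- Rebuild the index once, in bulk, after the data is in.
CREATE INDEX cust_status_idx ON customer (status);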
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The point about parallel TMU (rb_ptmu) is that the indexes are updated by separate processes, thereby not slowing down the loading of rows into the main table.

The main bottleneck is usually foreign key constraints (referential integrity). These must be checked by the process loading the main table. If you have squillions of foreign key constraints, this substantially slows the load.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.