Hi all!
I designed a DataStage job that does a simple load: data from a sequential file, with a hash file as a lookup, loaded into a table in a Red Brick database. The data does get loaded completely into the table, but when I watch the same job in Designer with "Show Performance Statistics" enabled, the job still appears to be running, and only shows as finished some time later. The information about the job's completion in DataStage Designer is quite misleading. Can anyone explain this?
DataStage Job
Re: DataStage Job
Can you supply the actual "end messages" that you see in your job, and what you would actually expect? The original post is a bit confusing.
Ogmios
In theory there's no difference between theory and practice. In practice there is.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The row counts are captured at regular intervals while the stage is running. However, the job cannot actually finish until rb_tmu (the Red Brick bulk loader) returns an exit status. So there is a period during which rb_tmu is executing in which DataStage is not processing any more rows, but is merely awaiting the exit status from rb_tmu. It's this that you're observing, and it's perfectly normal.
If you want to prove this, change "automatic load" to false. DataStage will finish promptly, but you will need to make some other arrangement (such as an Execute Command activity in a job sequence) for performing the actual load into Red Brick.
You might also compare with the timestamps in the Red Brick activity log for the load.
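If you go the manual route, the Execute Command activity in the sequence would invoke rb_tmu directly. A minimal sketch of such a command follows; the control file path, database user, and variable names here are hypothetical, and the exact argument layout varies by Red Brick version, so check the TMU reference for your install:

```sh
# Hypothetical Execute Command activity body: run the Red Brick TMU
# against a pre-built control file. All paths/credentials are placeholders.
rb_tmu /etl/control/load_sales.tmu $RB_USER $RB_PASS
```

With automatic load disabled, the DataStage job's finish time then reflects only the time to write the staging file, and the load itself shows up separately in the sequence log.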
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Assuming that you're using 0 rows per transaction, much the same kind of explanation. You send all the rows, but hold off sending a "commit" until the end. It's not until that point that Red Brick can start actually loading rows into tables and updating indices. Your DataStage job waits for the "all OK".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Best scenario, especially if you have lots of indexing on the Red Brick table, is to use the bulk load stage with automatic load disabled, then use the parallel bulk loader (rb_ptmu) from an after-stage or after-job subroutine. This allows separate processes to load the table and the indexes, in parallel.
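As a sketch of that setup: the after-job subroutine (ExecSH on UNIX) would run the parallel TMU against the file the bulk load stage produced. The file names and credentials below are placeholders, and I'm assuming rb_ptmu takes arguments in the same general shape as rb_tmu, so verify against your Red Brick documentation:

```sh
# Hypothetical ExecSH after-job command: parallel bulk load with rb_ptmu.
# Control file, user, and password are placeholders for your environment.
rb_ptmu /etl/control/load_sales.tmu $RB_USER $RB_PASS
```

The win here is that the table segments and the index builds proceed in separate processes, which matters most when the table carries several indexes.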
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.