Speed problems in Datastage

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

This is a topic for an orphaned message.

Hi Ken,

I'm not saying that DataStage uses multi-threading. It does, though, support multi-processing. I was just explaining how DataStage works when jobs are designed with independent active stages.

I've seen customers set up their projects so that they have multiple jobs running in parallel from a controlling job. This has the same effect as a job with multiple independent active stages and multiple independent input streams, although the former lends itself more to flexibility and maintainability.
There will be a little more overhead in starting the jobs, but this would be negligible.
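
To make the controlling-job idea concrete, here is a rough sketch in Python rather than DataStage job control. The "jobs" are hypothetical stand-ins (tiny Python programs that sum a range), and run_jobs_in_parallel is a name made up for this illustration. The only point is that each job gets its own OS process and the controller waits for all of them.

```python
import subprocess
import sys

# Hypothetical stand-in for a job: a tiny program that sums a range and
# prints the result. A real controlling job would start DataStage jobs.
JOB_SOURCE = "import sys; n = int(sys.argv[1]); print(sum(range(n)))"


def run_jobs_in_parallel(row_counts):
    """Start one OS process per job, then wait for every job to finish."""
    # Start all jobs before waiting on any, so they run concurrently.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", JOB_SOURCE, str(n)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for n in row_counts
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()  # blocks until this job has exited
        if proc.returncode != 0:
            raise RuntimeError(f"job exited with status {proc.returncode}")
        results.append(int(out.strip()))
    return results


if __name__ == "__main__":
    print(run_jobs_in_parallel([10, 100, 1000]))
```

Each job pays a small process start-up cost, which is the "little more overhead" mentioned above. In exchange, every job can be rerun, rescheduled, or tuned on its own.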

In my reply I did not suggest that one should write super huge jobs...
the "divide and conquer" principle applies to DataStage too.

Again, let me reiterate... DataStage does not use multi-threading, not yet
anyway. :)
It is a multi-processing environment by nature, thanks to the underlying
engine.

When trying to speed up a job (or group of jobs), one needs to assess where the bottlenecks are and address them accordingly. Redesigning jobs will help in most cases... because "there's more than one way to do it".
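
As a generic illustration of finding the bottleneck before redesigning anything, the sketch below times each step of a small pipeline and reports the slowest one. It is plain Python, not a DataStage facility. The stage names, time_stages and slowest_stage are invented for the example, and it runs one stage at a time over all rows purely to keep the timing simple.

```python
import time


def time_stages(stages, rows):
    """Run each (name, function) stage over the rows and time it.

    Returns a dict of elapsed seconds per stage plus the final rows.
    """
    timings = {}
    data = list(rows)
    for name, func in stages:
        start = time.perf_counter()
        data = [func(row) for row in data]
        timings[name] = time.perf_counter() - start
    return timings, data


def slowest_stage(timings):
    """Return the name of the stage with the largest elapsed time."""
    return max(timings, key=timings.get)


def slow_lookup(row):
    # Hypothetical slow step, standing in for something like a remote lookup.
    time.sleep(0.002)
    return row * 2


if __name__ == "__main__":
    stages = [("add_one", lambda row: row + 1), ("lookup", slow_lookup)]
    timings, result = time_stages(stages, [1, 2, 3])
    print(result, "bottleneck:", slowest_stage(timings))
```

Once the slow step is known, that is the part worth redesigning or splitting out. Tuning the fast steps would change very little.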

Regards,
Anthony

--- Ken_2_Bland@sbphrd.com wrote:
> Anthony: If you design a job that has multiple completely independent
> streams within your job design as you state, you will get independent
> processes within your OS. This is not the same as multi-threaded! If
> you design a job with 14 independent streams, you should break it out
> into 14 separate, smaller, modular jobs. Then you have the
> opportunity to load balance your jobs according to your hardware. If
> you build super huge jobs, with independent streams, you have done
> what programmers are almost admonished for doing: writing super huge
> programs!
>
> You can have "speed" problems if you try to run an all-in-one job,
> because it overwhelms the resources available to that one job. Breaking
> it down into multiple jobs will allow you to execute within your limits.
>
> DataStage is single-threaded WITHIN A STREAM. This is because you
> process rows of source data sequentially, and a row has to completely
> traverse from the source passive stage to a target passive stage
> before you process the next row. This is important because the first
> row may be modified by the second row, so you can't multi-thread
> activity when there are predecessor-successor relationships in the
> data.
>
> Thanks,
> -Ken
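
Ken's point about predecessor-successor relationships can be shown with a toy example. This is an illustrative Python sketch, not DataStage code. The row layout (key, operation, value) and the apply_rows function are assumptions made up for the example.

```python
def apply_rows(rows):
    """Apply rows strictly in order.

    Each row is written to the target before the next row is read.
    """
    target = {}
    for key, op, value in rows:
        if op == "set":
            target[key] = value
        elif op == "add":
            target[key] = target.get(key, 0) + value
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# The second row modifies what the first row wrote.
rows = [("acct1", "set", 100), ("acct1", "add", 25)]

in_order = apply_rows(rows)                   # {'acct1': 125}
reordered = apply_rows(list(reversed(rows)))  # {'acct1': 100}

if __name__ == "__main__":
    print(in_order, reordered)
```

The same two rows give different targets depending on the order they arrive in. That is why rows within one stream cannot safely be processed out of order or at the same time, while completely independent streams can run in separate processes.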

