Max number of Hashed files

gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Max number of Hashed files

Post by gateleys »

Hi,
I have a few questions with respect to references in Server jobs.
1. Is there any limit to the number of hashed files used for references/lookups in a single job?
2. If there isn't, then what about performance? Would it be better to split the job into several jobs so that the lookups (assuming there are many) are spread among them?

Thanks in advance.
gateleys
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

There is always going to be a limit somewhere, but I'm not aware of any limit that you will hit when designing your job - if the stages fit on the canvas, the job will compile and run.

If you have Transformer-to-Transformer links in your jobs, activate inter-process buffering or put in IPC stages, and limit yourself to a couple of lookups per Transformer; then you should do a good job of distributing the load on a multi-CPU system. Splitting the load across several jobs would have the same effect on performance, and it might make the design a bit easier to maintain in the long haul than one monster job.
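To make the idea concrete, here is a minimal sketch of what inter-process buffering buys you - plain Python, purely for illustration, not DataStage code, and all names in it are made up: two "transformer" stages run as separate processes joined by a bounded buffer, so each stage can occupy its own CPU.

Code: Select all

# Illustrative sketch only (plain Python, not DataStage code): two
# "transformer" stages run as separate processes joined by a bounded
# buffer, which is roughly what inter-process buffering / an IPC stage
# gives you - each stage can then run on its own CPU.
import multiprocessing as mp

def stage1(out_q):
    for row in range(10):
        out_q.put(row * 2)          # first transform: double each value
    out_q.put(None)                 # end-of-data marker

def stage2(in_q):
    while True:
        row = in_q.get()
        if row is None:
            break
        print(row + 1)              # second transform: add one

if __name__ == "__main__":
    buf = mp.Queue(maxsize=4)       # small bounded buffer between stages
    p1 = mp.Process(target=stage1, args=(buf,))
    p2 = mp.Process(target=stage2, args=(buf,))
    p1.start(); p2.start()
    p1.join(); p2.join()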
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Maybe there is a limit somewhere, but before you reach it you will no longer understand your own job.

When building a job you won't get it perfect at the beginning, and the more complicated the job, the more points there are to check when errors occur.

So, for the sake of understanding, build several simple jobs rather than one complicated one. It will help you now and later.
Wolfgang Hürter
Amsterdam
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Re: Max number of Hashed files

Post by Sreenivasulu »

Hi,

Regarding question 2: if you are using different references, splitting the job would not help much. I have one job with 9 lookups and 9 transformers.

Regards
Sreeni
gateleys wrote:Hi,
I have a few questions with respect to references in Server jobs.
1. Is there any limit to the number of hashed files used for references/lookups in a single job?
2. If there isn't, then what about performance? Would it be better to split the job into several jobs so that the lookups (assuming there are many) are spread among them?

Thanks in advance.
gateleys
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Thanks guys. Here, I was talking about having over 20 lookups in a single Transformer.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Break it up into 4 or 5 lookups per Transformer. This assumes you have a multi-CPU system; otherwise none of this makes any real difference.
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Also, as suggested by Ray, it will certainly help to design the job with multiple Transformers and to sandwich IPC stages between these active stages, to make the most of a multi-CPU server box.

Thanks,
Naveen.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

naveendronavalli wrote:Also, as suggested by Ray, it will certainly help to design the job with multiple Transformers and to sandwich IPC stages between these active stages, to make the most of a multi-CPU server box.

Thanks,
Naveen.
Hi Naveen,
When using the IPC stage, how do I determine the optimal buffer size for my job? I have gone through the Server guide, and I have also used this stage a number of times before for a similar purpose. However, I have always stuck with the default of 128K (each for Read and Write). Under what circumstances would a bigger or smaller buffer size give me better performance? Also, is enabling 'in-process row buffering' the same thing as using the IPC stage?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You should enable inter-process row buffering; that is almost the same as using an explicit IPC stage. The buffer size does not normally affect speed and shouldn't need changing; it only needs to be large enough to hold enough rows of data to smooth out temporary differences in speed between the two sides.

Think of the data flow as water and the buffer as a bathtub. If the water filling the tub doesn't flow at a constant rate, or the water doesn't drain at a constant speed, the bathtub is there to keep everything moving - so that if one side stops or blocks, it doesn't (immediately) affect the other side. As long as the bathtub holds enough water that a temporary slowdown in draining won't fill it, and a temporary slowdown in filling won't empty it, it doesn't matter how large it is - making it the size of a swimming pool won't make anything go faster.

128Kb holds a lot of data. Even if your row size were 1Kb you could still buffer 128 rows - more than enough to buffer out temporary speed differences. In almost all practical applications one side of the buffer will always be faster than the other, so the buffer will almost always be 100% full or 100% empty and the slower process will always have something to do while the faster process will spend a lot of time waiting because the buffer is not ready for it. Usually a buffer of just a couple of rows of data is sufficiently large, so the default of 128Kb is almost ridiculously oversized.
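If you want to see this for yourself, here is a small experiment - again plain Python rather than DataStage, purely for illustration, with all sizes and timings made up: a producer/consumer pair joined by a bounded buffer, where the consumer is deliberately the slow side.

Code: Select all

# Illustrative experiment only (plain Python, not DataStage code): a
# producer/consumer pair joined by a bounded buffer. The consumer is the
# slow side, so it sets the pace; the run time barely changes whether
# the buffer holds 2 rows or 128 (the "swimming pool").
import threading, queue, time

def run(buffer_rows, n_rows=200):
    buf = queue.Queue(maxsize=buffer_rows)

    def producer():                  # the fast side: fills the buffer
        for i in range(n_rows):
            buf.put(i)

    def consumer():                  # the slow side: drains the buffer
        for _ in range(n_rows):
            buf.get()
            time.sleep(0.001)        # simulate slower processing

    start = time.time()
    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads: t.start()
    for t in threads: t.join()
    return time.time() - start

for rows in (2, 128):
    print("buffer of %3d rows: %.2fs" % (rows, run(rows)))

Both runs finish in roughly the same time, because the slower side dictates throughput - which is exactly why growing the buffer beyond a few rows buys you nothing.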
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Beautiful answer. Thanks.

gateleys