
Shared hashed file: performance question

Posted: Fri Jan 11, 2008 4:35 am
by Das
Hi,

Could you give me any suggestions on the following issue?

I have a hashed file that is referenced in 7 jobs, and there are no other dependencies between these jobs. All the jobs handle huge amounts of data, and their execution times vary from 3 hours to 30 hours depending on the job.

I have two options for running the jobs: in parallel or sequentially. Will there be any performance difference?
Which is the better option?
Please suggest.

Posted: Fri Jan 11, 2008 4:45 am
by ArndW
If the hashed file is being referenced, i.e. read-only with no updates, then by all means run in parallel. Each job referencing this file will have its own memory copy if the file is designated "pre-load to memory", but there might be some benefit if many jobs are reading the file at the same time, as the disk blocks and buffers will probably already be in memory and will not always have to be fetched from disk.

There is a mechanism in DataStage that will allow you to share a memory copy of a hashed file across processes, but this involves quite a bit of work to set up and manage.
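
To picture the difference, here is a minimal, generic OS-level sketch in plain Python. It is not DataStage's actual shared hashed file mechanism, and lookup.dat is a hypothetical read-only reference file; the point is only that a per-process read gives every job its own copy, while a read-only memory map lets concurrent readers share the single copy held in the operating system's page cache.

Code:

import mmap

LOOKUP_FILE = "lookup.dat"   # hypothetical read-only reference file

def private_copy():
    # Like "pre-load to memory": every process calling this ends up
    # holding its own duplicate of the data in its own address space.
    with open(LOOKUP_FILE, "rb") as f:
        return f.read()

def shared_view():
    # A read-only memory map is backed by the kernel's page cache, so
    # many concurrent reader processes share one physical copy and
    # rarely need to go back to disk for it.
    f = open(LOOKUP_FILE, "rb")
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)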

Posted: Fri Jan 11, 2008 5:15 am
by Das
ArndW wrote:If the hashed file is being referenced, i.e. read-only with no updates, then by all means run in parallel. Each job referencing this file will have its own memory copy if the file is designated "pre- ...

THANKS A LOT
