
Multiple Jobs Reading a single dataset concurrently

Posted: Wed Jan 26, 2011 11:23 pm
by Nagin
Hi,
I am working on a job design that looks like this:
Job 1 - Loads dataset A.
Job 2 - Reads dataset A and does a lookup against input data 1.
Job 3 - Reads dataset A and does a lookup against input data 2.

I am planning to run Job 1 first and then run Job 2 and Job 3 in parallel.

My question is: can Job 1 and Job 2 read the Data Set created in Job 1 at the same time without any problem?

If parallel reads from Data Sets are possible, is there a limit on how many jobs can read from a single Data Set at once?

Thanks,
Nagin.
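
For illustration only, the run order described above (loader first, then the two readers at the same time) could be sketched as follows. This is a minimal sketch, not a DataStage API: `run_job` is a hypothetical callable that launches a job by name and returns a status code, and treating 0 as success is an assumption made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_load_then_readers(
    run_job: Callable[[str], int],  # hypothetical launcher: job name -> status code
    loader: str,
    readers: Iterable[str],
) -> Dict[str, int]:
    """Run the loader job, then run all reader jobs concurrently."""
    results = {loader: run_job(loader)}
    # Assumption for this sketch: a status of 0 means the job succeeded.
    if results[loader] != 0:
        return results  # do not start the readers if the load failed

    reader_names = list(readers)
    with ThreadPoolExecutor(max_workers=max(1, len(reader_names))) as pool:
        # pool.map preserves input order, so names and statuses line up.
        for name, status in zip(reader_names, pool.map(run_job, reader_names)):
            results[name] = status
    return results
```

Here `run_load_then_readers(my_launcher, "Job1", ["Job2", "Job3"])` would start Job2 and Job3 only after Job1 returns successfully, where `my_launcher` stands for whatever starts a job in your environment.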

Re: Multiple Jobs Reading a single dataset concurrently

Posted: Thu Jan 27, 2011 1:05 am
by myukassign
Why don't you give it a try?

It should be possible for many processes to read a Data Set simultaneously.

Posted: Thu Jan 27, 2011 4:50 am
by ray.wurlod
No, but Job2 and Job3 can concurrently read the Data Set created by Job1. There's no limit until you run out of memory - each loads the Data Set into virtual memory when it's on a reference input link to a Lookup stage.

Posted: Thu Jan 27, 2011 4:24 pm
by Nagin
ray.wurlod wrote: No, but Job2 and Job3 can concurrently read the Data Set created by Job1. There's no limit until you run out of memory - each loads the Data Set into virtual memory when it's on a reference input link to a Lookup stage.
What happens if I am using a Merge or Join stage for the lookup? In that case the Data Set won't be loaded into virtual memory, right?

Posted: Thu Jan 27, 2011 8:04 pm
by ray.wurlod
Wrong. Separate images are loaded for each link. So if you have three links referring to the same Data Set, you get three copies of it in memory.

Note that using shared memory for Entire partitioning in an SMP architecture applies to each reference link - you still get one copy (in shared memory this time) per link.
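
As a rough illustration of the "one copy per link" arithmetic above, here is a minimal sketch. The function name and the sizes are made up for the example, and the estimate ignores record overhead and any partitioning effects; it only multiplies the Data Set size by the number of reference links.

```python
def estimated_reference_memory(dataset_bytes: int, reference_links: int) -> int:
    """Rough estimate: one in-memory copy of the Data Set per reference link."""
    if dataset_bytes < 0 or reference_links < 0:
        raise ValueError("size and link count must be non-negative")
    return dataset_bytes * reference_links


if __name__ == "__main__":
    GIB = 1024 ** 3
    size = 2 * GIB   # hypothetical Data Set size
    links = 3        # e.g. three reference links reading the same Data Set
    total = estimated_reference_memory(size, links)
    print(f"about {total / GIB:.1f} GiB for {links} links")
```

So a hypothetical 2 GiB Data Set referenced by three links would need on the order of 6 GiB, which is why memory, rather than the number of readers, is the practical limit.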

Posted: Thu Jan 27, 2011 10:30 pm
by Nagin
ray.wurlod wrote: Wrong. Separate images are loaded for each link. So if you have three links referring to the same Data Set, you get three copies of it in memory.

Note that using shared memory for Entire partitioning in an SMP architecture applies to each reference link - you still get one copy (in shared memory this time) per link.
Sorry Ray, we have version 8 parallel edition. I don't know how to move the post to parallel edition. Does the same answer apply to parallel edition as well?

Posted: Thu Jan 27, 2011 11:38 pm
by ray.wurlod
That answer IS for parallel jobs. Most of the stage types mentioned in my answer only exist in parallel jobs.

Posted: Fri Jan 28, 2011 8:07 am
by chulett
Nagin wrote: Sorry Ray, We have version 8 parallel edition. I don't know how to move the post to parallel edition.
I "moved" it... since it was already in the right forum, all that was needed was an edit of the original post and (fyi) you can always edit your own posts.