
Shared containers in parallel -- multiple instances?

Posted: Wed May 03, 2006 2:31 pm
by RodBarnes
If I create a shared container and then use that in four separate jobs, can those jobs run in parallel without conflicting with one another? Asked another way, does each job get its own instance of the shared container?

I've skimmed through the posts on "shared containers" and haven't found a definitive answer to this question. I know that there is value in being able to reuse the container (i.e., not having to rewrite those stages multiple times) but couldn't determine if multiple instances can be run in parallel.

Thanks.

Posted: Wed May 03, 2006 2:53 pm
by chulett
Can they run in parallel? Yes. Without conflicting with one another? Depends on what you are doing in the container.

It's just reusable code that gets compiled into your job. It's more about whether your jobs can run concurrently without issue, not just what is in the shared container.

Posted: Wed May 03, 2006 3:04 pm
by RodBarnes
chulett wrote: Can they run in parallel? Yes. Without conflicting with one another? Depends on what you are doing in the container.
Fair enough. My question is more along the general lines of "does each instance get its own thread and/or variables? Or are the variables shared?"

To be more specific: I have an update job and an insert job that each read from the same sequential file, use the same lookup hash files, and update or insert into the same Oracle table. The only real difference is at the beginning: the insert job passes only new records to be processed while the update job passes only existing records. The update job has an additional stage to compare the CRC value to see if anything changed in the current record compared to the existing one in the table.

It seems reasonable to create a shared container that does all this and use that same container in both the insert and update jobs. So, the insert job would have an input from the sequential file to a test for new records and then into the container. It would have an output to the table.

The update job would have an input from the sequential file to a test for existing records and then into the container. It would have an output to a test for the CRC and then on to update the table for changed records.

Ok, given that, would the fact that these two jobs run at the same time cause issues? It seems like they should be fine, since the stages within the container would only be doing lookups to transform the record.
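
In case it helps, here is roughly the logic I have in mind, sketched in Python rather than DataStage (the keys, payloads, and use of zlib.crc32 are just placeholders):

Code:
import zlib

# Placeholder for the rows already in the Oracle table, keyed by business key,
# with the CRC stored from the last load. In the real jobs this is what the
# lookup hash files provide.
existing = {
    "1001": {"crc": zlib.crc32(b"Widget|9.99")},
    "1002": {"crc": zlib.crc32(b"Gadget|4.50")},
}

# Placeholder for the rows read from the shared sequential file.
incoming = [
    {"key": "1001", "payload": "Widget|9.99"},   # unchanged -> skip
    {"key": "1002", "payload": "Gadget|5.00"},   # changed   -> update
    {"key": "1003", "payload": "Gizmo|2.25"},    # new       -> insert
]

inserts, updates = [], []
for row in incoming:
    crc = zlib.crc32(row["payload"].encode())
    if row["key"] not in existing:
        # Insert job path: only new records reach the container.
        inserts.append(row)
    elif crc != existing[row["key"]]["crc"]:
        # Update job path: existing record whose CRC shows a change.
        updates.append(row)
    # else: existing and unchanged, so neither job touches it

print("to insert:", [r["key"] for r in inserts])
print("to update:", [r["key"] for r in updates])

The part both paths have in common, essentially the lookups and the transform, is what I would put in the container.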

Long, I know, but trying to be clear. :)

Posted: Wed May 03, 2006 3:14 pm
by chulett
RodBarnes wrote: My question is more along the general lines of "does each instance get its own thread and/or variables? Or are the variables shared?"
Ah... not what you asked. :wink:

You are overthinking it. Think of what the two jobs do without regard to the fact that they have a 'shared container' in them. Can you run them at the same time without issue? The fact that some of the code is encapsulated into a shared container plays no role here. Other than to make your job as a developer easier, that is.

As previously noted, a shared container is just reusable code that gets compiled into your job. Nothing else about it is magical. So the end result, the job code that actually ends up running, is no different than if you had coded those stages directly into your job.

Does that help?

Posted: Wed May 03, 2006 3:32 pm
by RodBarnes
So a shared container is really just like including code from a shared library; i.e., a set of code that gets compiled into the overall module. You can use that code (and run it) in as many modules as you choose because each one gets its own copy at compile time.

I just wasn't clear whether the container ended up compiled into its own separate component somehow (like on the first compile) and each module then used that same instance. It's just my experience as a Windows SW eng dealing with shared DLLs and such that made me wonder if there was something similar going on here.
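
To check my understanding with a quick Python sketch (nothing DataStage-specific, all names made up): the shared container is just one definition reused by both jobs, and each run gets its own state:

Code:
# Hypothetical sketch: the "shared container" is just a function definition
# that both jobs reuse; it is not a separate running component.
def shared_container(rows):
    seen = 0                      # local state, created fresh per run
    for row in rows:
        seen += 1                 # lookups/transforms would go here
    return seen

def insert_job():
    return shared_container(["new-1", "new-2"])

def update_job():
    return shared_container(["existing-1", "existing-2", "existing-3"])

# Each job gets its own independent execution of the same code.
print(insert_job())   # 2
print(update_job())   # 3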

Thanks.

Posted: Wed May 03, 2006 9:03 pm
by ray.wurlod
DataStage server uses multiple processes rather than multiple threads, with very few exceptions (for example, the sort engine is multi-threaded). Since separate jobs run in separate processes, you can correctly deduce that their local variables are independent of those in other jobs.
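
If it helps, here is a small Python illustration of the distinction (the counter and bump function are made up, nothing DataStage-specific): threads within one process share a module-level variable, while separate processes each work on their own copy, which is the situation for separate server jobs.

Code:
import threading
from multiprocessing import Process

counter = 0   # module-level variable standing in for a job's "local" state

def bump(n):
    global counter
    for _ in range(n):
        counter += 1

if __name__ == "__main__":
    # Two threads in one process see the same variable...
    t1 = threading.Thread(target=bump, args=(1000,))
    t2 = threading.Thread(target=bump, args=(1000,))
    t1.start(); t2.start(); t1.join(); t2.join()
    print("after two threads:", counter)     # roughly 2000: shared state

    # ...whereas two separate processes each get their own copy, so the
    # parent's value is untouched, which is how separate jobs behave.
    counter = 0
    p1 = Process(target=bump, args=(1000,))
    p2 = Process(target=bump, args=(1000,))
    p1.start(); p2.start(); p1.join(); p2.join()
    print("after two processes:", counter)   # still 0 in the parent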

Posted: Thu May 04, 2006 12:18 am
by sb_akarmarkar
RodBarnes wrote: To be more specific: I have an update job and an insert job that each read from the same sequential file, use the same lookup hash files, and update or insert into the same Oracle table. The only real difference is at the beginning: the insert job passes only new records to be processed while the update job passes only existing records. The update job has an additional stage to compare the CRC value to see if anything changed in the current record compared to the existing one in the table.
I think you are trying to do the insert and update to the Oracle table in the shared container... That cannot be done in parallel jobs... :? If you try to run them in a sequence, one may abort or it may hang.... :)


Thanks,
Anupam

Posted: Thu May 04, 2006 12:37 am
by ray.wurlod
... but here we're talking about server jobs, so your point is moot.