ExeCmds always execute SERIALLY within a single Sequence Job

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
eli.nawas_AUS
Premium Member
Posts: 39
Joined: Tue Apr 15, 2014 9:14 am

ExeCmds always execute SERIALLY within a single Sequence Job

Post by eli.nawas_AUS »

Hi
I have a single Sequence job containing five independent Execute Command stages, each calling a shell script that runs a Hive (Hadoop) command. At the start of the shell script I immediately log the system time, and I log it again at the end. That way I know when each shell script starts and stops, which also tells me when each Execute Command starts and stops.
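Such a wrapper can be sketched as follows (a minimal sketch; the log path and the commented-out hive call are illustrative assumptions, not the actual script):

```shell
#!/bin/sh
# Minimal timing wrapper: log the system time immediately on entry and
# again on exit, so the log shows when the script ran.
LOG=/tmp/hive_wrapper.$$.log

echo "START $(date '+%Y-%m-%d %H:%M:%S')" >> "$LOG"

# The real work would go here, e.g.:
# hive -e "LOAD DATA INPATH '/staging/t1' INTO TABLE t1;"
sleep 1

echo "END   $(date '+%Y-%m-%d %H:%M:%S')" >> "$LOG"
```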

The Execute Command stages are independent, not serially placed (no links connecting them). After the job runs I look at the log, and I noticed that these Execute Commands always execute SERIALLY, not in parallel as I expected.

I found a workaround: if I create 5 Sequence Jobs and place one Execute Command in each, the Sequence Jobs execute in PARALLEL, and hence so do the Execute Commands.

Is this normal behavior? Even though the above workaround works (embedding each Execute Command in its own Sequence Job), it is hardly satisfactory. I have 50+ Execute Commands to run. Does that mean I need 50+ Sequence Jobs? Some jobs have 200+ tables, each requiring a Hive connection, ...

I really just want 1 (or just a few) sequence jobs to do all these Execute Commands.

Is this possible? Are there any options somewhere that I miss?
(DataStage Designer V 9.1.2.0)

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: ExeCmd always execute SERIALLY within a single Sequence

Post by chulett »

eli.nawas_AUS wrote:Is this normal behavior?
Yes.

Perhaps a scripted approach is in order for these 50+ commands?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Activities in a job sequence are executed sequentially.

You will see that if you investigate the generated BASIC code.

A job activity starts a job without waiting for it to finish. That is why you obtain the parallel execution of your commands by placing them inside a job sequence and executing them via a job activity. They run concurrently because your command execution time is longer than the small amount of time that is required to start each job sequence that contains a command.

In contrast, an execute command activity waits for the command to finish before it returns control to its controlling job sequence. That's why they do not start at the same time even though you have no trigger links between them.

Run your command in the background and the Execute Command activity will return control to the sequence immediately.

Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

To run a command in the background, add an ampersand after the command. To submit the command asynchronously (that is, to make sure that it doesn't die when it loses contact with its parent), use the nohup command. For example:

Code: Select all

nohup echo "Dummy Heading" > #jpFilePath# &
Last edited by ray.wurlod on Mon Mar 28, 2016 11:12 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eli.nawas_AUS
Premium Member
Posts: 39
Joined: Tue Apr 15, 2014 9:14 am

Post by eli.nawas_AUS »

I didn't think about the background option for the unix command.


Thanks for all your help
eli.nawas_AUS
Premium Member
Posts: 39
Joined: Tue Apr 15, 2014 9:14 am

Post by eli.nawas_AUS »

Hi

I added the "nohup ... &" and it returns immediately.

I have a new issue though: I need it to wait for all these processes to complete before the next step.

Solution 1: I added sequencer2 after these ExeCmds, like this:

Code: Select all

                               +--> ExeCmd1 --+
                               |              |
uservariable --> sequencer1 ---+--> ExeCmd2 --+--> sequencer2 (all)
                               |              |
                               +--> ExeCmd3 --+

But this does not work: each ExeCmd returns TRUE as soon as its command is backgrounded, so sequencer2 continues on. It has no way of knowing it should wait for those background processes.


Solution 2: I added another ExeCmd (call it ExeCmdWait) with the wait command inside, like this:

Code: Select all

                               +--> ExeCmd1 --+
                               |              |
uservariable --> sequencer1 ---+--> ExeCmd2 --+--> sequencer2 --> ExeCmdWait
                               |              |
                               +--> ExeCmd3 --+

ExeCmdWait just has a single "wait" command inside.
I thought the wait command should work, because without any PID arguments "wait" waits for all child processes.

But this does not work either, because the wait runs in a brand-new shell that has no child processes of its own. So it finishes immediately instead of waiting for the background processes started by ExeCmd1..3.
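For reference, one common shell-level workaround for exactly this situation (a sketch; the pid-file directory and the sleep stand-ins are assumptions): have each launching ExeCmd record its background PID in a file, and have the final ExeCmd poll those PIDs with `kill -0`, which tests for process existence without sending a signal:

```shell
#!/bin/sh
# Part 1 -- in each launching Execute Command: background the work and
# record its PID ($! holds the PID of the last background command).
PIDDIR=/tmp/hive_pids.$$
mkdir -p "$PIDDIR"
nohup sleep 2 >/dev/null 2>&1 &       # stand-in for the real hive script
echo $! > "$PIDDIR/job1.pid"

# Part 2 -- in the final Execute Command: poll until every recorded
# process has exited, then clean up its pid file.
for f in "$PIDDIR"/*.pid; do
    pid=$(cat "$f")
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
    rm -f "$f"
done
```

Because `kill -0` only checks existence, this works even though the waiting shell is not the parent of the background processes.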


What can I use to force sequencer2 or the final ExeCmd to wait for all background processes to complete?


Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Investigate the jobs command to determine what background process(es) you have running. Keep executing this till your list is exhausted.
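Note that jobs only reports background jobs started by the current shell, so this works when the commands are launched and polled from the same script. A sketch (the sleeps stand in for the real commands; `jobs -r`, which lists only running jobs, is a bash/ksh option):

```shell
#!/bin/bash
# Launch the commands in the background from ONE script, then poll the
# shell's own job table until no running jobs remain.
sleep 1 &       # stand-ins for the real hive scripts
sleep 2 &

while [ -n "$(jobs -r)" ]; do
    sleep 1
done
```

In this single-script case a plain `wait` with no arguments would also block until all children finish; the polling loop is useful when you want to do other work (logging, timeouts) between checks.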
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eli.nawas_AUS
Premium Member
Posts: 39
Joined: Tue Apr 15, 2014 9:14 am

Post by eli.nawas_AUS »

Hi

This is my understanding. Please correct as needed:

- This jobs command must be run within an ExeCmd (because it is a Unix command).

- Since all background processes are child processes, we must find the PID of the parent process (the first sequence job).

- Then, how do we get the PID of the parent job? Does DS provide such a function?


Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can get the PID in a number of ways, for example enabling APT_PM_SHOW_PIDS. But this gives the PIDs of the player processes; their parents are the section leader processes, and their parent is either the conductor process or its rsh agent. And only the parent process of that will be the PID of the controlling sequence.

It's probably easier to use the UNIX command ps -ef with an appropriate grep filter piped into a cut command to retrieve the PPID.
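A sketch of that approach (the backgrounded sleep stands in for the process being looked up; awk is used here in place of cut, since ps -ef columns are space-separated rather than fixed-width):

```shell
#!/bin/sh
# Find a process's parent PID from ps -ef output, whose columns are
# UID PID PPID C STIME TTY TIME CMD -- so PPID is column 3.
sleep 30 &                    # stand-in for the process we want to inspect
child=$!
ppid=$(ps -ef | awk -v p="$child" '$2 == p { print $3 }')
echo "PID $child was started by PPID $ppid (this shell is $$)"
kill "$child"
```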

Sorry to be so generic, but I don't really have the time to devote to solving your particular problem right now.

I would probably prefer to use a DataStage routine here, in which information about the controlling sequence is readily obtained using DataStage API functions.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Another thought. When you issue a nohup command, the PID of that process is reported to stdout, and therefore could be captured via the $CommandOutput activity variable of the Execute Command activity.
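A sketch of that capture (the backgrounded sleep stands in for the real work; the output path is an assumption): after backgrounding with &, the shell variable $! holds the PID of the background process, and echoing it makes the PID the command's stdout, which the Execute Command activity would expose as $CommandOutput:

```shell
#!/bin/sh
# Background the long-running work, then write its PID to stdout so the
# calling Execute Command activity can capture it.
nohup sleep 30 > /tmp/nohup_demo.$$.out 2>&1 &
pid=$!
echo "$pid"
```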
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply