idea on implementing process flow (higher level question)

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

idea on implementing process flow (higher level question)

Post by mystuff »

This might look like a lengthy and tough question, but it's quite simple, trust me :) . I have an idea for implementing the design and would like your thoughts/opinions on it, plus a few questions running through my mind.

OK, here we go.

SCENARIO:

I have 30 extract jobs (in a sequence, seq_ext, all 30 running in parallel) pulling data from the source database (1 cluster). Each extract job might extract at most 0.5 million records.

I need to use the same extract jobs on 9 other clusters (i.e. 10 altogether).

I planned to have an enterprise scheduler kick off 10 instances of the sequence seq_ext (each calling the 30 extract jobs on its cluster) simultaneously.

Each instance of seq_ext will create a flag file upon completion (so altogether the 10 instances will create 10 flag files).

The enterprise scheduler will keep checking for all 10 flag files; when they are all present, it will kick off the main sequence for processing.

For processing I will have to combine the data from all the clusters. Let's say we have Extract_JobA.

Then I have to use the cat command to combine all the files obtained from

Extract_JobA.Instance1, ..., Extract_JobA.Instance10
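
Something like this is what I have in mind for the combine step (the file names below are just placeholders for illustration):

Code: Select all

#!/bin/sh
# Combine the 10 per-cluster extract files into one input for processing.
# File names are illustrative placeholders.
OUT=Extract_JobA.combined
> $OUT                          # start from an empty output file
for i in 1 2 3 4 5 6 7 8 9 10
do
    cat Extract_JobA.Instance$i >> $OUT
done

The loop keeps the instances in numeric order, though for a plain concatenation the order probably does not matter.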

The points below are keeping me from making a decision. Please answer the first question in as much detail as possible. I would really appreciate the help.


QUESTIONS


Q1) i) Is it a good idea to call all 10 instances at the same time? (These are only extract jobs: no hash files, no sorts, no aggregators, just

Code: Select all

Source -> Transformer -> Sequential File

)
ii) Or shall I stick to 2 or 3 instances at a time?
iii) How will memory/disk space play a role in deciding this?
iv) Is there any way I could calculate a rough estimate of how many instances can be kicked off at the same time?

Q2) Is it a good idea to have the enterprise scheduler look for those 10 flag files, or should a wait stage be used in the main sequence for processing?
Q3) Will the cat command work well for concatenating 10 files of 0.5 million records each?

Thanks,
Awaiting replies.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

So is your scheduler controlling the process by reading return codes, or is your job sequence doing it?
First of all, the creation of flag files at the end of each extract is redundant, IMHO, since the scheduler can be set up to kick off the 11th job (your main sequence) only after the first 10 finish successfully.
In your main sequence, you can add an Execute Command stage as the first stage, which concatenates all these files. cat can handle 10 x 0.5M records, no problem.
Test out the optimal performance by kicking off the jobs simultaneously and see how much the server can handle. Use top, glance, etc. to monitor the server load.
My 2 cents.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

So is your scheduler controlling the process by reading return codes, or is your job sequence doing it?
The enterprise scheduler kicks off the scripts (containing dsjob) to initiate all the extracts, and so it gets the return value from the script. Actually, my follow-up question is based on this.
First of all, the creation of flag files at the end of each extract is redundant, IMHO, since the scheduler can be set up to kick off the 11th job (your main sequence) only after the first 10 finish successfully.
If the scheduler kicks off the extract jobs on 10 clusters and one of them fails, what will be the next step after that?

a) Should a mail be sent to the DataStage developers by the enterprise scheduler team?
b) After fixing whatever the problem was, should the DataStage developers manually run those jobs?
c) Assuming (a) and (b), I am planning to use those 10 "redundant" files as flags, so that the enterprise scheduler need not be concerned with the intermediate stages of fixing the problem before it kicks off the 11th job (rough sketch below).

Is this the right approach?
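
Roughly what I mean by using the files as flags (the /flags path and the file names are made up for illustration):

Code: Select all

#!/bin/sh
# Scheduler-side readiness check, run before kicking off the 11th job.
# Each seq_ext instance touches its own flag file on success, e.g.
#   touch /flags/seq_ext.instance3.done
# The /flags path and file names are placeholders.
count=`ls /flags/seq_ext.instance*.done 2> /dev/null | wc -l`
if [ $count -eq 10 ]
then
    exit 0    # all 10 instances finished - start the main sequence
else
    exit 1    # not ready yet - the scheduler checks again later
fi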

By the way, what is IMHO?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If one of the 10 jobs fails, the enterprise scheduler won't go to the 11th step, as the 11th job will be dependent upon the successful completion of all 10. Say 9 complete successfully and the 10th one fails. If you take two days to fix it and restart from the point of failure, the enterprise scheduler, whatever it may be, will kick off only the 10th job and then go to the 11th job. That's why I said the creation of trigger files is redundant.
If a job fails, you can be buzzed either by a mail or even a call from the operations folks who monitor these scheduled jobs. It all depends upon what 'action to perform' you specify to them upon failure.
No need to kick off the job manually after a failure; give your scheduling/operations folks a call or email them and tell them to restart from the point of failure.
IMHO = In my humble opinion.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

Thanks a lot - that cleared up a lot of questions.

I guess these will be the last questions running through my mind (hopefully :lol: ).

a) Will the people responsible for the enterprise scheduler be able to see the echo statements? (It might take a long time to ask the people concerned, hence posting it here.)

b) In Unix, I am planning to use

Code: Select all

exit 0 on success
exit 1 on failure

since there are a bunch of return values from what I am doing, i.e. checking the job status, resetting the job, then running it.

Is this alright? (I have sketched the wrapper below.)
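
Roughly, this is the shape of the wrapper I have in mind (the project/job names are placeholders, and my reading of the return value is an assumption on my part - hence question c):

Code: Select all

#!/bin/sh
# Wrapper kicked off by the enterprise scheduler.
# Project and job names are placeholders.
PROJ=MyProject
JOB=Extract_JobA

# Reset the job first in case the previous run left it aborted.
dsjob -run -mode RESET -wait $PROJ $JOB > /dev/null 2>&1

# Run the job; with -jobstatus, $? holds the job's finishing status.
dsjob -run -jobstatus $PROJ $JOB
RC=$?

# Collapse the possible return values into the simple 0/1 contract.
if [ $RC -eq 1 ]    # assuming 1 = job finished OK
then
    exit 0
else
    exit 1
fi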

c) What does return value 2 of dsjob -run stand for?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

a) They should be able to capture the sysout. I know Control-M does.
b) Correct; double-check with your scheduling group about the return codes.
c) dsjob -run, if successful, will return an exit code of 0; 2 means something went wrong in the command itself. Now, if you want the job status, then you need to stick in -jobstatus.

Code: Select all

dsjob -run -jobstatus <<ProjectName>> <<JobName>>
This will return the job status. 2 in this case means the job finished with warnings.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

Test out the optimal performance by kicking off the jobs simultaneously and see how much the server can handle. Use top, glance, etc. to monitor the server load.
Do we need to monitor the network traffic as well (for the data coming in from the clusters)?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you want to re-map the exit status from the dsjob exit status, that's fine - you can do that. If dsjob exit status is 1, exit 0; if dsjob exit status is 2, probably exit 0 also. Otherwise exit 1.
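
In script form the mapping is something like this sketch ($PROJ and $JOB stand for your project and job names):

Code: Select all

dsjob -run -jobstatus $PROJ $JOB
case $? in
    1) exit 0 ;;    # job finished OK
    2) exit 0 ;;    # finished with warnings - probably acceptable
    *) exit 1 ;;    # anything else is reported as a failure
esac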
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.