Two utilities (yes, free ones) that you might like

peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland

Post by peternolan9 »

Hi All,
(By way of contributing to the group...)

In the world of unix/win based systems and ETL (and to some extent DataStage) I've been frustrated by the lack of control schedulers give me. (Just call me an ex-JCL person where you have 110% control over what is happening and restarts.)

As ever, if you want something done right you must do it yourself.. ;-)

So, I have written two new utilities that might be of interest to DS users. (I have published quite a few other utilities and even some ETL software, but they are 'free/low cost alternatives to ETL tools', so they have not been applicable to posting here. These two new utilities were actually requested by some of my customers.)

1. A utility to submit a DS job from a win2000 machine to any DS server running anywhere that the win2000 machine can see. (which is not so special until you read about utility 2 which is....)

2. A utility that manages schedules of batches for any commands that run on a win2000 (free) or unix (not free because I must give you the source code). Of course, one of the commands that it can run is the command to run a DS job on any DS server....;-)

The scheduler allows you to define processes, then process groups, then batches. The multiple process groups in a batch can run in parallel, and the processes within a process group run sequentially. The scheduler has all the normal things you need in a scheduler, as well as managing dependencies intelligently.
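
To make that structure concrete, here is a minimal Python sketch of the batch / process group / process idea. It is an illustration only, not the utility's actual code: the names Process, ProcessGroup, run_group and run_batch are made up for this example, and each process is modelled as a function returning an exit code where the real thing would launch a command.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Process:
    name: str
    run: Callable[[], int]  # hypothetical: returns an exit code, 0 means success


@dataclass
class ProcessGroup:
    name: str
    processes: List[Process] = field(default_factory=list)


def run_group(group):
    """Run the group's processes one after another; stop at the first failure."""
    completed = []
    for proc in group.processes:
        if proc.run() != 0:
            # Report which process failed; later processes in the group are not run.
            return group.name, completed, proc.name
        completed.append(proc.name)
    return group.name, completed, None


def run_batch(groups):
    """Run all process groups of a batch in parallel, one worker thread per group."""
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        # pool.map returns results in the same order as the input groups.
        return list(pool.map(run_group, groups))
```

Each result is a tuple of (group name, processes completed, failed process or None), which is roughly the information a restart would need.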

Why is this of interest to a person with DS?

Say you have a 4 processor windows machine. (Just to keep it all free.)

Let's say you have 80 staging area tables, 40 dimensions and 20 fact tables, or numbers near those. It could be lots more.

What you can now do is submit a batch that will initially start 4 process groups to perform data transfer from source to staging. All 4 run at once and are then brought to completion. You might run 20 jobs per process group or something like that to balance the load as effectively as possible.

You might then start another batch (or set of process groups) that process the dimension tables, again, parallelising as much as possible for maximum throughput. You would bring them all to completion as well.

Then you might start another batch (or set of process groups) to process all the fact tables.

By creating the dependencies between the process groups based on known computing requirements of the processes in the groups you can balance out the load so that there is less time spent waiting for those straggling processes to complete.
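
One simple way to do that balancing, sketched in Python: sort the jobs by estimated run time and always hand the next-longest job to the process group with the least work so far. This is the standard 'longest processing time first' heuristic, swapped in here for illustration; it is not necessarily what the scheduler does, and the function name and the run-time estimates are assumptions.

```python
import heapq


def balance_jobs(estimated_minutes, group_count):
    """Assign jobs to groups: longest job first, always to the least-loaded group."""
    # Heap of (current load, group index); the smallest load is popped first.
    heap = [(0, index) for index in range(group_count)]
    heapq.heapify(heap)
    groups = [[] for _ in range(group_count)]
    # Sort by descending run time, then by name so the result is deterministic.
    ordered = sorted(estimated_minutes.items(), key=lambda item: (-item[1], item[0]))
    for job, minutes in ordered:
        load, index = heapq.heappop(heap)
        groups[index].append(job)
        heapq.heappush(heap, (load + minutes, index))
    totals = [sum(estimated_minutes[job] for job in group) for group in groups]
    return groups, totals
```

With 80 staging jobs and group_count=4 this gives four lists of jobs whose total estimated run times are close to each other, so the groups should finish at about the same time.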

So what? You can do all this in DS control language if you feel like writing it all.....or mostly anyway because it's harder to establish cross job dependencies.....

So, now for the 'BIG DEAL'.

When a process crashes, as they do, all you do is fix whatever the problem was and go into the table that defines the batches and type 'restart'.

Yes, just type 'restart'.

The scheduler will see the restart command and figure out where the whole set of batches was up to and restart the failed job at the beginning. It will then progress through all the batches/process groups according to dependencies.

No more editing DS control jobs to restart the batch processing after a failure. (Which has been something of a source of errors on projects I have worked on. A machine might crash but it takes a human to really stuff it up!!! ;-) )

(And the scheduler will have only halted those processes, process groups and batches that were dependent on the failed process. Anything not dependent on the failed process would have continued to run.)
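
The restart behaviour described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the scheduler's code: the status values 'done', 'failed' and 'held', the dictionary layout and the function names are all hypothetical.

```python
def blocked_by(failed, depends_on):
    """Return every unit that depends, directly or indirectly, on a failed unit."""
    blocked = set()
    changed = True
    while changed:
        changed = False
        for unit, deps in depends_on.items():
            if unit in blocked or unit in failed:
                continue
            if any(dep in failed or dep in blocked for dep in deps):
                blocked.add(unit)
                changed = True
    return blocked


def restart_plan(status, depends_on):
    """List what still has to run after a 'restart', in dependency order.

    Units marked 'done' are skipped; a failed unit is rerun from its beginning.
    """
    remaining = [unit for unit, state in status.items() if state != "done"]
    done = {unit for unit, state in status.items() if state == "done"}
    plan = []
    while remaining:
        ready = [u for u in remaining if all(d in done for d in depends_on.get(u, []))]
        if not ready:
            raise ValueError("circular or unsatisfiable dependency")
        for unit in ready:
            plan.append(unit)
            done.add(unit)
            remaining.remove(unit)
    return plan
```

For example, if stage_a fails while stage_b and dim_b have finished, blocked_by reports that only dim_a and fact were held, and restart_plan returns stage_a, dim_a, fact in that order.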

So, for DS users who would like a better scheduler than the (current) DS scheduler you might like to check out the following link:

www.peternolan.com/pr26.htm

And because I know people here will want to read the code to see if it actually works, all the source code for win2000 is published and you can get to it via this link. In fact, the link is for the source code of all my 'free' utilities.

www.peternolan.com/pr27.htm

Any requests/ideas for improvements are most welcome; the 'freeware' gets improved on a 'time available' basis.

Hope you like these two utilities.
Best Regards
Peter Nolan
www.peternolan.com
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Very interesting, Peter! I'll have to check it out when I get a chance and see how #2 compares to Uncle Ken's Magnum Opus, which we are currently using with great success. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Not cool to jump from Full Wurlod to Uncle Ken. Please don't give me a nickname.
Mamu Kim
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland

Post by peternolan9 »

chulett wrote:Very interesting, Peter! I'll have to check it out when I get a chance and see how #2 compares to Uncle Ken's Magnum Opus, which we are currently using with great success. :wink:
I'd be interested to hear of anyone else who has done something better with scheduling and restarting DS jobs...

I know a company in OZ that has written a suite of code to help in this area (they can name themselves if they like, they watch this list). But they use their DS IP to differentiate themselves from other groups so their code is not publicly available.

In the end, I wrote the scheduler for my own software, and I wrote the DS call interface as part of a customer project and then created a utility out of it because it was so easy.....

I'd be interested to see any other C++ code anyone has used to call DS. In my version there seem to be some strange cases where a failed job still returns a 0 return code, and I'm still looking for how that happens...
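
One thing worth ruling out, offered as a guess and not a diagnosis: the return code of the call that starts the job and the final status of the job are two different things, and a 0 from the first says nothing about the second. The Python sketch below keeps them apart. The numeric status values are assumptions based on the DSJS_* constants in dsapi.h and should be checked against your own header; job_succeeded is a made-up helper name.

```python
# Assumed values of the DSJS_* job status constants; verify against dsapi.h.
DSJS_RUNNING = 0     # job is still running
DSJS_RUNOK = 1       # finished without warnings
DSJS_RUNWARN = 2     # finished with warnings
DSJS_RUNFAILED = 3   # aborted


def job_succeeded(call_return_code, job_status, allow_warnings=True):
    """Decide success from BOTH the API call's return code and the job's status."""
    if call_return_code != 0:
        # The request itself failed; the job status cannot be trusted.
        return False
    if job_status == DSJS_RUNOK:
        return True
    if job_status == DSJS_RUNWARN:
        return allow_warnings
    # Still running, aborted, or any other status is not treated as success.
    return False
```

Under these assumptions a job status of 0 would mean 'still running' and not 'OK', so passing the job status straight through as a process exit code could make an unfinished or failed run look like a success.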

Anyway, hope it's useful....

I talked with Tom Nel about these utilities (my DS guru on my last client) and after I demoed it all to him he told me it was way ahead of the GUI interface for scheduling jobs and creating dependencies...he was going to look at whether it should be used on that client.....I have to test the scheduler on unix first for that project though.
Best Regards
Peter Nolan
www.peternolan.com