Generating a DS sequence job

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
fedpie
Participant
Posts: 17
Joined: Wed May 28, 2008 1:59 am
Location: sweden

Generating a DS sequence job

Post by fedpie »

Hi all,
I am converting an existing ETL flow, currently expressed as about eight thousand SQL queries and procedures, to Datastage. I found that I can generate the corresponding Datastage jobs relatively easily, dumping a few template jobs to XML and using an XSLT transformation or even a bit of sed and awk actually.

It seems to me though, that doing the same thing in order to generate one or more Datastage sequences is much harder, since those are mostly expressed in JobControlCode rather than straight XML.

My question is, has anybody here attempted to generate Datastage sequence jobs from metadata, or does anybody see any efficient alternative when metadata describing the job dependencies is available, and the number of jobs is so large that manually creating sequences is not acceptable?

Best regards,

F
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

this might be a bit out of the scope of normal DataStage, but the backbone of the Server engine is the DS/BASIC language on top of what used to be the UniVerse DB. This language should allow you to generate the programs in question. It is more of a matter of defining what needs to be done and what is meant by "metadata". Are all the jobs "almost" the same or would job logic also come from the metadata?
fedpie
Participant
Posts: 17
Joined: Wed May 28, 2008 1:59 am
Location: sweden

Post by fedpie »

Hi,

In practice I can build a table of SQL queries or procedure calls, and their execution order, which is the metadata I was thinking of for the purpose of this discussion.

It seems I can use two jobs as templates, one for the queries and one for the procedures. I have done some preliminary tests and apparently there is no problem there, i.e. I can generate the jobs by substituting a dummy SQL query or procedure call etc. in the templates, import the generated jobs into a DS-project and execute them.

Regarding sequences, all I need to do is to create one or more sequences that execute my jobs in a given order until one job fails or the last job has executed with no error.

So, you say there is one more language I could try to use, besides the XML (alt DSX) format, and the Job Control Code used in datastage sequences?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The Job control code is autogenerated but written in DS/Basic. Are the jobs similar enough that you could just write one generic job and pass everything in as parameters, including your SQL?
fedpie
Participant
Posts: 17
Joined: Wed May 28, 2008 1:59 am
Location: sweden

Post by fedpie »

Yes, the jobs are similar enough. The generation of the jobs is a solved problem, though. It's the generation of the sequences that worries me now :-)

ArndW wrote:The Job control code is autogenerated but written in DS/Basic. Are the jobs similar enough that you could just write one generic job and pass everything in as parameters, including your SQL? ...
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you create a minimalistic template sequence, how many changes would each one need? If they are simple, you could write a small program to make those changes on an exported copy (.dsx or .xml as you prefer) that would include renaming the sequence and then importing that changed version.
fedpie
Participant
Posts: 17
Joined: Wed May 28, 2008 1:59 am
Location: sweden

Post by fedpie »

To clarify, I need to create sequences running hundreds or possibly thousands of jobs serially.

I took a look at the exported XML of a simple sequence, and I can indeed see that the structure is simple, that is I see a node with type = "JSJobActivity" per job, plus nodes with type = "JSJobActivityInput" and type = "JSJobActivityOutput". It is clear enough how those function, and it shouldn't be a problem to generate similar nodes for each of my jobs.

The issue is the node with Identifier = "ROOT", which contains a property with name = "JobControlCode", storing somewhat convoluted code, probably the DS/BASIC you refer to. I could find a way to generate such code also, but hopefully somebody else has already solved a similar problem :-)
ArndW wrote:If you create a minimalistic template sequence, how many changes would each one need? If they are simple, you could write a small program to make those changes on an exported copy (.dsx or .xml as you prefer) that would include renaming the sequence and then importing that changed version.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You should think about just writing a DataStage BASIC routine to do this for you. If you have your metadata stored in a file the rough logic would be:

Code: Select all

   {Use OPENSEQ to open the sequential file}
   LOOP UNTIL EndOfFile
      READSEQ MetaData FROM SequentialFilePtr ELSE EndOfFile
      {process metadata to fit into parameters}
      {"JobName" is the name of the job, but add a unique multi-instance name so that it can run in parallel}
      JobHandle = DSAttachJob(JobName,DSJ.NORMAL)
      {Set parameters using DSSetParameter() }
      {Run job using DSRunJob()}
   REPEAT
fedpie
Participant
Posts: 17
Joined: Wed May 28, 2008 1:59 am
Location: sweden

Post by fedpie »

Hi again,
Are you suggesting that I skip the datastage sequences altogether, and instead generate a custom routine to be executed, monitored and maintained by some other means than Director and Designer? It would be preferable if the generated code could be used with the "usual" tools when I am done with this :-)



[quote="ArndW"]You should think about just writing a DataStage BASIC routine to do this for you. If you have your metadata stored in a file the rough logic would be:
[snipped]
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You could write a server job and put the code that does the actual work into the job control code and keep the job empty; this is similar to how job sequence work. It is hard to give real suggestions since I don't know what you need to do, it might be possible to do this with the norma job sequence stages.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Food for thought...

I very rarely use sequences except for 'looping' situations, especially for anything approach hundreds or (shudder) thousands of jobs. Many folks out in the wild have developed some kind of 'job control' code, for us we took the complete solution Ken Bland freely offers on his website (KBA Job Control Utilities) as a base and added to it over the years but it was perfectly serviceable out of the box. Then all you need to control your jobs are a csv file listing them and any dependencies between them, which it will enforce. You can also do things like control the maximum number of jobs it will allow to run simultaneously, amongst other things, so you're not stuck with a serial string a mile long unless that is what you actually need.

All in all, a much better solution IMHO and nothing to maintain there other than job hierarchies and parameter values. Me, I couldn't imagine moving to a new project that didn't have something like that in place. YMMV of course. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
fedpie
Participant
Posts: 17
Joined: Wed May 28, 2008 1:59 am
Location: sweden

Post by fedpie »

Alright guys,

I imagined this project would strain the tool a bit ;^)

Thanks a lot for your input, I feel I can make more informed design decisions now. I'll definitely take a look at Mr. Bland's solution.

Regards,

F
DS_SUPPORT
Premium Member
Premium Member
Posts: 232
Joined: Fri Aug 04, 2006 1:20 am
Location: Bangalore

Post by DS_SUPPORT »

I have generated the Sequence jobs, by writing a DS routine. My input will be list of jobs that needs to be there in the Sequence.

First, i will loop through all of them, and get the required job parameters. and rest of the stuff in the Sequence job will be mostly hardcoded, except the Stage Positions and Link names. If you analyze two or more sequence jobs you will be able to identify the common part between them.

I am not in a position to share the code, but as you already generated the server jobs, it should be easy for you to generate the sequence jobs also.

My routine will generate the dsx file.
Post Reply