Page 1 of 1

Generating a DS sequence job

Posted: Wed Nov 11, 2009 3:57 am
by fedpie
Hi all,
I am converting an existing ETL flow, currently expressed as about eight thousand SQL queries and procedures, to Datastage. I found that I can generate the corresponding Datastage jobs relatively easily, dumping a few template jobs to XML and using an XSLT transformation or even a bit of sed and awk actually.

It seems to me though, that doing the same thing in order to generate one or more Datastage sequences is much harder, since those are mostly expressed in JobControlCode rather than straight XML.

My question is, has anybody here attempted to generate Datastage sequence jobs from metadata, or does anybody see any efficient alternative when metadata describing the job dependencies is available, and the number of jobs is so large that manually creating sequences is not acceptable?

Best regards,

F

Posted: Wed Nov 11, 2009 5:16 am
by ArndW
this might be a bit out of the scope of normal DataStage, but the backbone of the Server engine is the DS/BASIC language on top of what used to be the UniVerse DB. This language should allow you to generate the programs in question. It is more of a matter of defining what needs to be done and what is meant by "metadata". Are all the jobs "almost" the same or would job logic also come from the metadata?

Posted: Wed Nov 11, 2009 5:47 am
by fedpie
Hi,

In practice I can build a table of SQL queries or procedure calls, and their execution order, which is the metadata I was thinking of for the purpose of this discussion.

It seems I can use two jobs as templates, one for the queries and one for the procedures. I have done some preliminary tests and apparently there is no problem there, i.e. I can generate the jobs by substituting a dummy SQL query or procedure call etc. in the templates, import the generated jobs into a DS-project and execute them.

Regarding sequences, all I need to do is to create one or more sequences that execute my jobs in a given order until one job fails or the last job has executed with no error.

So, you say there is one more language I could try to use, besides the XML (alt DSX) format, and the Job Control Code used in datastage sequences?

Posted: Wed Nov 11, 2009 5:50 am
by ArndW
The Job control code is autogenerated but written in DS/Basic. Are the jobs similar enough that you could just write one generic job and pass everything in as parameters, including your SQL?

Posted: Wed Nov 11, 2009 5:56 am
by fedpie
Yes, the jobs are similar enough. The generation of the jobs is a solved problem, though. It's the generation of the sequences that worries me now :-)

ArndW wrote:The Job control code is autogenerated but written in DS/Basic. Are the jobs similar enough that you could just write one generic job and pass everything in as parameters, including your SQL? ...

Posted: Wed Nov 11, 2009 6:03 am
by ArndW
If you create a minimalistic template sequence, how many changes would each one need? If they are simple, you could write a small program to make those changes on an exported copy (.dsx or .xml as you prefer) that would include renaming the sequence and then importing that changed version.

Posted: Wed Nov 11, 2009 6:25 am
by fedpie
To clarify, I need to create sequences running hundreds or possibly thousands of jobs serially.

I took a look at the exported XML of a simple sequence, and I can indeed see that the structure is simple, that is I see a node with type = "JSJobActivity" per job, plus nodes with type = "JSJobActivityInput" and type = "JSJobActivityOutput". It is clear enough how those function, and it shouldn't be a problem to generate similar nodes for each of my jobs.

The issue is the node with Identifier = "ROOT", which contains a property with name = "JobControlCode", storing somewhat convoluted code, probably the DS/BASIC you refer to. I could find a way to generate such code also, but hopefully somebody else has already solved a similar problem :-)
ArndW wrote:If you create a minimalistic template sequence, how many changes would each one need? If they are simple, you could write a small program to make those changes on an exported copy (.dsx or .xml as you prefer) that would include renaming the sequence and then importing that changed version.

Posted: Wed Nov 11, 2009 7:12 am
by ArndW
You should think about just writing a DataStage BASIC routine to do this for you. If you have your metadata stored in a file the rough logic would be:

Code: Select all

   {Use OPENSEQ to open the sequential file}
   LOOP UNTIL EndOfFile
      READSEQ MetaData FROM SequentialFilePtr ELSE EndOfFile
      {process metadata to fit into parameters}
      {"JobName" is the name of the job, but add a unique multi-instance name so that it can run in parallel}
      JobHandle = DSAttachJob(JobName,DSJ.NORMAL)
      {Set parameters using DSSetParameter() }
      {Run job using DSRunJob()}
   REPEAT

Posted: Wed Nov 11, 2009 7:35 am
by fedpie
Hi again,
Are you suggesting that I skip the datastage sequences altogether, and instead generate a custom routine to be executed, monitored and maintained by some other means than Director and Designer? It would be preferable if the generated code could be used with the "usual" tools when I am done with this :-)



[quote="ArndW"]You should think about just writing a DataStage BASIC routine to do this for you. If you have your metadata stored in a file the rough logic would be:
[snipped]

Posted: Wed Nov 11, 2009 7:39 am
by ArndW
You could write a server job and put the code that does the actual work into the job control code and keep the job empty; this is similar to how job sequence work. It is hard to give real suggestions since I don't know what you need to do, it might be possible to do this with the norma job sequence stages.

Posted: Wed Nov 11, 2009 7:47 am
by chulett
Food for thought...

I very rarely use sequences except for 'looping' situations, especially for anything approach hundreds or (shudder) thousands of jobs. Many folks out in the wild have developed some kind of 'job control' code, for us we took the complete solution Ken Bland freely offers on his website (KBA Job Control Utilities) as a base and added to it over the years but it was perfectly serviceable out of the box. Then all you need to control your jobs are a csv file listing them and any dependencies between them, which it will enforce. You can also do things like control the maximum number of jobs it will allow to run simultaneously, amongst other things, so you're not stuck with a serial string a mile long unless that is what you actually need.

All in all, a much better solution IMHO and nothing to maintain there other than job hierarchies and parameter values. Me, I couldn't imagine moving to a new project that didn't have something like that in place. YMMV of course. :wink:

Posted: Wed Nov 11, 2009 7:57 am
by fedpie
Alright guys,

I imagined this project would strain the tool a bit ;^)

Thanks a lot for your input, I feel I can make more informed design decisions now. I'll definitely take a look at Mr. Bland's solution.

Regards,

F

Posted: Thu Nov 12, 2009 1:33 am
by DS_SUPPORT
I have generated the Sequence jobs, by writing a DS routine. My input will be list of jobs that needs to be there in the Sequence.

First, i will loop through all of them, and get the required job parameters. and rest of the stuff in the Sequence job will be mostly hardcoded, except the Stage Positions and Link names. If you analyze two or more sequence jobs you will be able to identify the common part between them.

I am not in a position to share the code, but as you already generated the server jobs, it should be easy for you to generate the sequence jobs also.

My routine will generate the dsx file.