How to parse a DataStage job and fetch all column derivations

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

yabhinav
Participant
Posts: 37
Joined: Wed Mar 05, 2008 11:54 pm
Location: Hyderabad

How to parse a DataStage job and fetch all column derivations

Post by yabhinav »

I need to come up with a way to export all the columns used in a DataStage job into a text file. The details have to be stage-wise.

For example: suppose I have a job with 3 stages (Oracle, Transformer and Sequential File).

My output file should have:

Oracle              Transformer                                       Sequential File
Emp (Varchar)       Emp                                               Emp
Empid (Integer)     Empid                                             Empid
Salary (Integer)    If Salary > 100 Then Salary Else Salary + 100     TotSalary

So, as given in the above example, I need the derivation of each column in each stage of a given job.

Please let me know if this is possible.
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

I'd suspect you'd be into the land of back-end querying... that's to say, reading out of the DataStage repository itself. Failing that, some amalgamation of exporting jobs over the command line and then scripting something to strip out the necessary details from each entry in the export dump.
Mark Winter
Nothing appeases a troubled mind more than good music
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps with an advanced understanding of the underlying repository you could engineer something directly from there. The API will get you most of that, but not the derivations. Otherwise, you may want to look into generating a job report and then processing that. There's an icon in the Designer that looks like an IE icon which will get you one manually, and from what I recall there is also a way to generate the same report from the command line. The output is HTML, but the intermediate XML can be preserved and then perhaps reprocessed by you into your desired format.

You could also look into exporting the jobs to .dsx or .xml format and then processing that... perhaps via another job.
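To sketch that export route in Python: I don't have the exact element names of a 7.5 XML export to hand, so rather than assuming a fixed schema this just scans the whole file for anything that looks like a derivation (a Property element whose Name contains "Derivation", or a Derivation-like attribute). Treat those names, and the helper name find_derivations, as assumptions to check against your own export.

Code:

# Rough sketch only: element and attribute names in a DataStage XML export
# vary by version, so this scans generically instead of assuming a schema.
import sys
import xml.etree.ElementTree as ET

def find_derivations(xml_path):
    """Yield (context, derivation) pairs found anywhere in the export."""
    tree = ET.parse(xml_path)
    for elem in tree.iter():
        # Case 1: a <Property Name="Derivation">expression</Property> style element
        name_attr = elem.get("Name", "")
        if "Derivation" in name_attr and (elem.text or "").strip():
            yield (name_attr, elem.text.strip())
        # Case 2: a Derivation="expression" attribute on the element itself
        for attr, value in elem.attrib.items():
            if "Derivation" in attr and value.strip():
                yield (attr, value.strip())

if __name__ == "__main__":
    for context, derivation in find_derivations(sys.argv[1]):
        print("%s\t%s" % (context, derivation))

Point it at the exported .xml file and redirect the output to start building your text file.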

I would think other people will chime in with ideas as well. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The API call to DSGetLinkMetaData() will return a complete list of column names and metadata for each link in a stage. You would need to use DSOpenJob(), DSGetJobInfo(), DSGetStageInfo() and DSCloseJob() in order to get this to work.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

This is a way to obtain the derivation? Interesting, I don't recall seeing that one before. It's not new in the 8.x release, is it?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Craig - you are correct, I didn't read the complete post and thought that the original poster just wanted column names. As you correctly noted, the derivation is not available as part of the link information, and the DSGetStageInfo() routine call does not have an option to return derivations.

So one would need to delve into the (undocumented) depths of the hashed files or to export the job and then parse the XML or DSX format.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

DSX might be easier. Derivation or ParsedExpression or ParsedDerivation is what to look for. The column name will be a couple lines above that and have Name in front of it.
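For example, a minimal Python scan along those lines; the keyword list and the quoted-value layout are assumptions based on the hints above, and the helper name scan_dsx is mine, so verify both against your own .dsx export:

Code:

# Quick-and-dirty sketch: assumes derivations appear on lines starting with
# Derivation / ParsedDerivation / ParsedExpression, and that the most recent
# Name "..." line above belongs to the owning column (per the hint above).
import re
import sys

NAME_RE = re.compile(r'^\s*Name\s+"(.*)"\s*$')
DERIV_RE = re.compile(r'^\s*(Derivation|ParsedDerivation|ParsedExpression)\s+"(.*)"\s*$')

def scan_dsx(dsx_path):
    """Yield (column_name, derivation) pairs from a .dsx export file."""
    last_name = None
    with open(dsx_path, "r") as fh:
        for line in fh:
            m = NAME_RE.match(line)
            if m:
                last_name = m.group(1)
                continue
            m = DERIV_RE.match(line)
            if m and m.group(2).strip():
                yield (last_name, m.group(2))

if __name__ == "__main__":
    for column, derivation in scan_dsx(sys.argv[1]):
        print("%s\t%s" % (column, derivation))

You'd still have to tie each column back to its stage for the stage-wise layout, but this gets the raw name/derivation pairs out of the export.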
Mamu Kim
yabhinav
Participant
Posts: 37
Joined: Wed Mar 05, 2008 11:54 pm
Location: Hyderabad

Post by yabhinav »

Thank you all for your suggestions. I've been trying to export the job as XML and parse it, but I guess this is going to take me some time.

Will update as soon as I come up with something.

Regards,
Abhinav
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why do you need to do this? The built-in job report functionality will provide all this and more.

Resist stupid requirements!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
yabhinav
Participant
Posts: 37
Joined: Wed Mar 05, 2008 11:54 pm
Location: Hyderabad

Post by yabhinav »

Here's why we need such a requirement. Our system gets its data from various sources like mainframes, DB2 and MQ. We perform a lot of calculations on the incoming feeds and then send them to our warehouse, which serves as a reporting platform for other applications. The plan is to have an intermediate system in place that does all these calculations and sends us a feed that we just load into the warehouse.
Now don't ask me why they are doing this!! Because I have no clue.

So we need to come up with all the fields that are being calculated in the jobs and send them to another team.

That is the reason I requested a solution for such a STUPID requirement!!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What about using the lineage and impact analysis functionality built into the product?
yabhinav
Participant
Posts: 37
Joined: Wed Mar 05, 2008 11:54 pm
Location: Hyderabad

Post by yabhinav »

Is this functionality available in version 7.5?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Only using MetaStage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.