How to parse a DataStage job and fetch all column derivations

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

yabhinav
Participant
Posts: 37
Joined: Wed Mar 05, 2008 11:54 pm
Location: Hyderabad

How to parse a DataStage job and fetch all column derivations

Post by yabhinav »

I need to come up with a way to export all the columns used in a DataStage job into a text file. The details have to be stage-wise.

For example: suppose I have a job with 3 stages (Oracle, Transformer and Sequential File).

My output file should have:

Oracle              Transformer                                       Sequential File
Emp (Varchar)       Emp                                               Emp
Empid (Integer)     Empid                                             Empid
Salary (Integer)    If Salary > 100 Then Salary Else Salary + 100     TotSalary

So, as given in the above example, I need the derivation of each column in each stage of a given job.

Please let me know if this is possible.
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

I'd suspect you'd be into the land of back-end querying... that's to say, reading out of the DataStage repository itself. Failing that, some amalgamation of exporting jobs over the command line and then scripting something to strip out the necessary details from each entry in the export dump.
Mark Winter
Nothing appeases a troubled mind more than good music
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps with an advanced understanding of the underlying repository you could engineer something directly from there. The API will get you most of that, but not the derivations. Otherwise, you may want to look into generating a job report and then processing that. There's an icon in the Designer that looks like an IE icon which will get you one manually, and from what I recall there is also a way to generate the same report from the command line. The output is HTML, but the intermediate XML can be preserved and then perhaps reprocessed by you into your desired format.

You could also look into exporting the jobs to .dsx or .xml format and then processing that... perhaps via another job.
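To sketch that export route in Python: I don't have the exact element names of a 7.5 XML export to hand, so rather than assuming a fixed schema this just scans the whole file for anything that looks like a derivation (a Property element whose Name contains "Derivation", or a Derivation-like attribute). Treat those names, and the helper name find_derivations, as assumptions to check against your own export.

Code:

# Rough sketch only: element and attribute names in a DataStage XML export
# vary by version, so this scans generically instead of assuming a schema.
import sys
import xml.etree.ElementTree as ET

def find_derivations(xml_path):
    """Yield (context, derivation) pairs found anywhere in the export."""
    tree = ET.parse(xml_path)
    for elem in tree.iter():
        # Case 1: a <Property Name="Derivation">expression</Property> style element
        name_attr = elem.get("Name", "")
        if "Derivation" in name_attr and (elem.text or "").strip():
            yield (name_attr, elem.text.strip())
        # Case 2: a Derivation="expression" attribute on the element itself
        for attr, value in elem.attrib.items():
            if "Derivation" in attr and value.strip():
                yield (attr, value.strip())

if __name__ == "__main__":
    for context, derivation in find_derivations(sys.argv[1]):
        print("%s\t%s" % (context, derivation))

Point it at the exported .xml file and redirect the output to start building your text file.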

I would think other people will chime in with ideas as well. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The API call to DSGetLinkMetaData() will return a complete list of column names and metadata for each link in a stage. You would need to use DSOpenJob(), DSGetJobInfo(), DSGetStageInfo() and DSCloseJob() in order to get this to work.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

This is a way to obtain the derivation? Interesting, I don't recall seeing that one before. It's not new in the 8.x release, is it?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Craig - you are correct, I didn't read the complete post and thought that the original poster just wanted column names. As you correctly noted, the derivation is not available as part of the link information, and the DSGetStageInfo() routine call does not have an option to return derivations.

So one would need to delve into the (undocumented) depths of the hashed files or to export the job and then parse the XML or DSX format.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

DSX might be easier. Derivation or ParsedExpression or ParsedDerivation is what to look for. The column name will be a couple lines above that and have Name in front of it.
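For example, a minimal Python scan along those lines; the keyword list and the quoted-value layout are assumptions based on the hints above, and the helper name scan_dsx is mine, so verify both against your own .dsx export:

Code:

# Quick-and-dirty sketch: assumes derivations appear on lines starting with
# Derivation / ParsedDerivation / ParsedExpression, and that the most recent
# Name "..." line above belongs to the owning column (per the hint above).
import re
import sys

NAME_RE = re.compile(r'^\s*Name\s+"(.*)"\s*$')
DERIV_RE = re.compile(r'^\s*(Derivation|ParsedDerivation|ParsedExpression)\s+"(.*)"\s*$')

def scan_dsx(dsx_path):
    """Yield (column_name, derivation) pairs from a .dsx export file."""
    last_name = None
    with open(dsx_path, "r") as fh:
        for line in fh:
            m = NAME_RE.match(line)
            if m:
                last_name = m.group(1)
                continue
            m = DERIV_RE.match(line)
            if m and m.group(2).strip():
                yield (last_name, m.group(2))

if __name__ == "__main__":
    for column, derivation in scan_dsx(sys.argv[1]):
        print("%s\t%s" % (column, derivation))

You'd still have to tie each column back to its stage for the stage-wise layout, but this gets the raw name/derivation pairs out of the export.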
Mamu Kim
yabhinav
Participant
Posts: 37
Joined: Wed Mar 05, 2008 11:54 pm
Location: Hyderabad

Post by yabhinav »

Thank you all for your suggestions. I've been trying to export the job as XML and parse it, but I guess this is going to take me some time.

Will update as soon as I come up with something.

Regards,
Abhinav
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why do you need to do this? The built-in job report functionality will provide all this and more.

Resist stupid requirements!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
yabhinav
Participant
Posts: 37
Joined: Wed Mar 05, 2008 11:54 pm
Location: Hyderabad

Post by yabhinav »

Here's why we need such a requirement. Our system gets its data from various sources like mainframes, DB2 and MQ. We perform a lot of calculations on the incoming feeds and then send them to our warehouse, which serves as a reporting platform for other applications. The plan is to have an intermediate system in place that does all these calculations and sends us a feed that we just load into the warehouse.
Now don't ask me why they are doing this!! Because I have no clue.

So we need to come up with all the fields that are being calculated in the jobs and send them to another team.

That is the reason I requested a solution for such a STUPID requirement!!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What about using the lineage and impact analysis functionality built into the product?
yabhinav
Participant
Posts: 37
Joined: Wed Mar 05, 2008 11:54 pm
Location: Hyderabad

Post by yabhinav »

Is this functionality available in version 7.5?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Only using MetaStage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.