Determine end of input data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

hiteshm
Participant
Posts: 20
Joined: Tue May 10, 2005 5:32 am


Post by hiteshm »

Background:
We have several simple jobs which read a source table/sequential file and output the data to a target table via a transformer.

Requirement:
Write a row to a common audit table which includes input and output row count columns.

Issue:
I was thinking of simply inserting a row into the audit table via a link from the Transformer. But this link should only be taken once all the input rows have been processed.
Which constraint can I use to determine the end of input data?

I'm also open to other suggestions on meeting the above requirement.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In job? Aggregator. Another approach is to gather this 'after job' using DSGetLinkInfo, either in the After Job portion of the current job or a generic trailing (separate) job that just performs this function.
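A minimal sketch of the after-job approach in DataStage BASIC, assuming a Transformer stage and link names that are hypothetical placeholders for your own job's names:

```basic
* After-job subroutine sketch: read the row counts of this job's links
* with DSGetLinkInfo and log them. "xfm_Main", "lnk_Input" and
* "lnk_Output" are example names only.
Subroutine GetAuditCounts(InputArg, ErrorCode)
   ErrorCode = 0

   * DSJ.ME refers to the job this subroutine is running in
   InCount  = DSGetLinkInfo(DSJ.ME, "xfm_Main", "lnk_Input",  DSJ.LINKROWCOUNT)
   OutCount = DSGetLinkInfo(DSJ.ME, "xfm_Main", "lnk_Output", DSJ.LINKROWCOUNT)

   Call DSLogInfo("Rows in: " : InCount : ", rows out: " : OutCount, "GetAuditCounts")
Return
```

From here the counts could be written to the audit table instead of (or as well as) the job log.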
-craig

"You can never have too many knives" -- Logan Nine Fingers
hiteshm
Participant
Posts: 20
Joined: Tue May 10, 2005 5:32 am

Post by hiteshm »

OK, I've tried to use an Aggregator stage. The input columns to the stage are:

'1' = Key1
@INROWCOUNT = wk_Input_Row
@OUTROWCOUNT = wk_Output_Row

In the Aggregator stage I am grouping by Key1 and taking the maximum values of the other two columns.
But the output from the Aggregator stage seems to be one row, with the value from a single partition. I'm running on a 4-node configuration, so the value is roughly a quarter of the expected value. The partition method is currently set to Auto on the Aggregator stage.

Which partition method should I use, to ensure I get the correct value?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Get ETLStats from Kim Duke's website - it will do all you require (and more) without you re-inventing the wheel.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

How do you capture the counts in the 'after-job' subroutine in Job Properties? The routines listed there do not include 'DSGetLinkInfo'.

For a trailing generic job (to get those counts), how can we do that in a sequence?
hiteshm
Participant
Posts: 20
Joined: Tue May 10, 2005 5:32 am

Post by hiteshm »

Is ETLStats still applicable for DataStage v8? I mean, are there other, dare I say 'better', methods within v8?

I would still be interested in knowing the answer to my Aggregator question: how can I ensure the aggregation is done over the complete data set as opposed to a single partition?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

ETLStats is still applicable.

The only way to aggregate over the entire data set is to run the Aggregator stage in a single node (that is, sequentially). You can use a Sort/Merge collector to preserve any existing sorting.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

mydsworld wrote:How do you capture the counts in the 'after-job' subroutine in Job Properties? The routines listed there do not include 'DSGetLinkInfo'.
If this is another lovely 'not in the v8 online help' thing, then use your PDF documentation files - the Advanced PX, Server Developer and BASIC PDFs all mention those functions.
mydsworld also wrote:For a trailing generic job (to get those counts), how can we do that in a sequence?
I meant something that takes Job Name, Stage Name and Link Name as parameters and gets link information for any job. Typically this would be a routine, and then the routine could pass the result to a generic 'load this metric' job that puts the result in your audit table.
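Such a generic routine could be sketched in DataStage BASIC along these lines; the routine name, the pipe-delimited argument format and the names inside it are assumptions, not anything from a shipped library:

```basic
* Hypothetical routine: InputArg carries "JobName|StageName|LinkName".
* Attach the named job, read the link row count, detach, log the result.
Subroutine GetLinkCount(InputArg, ErrorCode)
   ErrorCode = 0
   JobName   = Field(InputArg, "|", 1)
   StageName = Field(InputArg, "|", 2)
   LinkName  = Field(InputArg, "|", 3)

   hJob = DSAttachJob(JobName, DSJ.ERRNONE)
   If NOT(hJob) Then
      Call DSLogWarn("Cannot attach job " : JobName, "GetLinkCount")
      ErrorCode = 1
      Return
   End

   RowCount = DSGetLinkInfo(hJob, StageName, LinkName, DSJ.LINKROWCOUNT)
   Call DSDetachJob(hJob)

   Call DSLogInfo(JobName : "." : LinkName : " = " : RowCount, "GetLinkCount")
Return
```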

Use a single node to... erk, never mind, Ray is up and online. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

Writing such a routine to calculate the link count is easier in a DS batch (server) job. However, with a PX job, I am afraid I cannot use such a server routine.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes you can.

Even in parallel jobs, before-job and after-job subroutines are written in DataStage BASIC and executed by the conductor process.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

Suppose I have a server routine 'MyServRoutine' (written in DS BASIC) which I want to include in a PX job as a 'Before-job' or 'After-job' subroutine. Now when I open the Job Properties of the PX job, I find only the following options for 'Before-job' or 'After-job':

(none)
DSJobReport
DSSendMail
DSWaitForFile
ExecSH
ExecSHSilent
ExecTCL

So, there is no way to choose the routine 'MyServRoutine'.
Please advise.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It will only show up there if you've defined it as a 'Before/After' subroutine.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

Thanks Craig.

When you use an 'After-job' subroutine to gather job statistics (like record counts for different links), how do you specify the input arguments containing the various stage and link names? The number of arguments cannot be more than two, right?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Cannot be more than one passed in, actually. So use a delimited list (pipe, perhaps) and then parse out the components inside the routine.
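The parsing could look something like this in DataStage BASIC; the argument contents shown are hypothetical:

```basic
* Sketch: InputArg arrives as a single pipe-delimited string, e.g.
* "stg_Source|lnk_In|stg_Target|lnk_Out". DCount gives the number of
* fields, Field extracts each one.
NumParts = DCount(InputArg, "|")
For i = 1 To NumParts
   Part = Field(InputArg, "|", i)
   Call DSLogInfo("Argument " : i : " = " : Part, "ParseArgs")
Next i
```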
-craig

"You can never have too many knives" -- Logan Nine Fingers