Page 1 of 1

Determine end of input data

Posted: Thu May 29, 2008 7:32 am
by hiteshm
Background:
We have several simple jobs which read a source table/sequential file and output the data to a target table via a transformer.

Requirement:
Write a row to a common audit table ehich includes the input and output row count columns:

Issue:
I was thinking a simply insterting a into the audit table, with a link from the transformer. But this link should only be taken once all the input rows have been processed.
Which constraint can I use to determine end of inout data?

I'm also open to other suggestions on meeting the above requirement.

Posted: Thu May 29, 2008 8:10 am
by chulett
In job? Aggregator. Another approach is to gather this 'after job', using DSGetLinkInfo, either in the After Job portion of the current job or a generic trailing (seperate) job that just performs this function.

Posted: Thu May 29, 2008 10:06 am
by hiteshm
Ok I've tried to use an Aggregator stage. The input columns to the stage are :

'1' = Key1
@INROWCOUNT = wk_Input_Row
@OUTROWCOUNT = wk_Output_Row

In the Aggregator stage I am grouping my Key1, and getting the maximum values for the other 2 columns.
But the output from the Aggregator stage seems to be 1 row, with the value from 1 partition. I'm working on a 4-node partition, therefore the value is roughly a quarter of the expected value. The partition methos is currently set to Auto on the Agg stage.

Which partition method should I use, to ensure I get the correct value?

Posted: Thu May 29, 2008 3:37 pm
by ray.wurlod
Get ETLStats from Kim Duke's website - it will do all you require (and more) without you re-inventing the wheel.

Posted: Thu May 29, 2008 3:51 pm
by mydsworld
How do you capture the counts in 'after-job' in Job properties.The procedures listed there do not contain 'DSGetLinkInfo'.

For a trailing generic job (for getting those counts), how can we do that in sequence.

Posted: Fri May 30, 2008 2:42 am
by hiteshm
Is ETLStats still applicable for DataStage v8? I mean are thier other, dare I say 'better', methods within v8?

I would still be interested in knowing the answer to my aggregate question. I can I ensure the aggregate is done over the complete data as opposed to a partition.

Posted: Fri May 30, 2008 6:35 am
by ray.wurlod
ETLStats is still applicable.

The only way to aggregate over the entire data set is to run the Aggregator stage in a single node (that is, sequentially). You can use a Sort/Merge collector to preserve any existing sorting.

Posted: Fri May 30, 2008 6:37 am
by chulett
mydsworld wrote:How do you capture the counts in 'after-job' in Job properties.The procedures listed there do not contain 'DSGetLinkInfo'.
If this is another lovely 'not in the v8 online help' thing, then use your pdf documentation files - the Advanced PX, Server Developer and BASIC pdf all mention those functions.
mydsworld also wrote:For a trailing generic job (for getting those counts), how can we do that in sequence.
I meant something that takes Job Name, Stage Name and Link Name as parameters and gets link information for any job. Typically this would be a routine and then the routine could pass the result to a generic 'load this metric' job that puts the result in your audit table.

Use a single node to... erk, never mind, Ray is up and online. :wink:

Posted: Fri May 30, 2008 9:54 am
by mydsworld
Writing such routine to calculate the link count is easier in DS Batch.How ever with PX job,I am afraid I can not use such server routine.

Posted: Fri May 30, 2008 3:57 pm
by ray.wurlod
Yes you can.

Even in parallel jobs, before-job and after-job subroutines are written in DataStage BASIC and executed by the conductor process.

Posted: Sun Jun 01, 2008 10:07 pm
by mydsworld
Suppose I have a server routine 'MyServRoutine'(written in DS BASIC) which I want to include in PX job as 'Before-job' or 'After-job' subroutine.Now when I open the Job properties of PX job, I find only the following options in 'Before-job' or 'After-job' :

(none)
DSJobReport
DSSendMail
DSWaitForFile
ExecSH
ExecSHSilent
ExecTCL

So, there is no way to choose the routine 'MyServRoutine'.
Please advise.

Posted: Sun Jun 01, 2008 10:14 pm
by chulett
It will only show up there if you've defined it as a 'Before/After' subroutine.

Posted: Sun Jun 01, 2008 10:27 pm
by mydsworld
Thanks Craig.

When you use 'After-job' subroutine to gather job statistics (like record counts for different links etc),how do you specify the input arguments containing variuos stage and link names (No. of arguments can not be more than two),right?

Posted: Sun Jun 01, 2008 10:30 pm
by mydsworld
Thanks Craig.

When you use 'After-job' subroutine to gather job statistics (like record counts for different links etc),how do you specify the input arguments containing variuos stage and link names (No. of arguments can not be more than two),right?

Posted: Sun Jun 01, 2008 10:33 pm
by chulett
Cannot be more than one passed in, actually. So use a delimited list (pipe, perhaps) and then parse out the components inside the routine.