Determine end of input data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

hiteshm
Participant
Posts: 20
Joined: Tue May 10, 2005 5:32 am


Post by hiteshm »

Background:
We have several simple jobs which read a source table/sequential file and output the data to a target table via a transformer.

Requirement:
Write a row to a common audit table which includes input and output row count columns.

Issue:
I was thinking of simply inserting a row into the audit table via a link from the Transformer. But this link should only be taken once all the input rows have been processed.
Which constraint can I use to determine the end of input data?

I'm also open to other suggestions on meeting the above requirement.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In job? Aggregator. Another approach is to gather this 'after job' using DSGetLinkInfo, either in the After Job portion of the current job or a generic trailing (separate) job that just performs this function.
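A minimal sketch of the after-job approach in DataStage BASIC, assuming a Transformer stage and link names that are hypothetical placeholders for your own job's names:

```basic
* After-job subroutine sketch: read the row counts of this job's links
* with DSGetLinkInfo and log them. "xfm_Main", "lnk_Input" and
* "lnk_Output" are example names only.
Subroutine GetAuditCounts(InputArg, ErrorCode)
   ErrorCode = 0

   * DSJ.ME refers to the job this subroutine is running in
   InCount  = DSGetLinkInfo(DSJ.ME, "xfm_Main", "lnk_Input",  DSJ.LINKROWCOUNT)
   OutCount = DSGetLinkInfo(DSJ.ME, "xfm_Main", "lnk_Output", DSJ.LINKROWCOUNT)

   Call DSLogInfo("Rows in: " : InCount : ", rows out: " : OutCount, "GetAuditCounts")
Return
```

From here the counts could be written to the audit table instead of (or as well as) the job log.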
-craig

"You can never have too many knives" -- Logan Nine Fingers
hiteshm
Participant
Posts: 20
Joined: Tue May 10, 2005 5:32 am

Post by hiteshm »

OK, I've tried to use an Aggregator stage. The input columns to the stage are:

'1' = Key1
@INROWCOUNT = wk_Input_Row
@OUTROWCOUNT = wk_Output_Row

In the Aggregator stage I am grouping by Key1 and taking the maximum values of the other two columns.
But the output from the Aggregator stage seems to be one row, with the value from a single partition. I'm running on a 4-node configuration, so the value is roughly a quarter of the expected value. The partition method is currently set to Auto on the Aggregator stage.

Which partition method should I use, to ensure I get the correct value?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Get ETLStats from Kim Duke's website - it will do all you require (and more) without you re-inventing the wheel.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

How do you capture the counts in the 'after-job' subroutine in Job Properties? The routines listed there do not include 'DSGetLinkInfo'.

For a trailing generic job (to get those counts), how can we do that in a sequence?
hiteshm
Participant
Posts: 20
Joined: Tue May 10, 2005 5:32 am

Post by hiteshm »

Is ETLStats still applicable for DataStage v8? I mean, are there other, dare I say 'better', methods within v8?

I would still be interested in knowing the answer to my Aggregator question: how can I ensure the aggregation is done over the complete data set as opposed to a single partition?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

ETLStats is still applicable.

The only way to aggregate over the entire data set is to run the Aggregator stage in a single node (that is, sequentially). You can use a Sort/Merge collector to preserve any existing sorting.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

mydsworld wrote:How do you capture the counts in the 'after-job' subroutine in Job Properties? The routines listed there do not include 'DSGetLinkInfo'.
If this is another lovely 'not in the v8 online help' thing, then use your PDF documentation files - the Advanced PX, Server Developer and BASIC PDFs all mention those functions.
mydsworld also wrote:For a trailing generic job (to get those counts), how can we do that in a sequence?
I meant something that takes Job Name, Stage Name and Link Name as parameters and gets link information for any job. Typically this would be a routine, and then the routine could pass the result to a generic 'load this metric' job that puts the result in your audit table.
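Such a generic routine could be sketched in DataStage BASIC along these lines; the routine name, the pipe-delimited argument format and the names inside it are assumptions, not anything from a shipped library:

```basic
* Hypothetical routine: InputArg carries "JobName|StageName|LinkName".
* Attach the named job, read the link row count, detach, log the result.
Subroutine GetLinkCount(InputArg, ErrorCode)
   ErrorCode = 0
   JobName   = Field(InputArg, "|", 1)
   StageName = Field(InputArg, "|", 2)
   LinkName  = Field(InputArg, "|", 3)

   hJob = DSAttachJob(JobName, DSJ.ERRNONE)
   If NOT(hJob) Then
      Call DSLogWarn("Cannot attach job " : JobName, "GetLinkCount")
      ErrorCode = 1
      Return
   End

   RowCount = DSGetLinkInfo(hJob, StageName, LinkName, DSJ.LINKROWCOUNT)
   Call DSDetachJob(hJob)

   Call DSLogInfo(JobName : "." : LinkName : " = " : RowCount, "GetLinkCount")
Return
```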

Use a single node to... erk, never mind, Ray is up and online. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

Writing such a routine to calculate the link count is easier in a DS batch (server) job. However, with a PX job, I am afraid I cannot use such a server routine.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes you can.

Even in parallel jobs, before-job and after-job subroutines are written in DataStage BASIC and executed by the conductor process.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

Suppose I have a server routine 'MyServRoutine' (written in DS BASIC) which I want to include in a PX job as a 'Before-job' or 'After-job' subroutine. Now when I open the Job Properties of the PX job, I find only the following options for 'Before-job' or 'After-job':

(none)
DSJobReport
DSSendMail
DSWaitForFile
ExecSH
ExecSHSilent
ExecTCL

So, there is no way to choose the routine 'MyServRoutine'.
Please advise.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It will only show up there if you've defined it as a 'Before/After' subroutine.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

Thanks Craig.

When you use an 'After-job' subroutine to gather job statistics (like record counts for different links), how do you specify the input arguments containing the various stage and link names? The number of arguments cannot be more than two, right?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Cannot be more than one passed in, actually. So use a delimited list (pipe, perhaps) and then parse out the components inside the routine.
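The parsing could look something like this in DataStage BASIC; the argument contents shown are hypothetical:

```basic
* Sketch: InputArg arrives as a single pipe-delimited string, e.g.
* "stg_Source|lnk_In|stg_Target|lnk_Out". DCount gives the number of
* fields, Field extracts each one.
NumParts = DCount(InputArg, "|")
For i = 1 To NumParts
   Part = Field(InputArg, "|", i)
   Call DSLogInfo("Argument " : i : " = " : Part, "ParseArgs")
Next i
```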
-craig

"You can never have too many knives" -- Logan Nine Fingers