Summarizing Columns in Transformer

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
maheshalways
Participant
Posts: 13
Joined: Tue Jan 25, 2005 12:07 am
Location: Mumbai,India

Summarizing Columns in Transformer

Post by maheshalways »

1) I need to summarize (sum total) one column and get the record count of the input data using a Transformer (BASIC or parallel). I want to avoid using an Aggregator. Also, since I need to call a server routine, I need to use a BASIC Transformer.
2) I cannot call a server routine from a parallel Transformer. Can I abort a job using a parallel Transformer if some condition is not satisfied for the input data? I have a server routine for this. Is anybody aware of a parallel routine, callable from a parallel Transformer, that does the same job?
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi Mahesh,
1) I need to summarize (sum total) one column and get the record count of the input data using a Transformer (BASIC or parallel). I want to avoid using an Aggregator. Also, since I need to call a server routine, I need to use a BASIC Transformer.
To sum one column using a Transformer, you can use stage variables to keep a running total.

To get a record count of the input data, use the system variable @INROWNUM.
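In Python-flavoured pseudologic (not DataStage syntax; the column name `amount` is made up for illustration), the stage-variable approach behaves like this, producing the running values on every row:

```python
def running_totals(rows):
    """Mimic transformer stage variables: a running-sum stage variable
    and @INROWNUM are updated per input row, and every input row
    produces an output row."""
    sv_sum = 0.0        # stage variable holding the running total
    outputs = []
    for inrownum, amount in enumerate(rows, start=1):  # @INROWNUM
        sv_sum += amount
        outputs.append((sv_sum, inrownum))  # one output row per input row
    return outputs

print(running_totals([10.0, 20.0, 5.0]))
```

Note that this emits one output row per input row; only the last row carries the final total and count.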

2) I cannot call a server routine from a parallel Transformer. Can I abort a job using a parallel Transformer if some condition is not satisfied for the input data? I have a server routine for this. Is anybody aware of a parallel routine, callable from a parallel Transformer, that does the same job?
You can use a BASIC Transformer in the PX job; that will allow you to call your server routine.

I hope this helps.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard! :D

Best practice is never to abort, so that you retain control. Pre-process the data to look for violations. If any are found, your job sequence can choose not to run the "real" job, or to run an intermediate job to correct those violations, if such action is appropriate/possible.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
maheshalways
Participant
Posts: 13
Joined: Tue Jan 25, 2005 12:07 am
Location: Mumbai,India

Post by maheshalways »

pnchowdary wrote:Hi Mahesh,
1) I need to summarize (sum total) one column and get the record count of the input data using a Transformer (BASIC or parallel). I want to avoid using an Aggregator. Also, since I need to call a server routine, I need to use a BASIC Transformer.
To sum one column using a Transformer, you can use stage variables to keep a running total.

Mahesh: I used a stage variable, but you see, it outputs as many rows as there are in the input. I just need one summarized row.

To get a record count of the input data, use the system variable @INROWNUM.

2) I cannot call a server routine from a parallel Transformer. Can I abort a job using a parallel Transformer if some condition is not satisfied for the input data? I have a server routine for this. Is anybody aware of a parallel routine, callable from a parallel Transformer, that does the same job?
You can use a BASIC Transformer in the PX job; that will allow you to call your server routine.

Mahesh: I'm already doing that. My question is: can I call a server routine from a parallel Transformer? Or how can I convert it into a parallel routine?

I hope this helps.
maheshalways
Participant
Posts: 13
Joined: Tue Jan 25, 2005 12:07 am
Location: Mumbai,India

Summarizing Using Transformer (BASIC / PARALLEL)

Post by maheshalways »

ray.wurlod wrote:Welcome aboard! :D

Best practice is never to abort, so that you retain control. Pre-process the data to look for violations. If any are found, your job sequence can choose not to run the "real" job, or to run an intermediate job to correct those violations, if such action is appropriate/possible.
Mahesh Parte: Thanks for the warm welcome! :D

I agree with you, Ray, but you see, the client is the KING. Though I will try following your approach, it's important to mention here that we are not supposed to PRE-PROCESS the data; we must abort the job if the rejects exceed the threshold value specified by the client. Ray, I would request you to read my questions about using the Transformer for summarization; I'm sure your inputs would be valuable. Have a wonderful weekend! 8)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The client may always be KING but even kings are fallible, or ignorant. If the client has demanded that you construct jobs in a way that is against your better (educated) judgment, demand to know why and point out the consequences of following the client's mandate versus your better design. The courage to do so marks out the truly professional consultant from the ordinary.

For example, why avoid the right tool for the job? The Aggregator stage is designed precisely to group and count (among other aggregation functions). You can have the Aggregator stage following your Transformer stage. (Incidentally, the parallel Aggregator stage is more robust than the server Aggregator stage, so any argument about it being flaky can be dismissed.)
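The single summarized row Mahesh is after is essentially what an Aggregator does: accumulate while reading, then emit once at end-of-data, which a plain Transformer cannot easily detect. A conceptual Python sketch (not DataStage syntax; the `amount` input is illustrative):

```python
def summarize(rows):
    """Accumulate sum and count while reading, then emit a single
    summary row only after the last input row has been consumed.
    This is the behaviour of an Aggregator stage, not a per-row
    Transformer, which cannot know in advance which row is last."""
    total = 0.0
    count = 0
    for amount in rows:
        total += amount
        count += 1
    return {"sum": total, "count": count}  # exactly one output row

print(summarize([10.0, 20.0, 5.0]))
```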

Why must you call a server routine? Can't you implement the same logic using parallel job techniques, perhaps even an equivalent parallel routine? What does this routine do that it must be a server routine? The BASIC Transformer stage will prove to be a major throughput bottleneck, because it must run in Sequential mode. Create something that can take advantage of the parallel execution architecture.

I don't believe you or your client has thought this through. Challenge your client!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I agree with Ray: introduce a milestone point where data is staged, run the data to this milestone point, and then decide whether the load can continue. We pre-process our data into load-ready dataset files and then use simple database load jobs to get it in. Use job reporting and link counting to work out how successful the processing was and whether it passed the threshold levels.
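The milestone decision itself is simple to script. A hedged sketch (the function name, percentage threshold, and inputs are made up; in practice the counts would come from link counts or job reports):

```python
def load_can_continue(reject_count, total_count, max_reject_pct):
    """Decide at the milestone point whether to run the load jobs:
    continue only if the reject percentage is within the client's
    specified threshold."""
    if total_count == 0:
        return False  # nothing was staged; don't run the load
    reject_pct = 100.0 * reject_count / total_count
    return reject_pct <= max_reject_pct

# e.g. a client-specified threshold of 5% rejects
print(load_can_continue(reject_count=3, total_count=100, max_reject_pct=5.0))
```

A sequence job (or job control code) would call something like this after the staging job and before the load jobs.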

You could hack the behaviour of parallel jobs to get what you want. The problem is that, unlike server jobs, you do not have much control over a Transformer reject link: you cannot define a specific reject message or choose which rows go down the reject link. You can hack it by creating a link out of the Transformer that leads to a Peek stage, which will produce a log message for each row, then use a custom job message handler to downgrade the Peek message to a warning message. You now have a link that produces a warning for every row, which will trigger the 50-warning limit, or whatever limit you set (via your sequence job or job control code). Also include a standard reject link, and you get both custom rejects and standard rejects from that Transformer, both producing warning messages.

I haven't used this myself; I prefer a message handling system that delivers meaningful messages to a log file or log table.
Post Reply