Summarizing Columns in Transformer

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
maheshalways
Participant
Posts: 13
Joined: Tue Jan 25, 2005 12:07 am
Location: Mumbai,India

Summarizing Columns in Transformer

Post by maheshalways »

1) I need to summarize (sum total) one column and get the record count of the input data using a Transformer (BASIC or parallel). I want to avoid using an Aggregator. Also, since I need to call a server routine, I need to use a BASIC Transformer.
2) I cannot call a server routine from a parallel Transformer. Can I abort a job using a parallel Transformer if some condition is not satisfied for the input data? I have a server routine for this. Is anybody aware of a parallel routine, callable from a parallel Transformer, that does the same job?
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi Mahesh,
1) I need to summarize (sum total) one column and get the record count of the input data using a Transformer (BASIC or parallel). I want to avoid using an Aggregator. Also, since I need to call a server routine, I need to use a BASIC Transformer.
To sum one column using a Transformer, you can use stage variables to keep a running total.

To get a record count of the input data, use the system variable @INROWNUM.
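In Python-flavoured pseudologic (not DataStage syntax; the column name `amount` is made up for illustration), the stage-variable approach behaves like this, producing the running values on every row:

```python
def running_totals(rows):
    """Mimic transformer stage variables: a running-sum stage variable
    and @INROWNUM are updated per input row, and every input row
    produces an output row."""
    sv_sum = 0.0        # stage variable holding the running total
    outputs = []
    for inrownum, amount in enumerate(rows, start=1):  # @INROWNUM
        sv_sum += amount
        outputs.append((sv_sum, inrownum))  # one output row per input row
    return outputs

print(running_totals([10.0, 20.0, 5.0]))
```

Note that this emits one output row per input row; only the last row carries the final total and count.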

2) I cannot call a server routine from a parallel Transformer. Can I abort a job using a parallel Transformer if some condition is not satisfied for the input data? I have a server routine for this. Is anybody aware of a parallel routine, callable from a parallel Transformer, that does the same job?
You can use a BASIC Transformer in the PX job; that will allow you to call your server routine.

I hope this helps.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard! :D

Best practice is never to abort, so that you retain control. Pre-process the data to look for violations. If any are found, your job sequence can choose not to run the "real" job, or to run an intermediate job to correct those violations, if such action is appropriate/possible.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
maheshalways
Participant
Posts: 13
Joined: Tue Jan 25, 2005 12:07 am
Location: Mumbai,India

Post by maheshalways »

pnchowdary wrote:Hi Mahesh,
1) I need to summarize (sum total) one column and get the record count of the input data using a Transformer (BASIC or parallel). I want to avoid using an Aggregator. Also, since I need to call a server routine, I need to use a BASIC Transformer.
To sum one column using a Transformer, you can use stage variables to keep a running total.

Mahesh: I used a stage variable, but you see, it outputs as many rows as there are in the input. I just need one summarized row.

To get a record count of the input data, use the system variable @INROWNUM.

2) I cannot call a server routine from a parallel Transformer. Can I abort a job using a parallel Transformer if some condition is not satisfied for the input data? I have a server routine for this. Is anybody aware of a parallel routine, callable from a parallel Transformer, that does the same job?
You can use a BASIC Transformer in the PX job; that will allow you to call your server routine.

Mahesh: I'm already doing that. My question is: can I call a server routine from a parallel Transformer? Or how can I convert it into a parallel routine?

I hope this helps.
maheshalways
Participant
Posts: 13
Joined: Tue Jan 25, 2005 12:07 am
Location: Mumbai,India

Summarizing Using Transformer (BASIC / PARALLEL)

Post by maheshalways »

ray.wurlod wrote:Welcome aboard! :D

Best practice is never to abort, so that you retain control. Pre-process the data to look for violations. If any are found, your job sequence can choose not to run the "real" job, or to run an intermediate job to correct those violations, if such action is appropriate/possible.
Mahesh Parte: Thanks for the warm welcome! :D

I agree with you, Ray, but you see, the client is the KING. Though I will try following your approach, it's important to mention here that we are not supposed to PRE-PROCESS the data; we must abort the job if the rejects exceed the threshold value specified by the client. Ray, I would request you to read my questions about using the Transformer for summarization; I'm sure your inputs would be valuable. Have a wonderful weekend! 8)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The client may always be KING but even kings are fallible, or ignorant. If the client has demanded that you construct jobs in a way that is against your better (educated) judgment, demand to know why and point out the consequences of following the client's mandate versus your better design. The courage to do so marks out the truly professional consultant from the ordinary.

For example, why avoid the right tool for the job? The Aggregator stage is designed precisely to group and count (among other aggregation functions). You can have the Aggregator stage following your Transformer stage. (Incidentally, the parallel Aggregator stage is more robust than the server Aggregator stage, so any argument about it being flaky can be dismissed.)
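The single summarized row Mahesh is after is essentially what an Aggregator does: accumulate while reading, then emit once at end-of-data, which a plain Transformer cannot easily detect. A conceptual Python sketch (not DataStage syntax; the `amount` input is illustrative):

```python
def summarize(rows):
    """Accumulate sum and count while reading, then emit a single
    summary row only after the last input row has been consumed.
    This is the behaviour of an Aggregator stage, not a per-row
    Transformer, which cannot know in advance which row is last."""
    total = 0.0
    count = 0
    for amount in rows:
        total += amount
        count += 1
    return {"sum": total, "count": count}  # exactly one output row

print(summarize([10.0, 20.0, 5.0]))
```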

Why must you call a server routine? Can't you implement the same logic using parallel job techniques, perhaps even an equivalent parallel routine? What does this routine do that it must be a server routine? The BASIC Transformer stage will prove to be a major throughput bottleneck, because it must run in Sequential mode. Create something that can take advantage of the parallel execution architecture.

I don't believe you or your client has thought this through. Challenge your client!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I agree with Ray: introduce a milestone point where data is staged, run the data to this milestone point, and then decide whether the load can continue. We pre-process our data into load-ready dataset files and then use simple database load jobs to get it in. Use job reporting and link counting to work out how successful the processing was and whether it passed the threshold levels.
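The milestone decision itself is simple to script. A hedged sketch (the function name, percentage threshold, and inputs are made up; in practice the counts would come from link counts or job reports):

```python
def load_can_continue(reject_count, total_count, max_reject_pct):
    """Decide at the milestone point whether to run the load jobs:
    continue only if the reject percentage is within the client's
    specified threshold."""
    if total_count == 0:
        return False  # nothing was staged; don't run the load
    reject_pct = 100.0 * reject_count / total_count
    return reject_pct <= max_reject_pct

# e.g. a client-specified threshold of 5% rejects
print(load_can_continue(reject_count=3, total_count=100, max_reject_pct=5.0))
```

A sequence job (or job control code) would call something like this after the staging job and before the load jobs.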

You could hack the behaviour of parallel jobs to get what you want. The problem is that, unlike server jobs, you do not have much control over a Transformer reject link: you cannot define a specific reject message or choose which rows go down the reject link. You can hack it by creating a link out of the Transformer that leads to a Peek stage, which will produce a log message for each row, then use a custom job message handler to downgrade the Peek message to a warning message. You now have a link that produces a warning for every row, which will trigger the 50-warning limit, or whatever limit you set (via your sequence job or job control code). Also include a standard reject link, and you get both custom rejects and standard rejects from that Transformer, both producing warning messages.

I haven't used this myself; I prefer a message handling system that delivers meaningful messages to a log file or log table.
Post Reply