Multiple Transformer problem

iamrajy
Participant
Posts: 20
Joined: Mon Apr 26, 2004 10:38 am

Multiple Transformer problem

Post by iamrajy »

I have a DataStage job with a couple of Transformers in series. For some reason I am getting duplicate rows in the output. To debug the problem, I wrote the output of each Transformer to a Sequential File. The output of the first Transformer has no duplicates, but the output of the next Transformer does. (Based on the input data, I am sure that Transformer should not produce duplicates.)

Can somebody tell me why this is happening?

I found a workaround by placing a Sequential File between every pair of Transformers, but somehow I am not comfortable with this solution.

DS Version: 6.0.0.17

-Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Do you have Row Buffering enabled? If so, try your job with it turned off and see if that solves it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
iamrajy
Participant
Posts: 20
Joined: Mon Apr 26, 2004 10:38 am

Post by iamrajy »

I tried without row buffering and I am still facing the same issue. I turned it off at the job level. Is there a row buffering setting at the project level?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, there is. It is set via the Administrator: Project -> Properties -> Tunables.
-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Does the second Transformer stage have a reference input that returns multiple rows?

Are there embedded end-of-line characters in, or being introduced into, your data?

If not, how are you generating duplicate rows?

Can you please describe the constraints and output derivations used in both Transformer stages?
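
If the intermediate output has already been landed to a Sequential File, the embedded end-of-line question can be checked outside DataStage. Below is a minimal Python sketch, not anything from the product: it assumes a comma-delimited file with a known column count, and the function name and parameters are made up for illustration. It uses a naive split with no quote handling, so a quoted field containing the delimiter would be miscounted.

```python
def rows_with_unexpected_field_count(lines, expected_fields, delimiter=","):
    """Return (line_number, field_count) for each line whose field count is off.

    An embedded end-of-line character usually splits one record into two
    short lines, so short lines are a hint that the data contains one.
    Assumption: plain delimited text, no quoted fields containing the delimiter.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != expected_fields:
            bad.append((number, len(fields)))
    return bad


# Example: the second record was broken in two by an embedded newline.
sample = ["1,a,x\n", "2,b\n", "y\n", "3,c,z\n"]
suspect = rows_with_unexpected_field_count(sample, expected_fields=3)
print(suspect)
```

For a real file, pass an open file handle instead of the sample list, for example `open("xfm2_out.txt", newline="")`, where the file name is only a placeholder.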
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
iamrajy
Participant
Posts: 20
Joined: Mon Apr 26, 2004 10:38 am

Post by iamrajy »

Thanks Craig and Ray,

Row buffering was "on" at the project level. I turned that off and now the job is working fine. :-)

But I have a lot of jobs in the project that have row buffering "on" at the job level, so I am not sure whether I should turn row buffering off in the jobs that are working fine.

Can you explain what the advantage of row buffering is, and why I encountered the duplication problem with row buffering "on"?

Again thanks for your efforts.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The advantages are pretty well spelled out in the documentation, primarily the Server Job Developer's Guide. In a word: performance. Between Row Buffering and IPC stages, the performance gains can be substantial, especially on multi-processor systems.

As to the "why"... it just seems to be a bug. :? I've seen it introduce duplicates like you've seen, false duplicates writing to Oracle and other interesting "problems" that go away when it is turned off. So, if you use it I'd suggest thoroughly testing it to ensure it is actually working as planned and doesn't just seem to be working fine. :wink:
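
One way to do that kind of testing outside the tool is to land the job's output to a Sequential File twice, once with row buffering off and once with it on, and compare the two files. Here is a rough Python sketch under those assumptions. The function names are hypothetical, rows are compared as whole lines, and row order is ignored.

```python
from collections import Counter


def find_duplicate_rows(lines):
    """Return {row: count} for every row that appears more than once."""
    counts = Counter(line.rstrip("\r\n") for line in lines)
    return {row: n for row, n in counts.items() if n > 1}


def compare_runs(lines_off, lines_on):
    """Compare two landed outputs as multisets of rows.

    Returns (extra_with_buffering, missing_with_buffering), each a Counter
    of rows and how many more times they appear in one run than the other.
    """
    off = Counter(line.rstrip("\r\n") for line in lines_off)
    on = Counter(line.rstrip("\r\n") for line in lines_on)
    return on - off, off - on


# Example with in-memory rows; in practice pass two open file handles.
run_off = ["10,A\n", "20,B\n"]
run_on = ["10,A\n", "20,B\n", "20,B\n"]
extra, missing = compare_runs(run_off, run_on)
print(find_duplicate_rows(run_on), dict(extra), dict(missing))
```

If both Counters come back empty, the two runs produced the same rows. Anything in the first one is a row that showed up extra times with buffering on.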
-craig