
Multiple Transformer problem

Posted: Mon May 10, 2004 1:58 pm
by iamrajy
I have a DataStage job with a couple of Transformers in series. For some reason I am getting duplicate rows in the output. To debug the problem I wrote the output of each Transformer to a Sequential File. The output of the first Transformer has no duplicates, but the output of the next Transformer does. (Based on the input data, I am sure that Transformer should not produce duplicates.)
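To confirm which Sequential File first contains duplicates, a small script can count repeated rows. This is a minimal sketch assuming one row per line; `find_duplicate_rows` is a hypothetical helper, not a DataStage feature:

```python
from collections import Counter

def find_duplicate_rows(lines):
    """Return {row: count} for every row that appears more than once.

    Assumes one row per line; trailing CR/LF is stripped so that
    Windows and Unix line endings compare equal.
    """
    counts = Counter(line.rstrip("\r\n") for line in lines)
    return {row: n for row, n in counts.items() if n > 1}
```

Usage: open the file written after each Transformer and pass the file object, e.g. `find_duplicate_rows(open("xfm2_out.txt"))` (file name hypothetical). The first file that returns a non-empty result points to the stage where duplicates appear.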

Can somebody tell me why this is happening?

I found a workaround by putting a Sequential File between every Transformer, but somehow I am not comfortable with this solution.

DS version: 6.0.0.17

-Thanks

Posted: Mon May 10, 2004 2:31 pm
by chulett
Do you have Row Buffering enabled? If so, try your job with it turned off and see if that solves it.

Posted: Mon May 10, 2004 3:13 pm
by iamrajy
I tried without row buffering and I am still facing the same issue. I turned it off at the job level. Is there a row buffering setting at the project level?

Posted: Mon May 10, 2004 3:25 pm
by chulett
Yes, there is. It is set via the Administrator: Project -> Properties -> Tunables.

Posted: Tue May 11, 2004 2:12 am
by ray.wurlod
Does the second Transformer stage have a reference input that returns multiple rows?

Are there embedded end-of-line characters in, or being introduced into, your data?

If not, how are you generating duplicate rows?

Can you please describe the constraints and output derivations used in both Transformer stages?
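An embedded end-of-line character splits one logical row across two physical lines, which can look like extra rows. A rough check is to flag lines whose field count differs from what the file should have. This sketch assumes a simple delimited file with no quoted delimiters; `find_split_rows` and its parameters are hypothetical:

```python
def find_split_rows(lines, expected_fields, delimiter=","):
    """Return (line_number, field_count) for lines with an unexpected field count.

    Assumes no quoted fields containing the delimiter; a row broken by an
    embedded newline shows up as two short lines.
    """
    bad = []
    for n, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != expected_fields:
            bad.append((n, len(fields)))
    return bad
```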

Posted: Tue May 11, 2004 7:48 am
by iamrajy
Thanks Craig and Ray,

Row buffering was "on" at the project level. I turned it off and now the job is working fine. :-)

But I have a lot of jobs in the project that have row buffering "on" at the job level, and I am not sure whether I should turn it off in jobs that are working fine.

Can you explain what the advantage of row buffering is, and why I encountered the duplication problem with row buffering "on"?

Again, thanks for your efforts.

Posted: Tue May 11, 2004 8:27 am
by chulett
The advantages are pretty well spelled out in the documentation, primarily the Server Job Developer's Guide. In a word: performance. With either Row Buffering or IPC stages, the performance gains can be substantial, especially on multi-processor systems.

As to the "why"... it just seems to be a bug. :? I've seen it introduce duplicates like yours, cause false duplicates when writing to Oracle, and produce other interesting "problems" that go away when it is turned off. So, if you use it, I'd suggest testing thoroughly to ensure it is actually working as planned and doesn't just seem to be working fine. :wink:
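One way to test thoroughly is to run a job once with row buffering off and once with it on, write both outputs to Sequential Files, and compare them as multisets, so that row order does not matter but duplicates do. A sketch assuming one row per line; `compare_runs` is a hypothetical helper:

```python
from collections import Counter

def compare_runs(baseline_lines, test_lines):
    """Compare two outputs as multisets of rows.

    Returns (extra, missing): rows (with counts) that appear more often in
    the test run than the baseline, and rows the test run lost.
    """
    base = Counter(line.rstrip("\r\n") for line in baseline_lines)
    test = Counter(line.rstrip("\r\n") for line in test_lines)
    extra = test - base    # Counter subtraction keeps only positive counts
    missing = base - test
    return dict(extra), dict(missing)
```

Both dictionaries should be empty if buffering changed only performance, not results.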