I have a DataStage job with a couple of Transformer stages in series. For some reason I am getting duplicate rows in the output. To debug the problem I wrote the output of each Transformer to a Sequential File. The output of the first Transformer has no duplicates, but the output of the next Transformer does. (Based on the input data, I am sure that Transformer should not produce duplicates.)
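When each Transformer's output is dumped to a Sequential File as described above, a quick way to pin down which stage introduces the duplicates is to count repeated rows in each file. This is a minimal sketch, assuming one record per physical line (no embedded newlines) and UTF-8 text; the script and file names in the usage comment are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable


def count_duplicates(lines: Iterable[str]) -> dict[str, int]:
    """Return every row that occurs more than once, mapped to its occurrence count.

    Assumes one record per physical line (no embedded newlines in the data).
    Trailing CR/LF is stripped so DOS- and UNIX-style lines compare equally.
    """
    counts = Counter(line.rstrip("\r\n") for line in lines)
    return {row: n for row, n in counts.items() if n > 1}


def find_duplicate_rows(path: str) -> dict[str, int]:
    """Read a sequential file and report its duplicated rows."""
    # newline="" leaves line endings untranslated; count_duplicates strips them.
    with open(path, encoding="utf-8", newline="") as f:
        return count_duplicates(f)


if __name__ == "__main__":
    # Usage (file names are hypothetical):
    #   python find_dups.py xfm1_out.txt xfm2_out.txt
    for path in sys.argv[1:]:
        dups = find_duplicate_rows(path)
        print(f"{path}: {len(dups)} duplicated row(s)")
        for row, n in dups.items():
            print(f"  {n}x {row}")
```

Running it on both dumps shows whether the duplicates first appear after the second Transformer, without relying on eyeballing the files.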
Can somebody tell me why this is happening?
I found a workaround by putting a Sequential File stage between every Transformer, but somehow I am not comfortable with this solution.
DS Version :- 6.0.0.17
-Thanks
Multiple Transformer problem
Does the second Transformer stage have a reference input that returns multiple rows?
Are there embedded end-of-line characters in, or being introduced into, your data?
If not, how are you generating duplicate rows?
Can you please describe the constraints and output derivations used in both Transformer stages?
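To illustrate why embedded end-of-line characters matter: a sequential file reader that treats every physical line as one row will turn a single record containing a newline into two rows, which inflates the row count and can be mistaken for duplication. This is a minimal sketch, assuming a simple comma-delimited file with no quoting; the helper names are hypothetical, not DataStage functions.

```python
def write_rows(rows: list[list[str]], delimiter: str = ",") -> str:
    """Serialise rows as a simple delimited file: one line per row, no quoting."""
    return "".join(delimiter.join(fields) + "\n" for fields in rows)


def read_rows(text: str, delimiter: str = ",") -> list[list[str]]:
    """Read the file back, treating every physical line as one row."""
    return [line.split(delimiter) for line in text.splitlines()]


def rows_with_embedded_eol(rows: list[list[str]]) -> list[int]:
    """Return the indexes of rows where any field contains a CR or LF."""
    return [i for i, fields in enumerate(rows)
            if any("\n" in f or "\r" in f for f in fields)]


if __name__ == "__main__":
    rows = [["1", "Smith"], ["2", "Jones\nJr"], ["3", "Brown"]]
    back = read_rows(write_rows(rows))
    # The second record is split in two, so more rows come back than went in.
    print(len(rows), "rows written,", len(back), "rows read back")
    print("rows with embedded EOL:", rows_with_embedded_eol(rows))
```

Checking the input for CR/LF inside fields, as `rows_with_embedded_eol` does, is a cheap way to rule this cause in or out before looking elsewhere.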
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks Craig and Ray,
Row buffering was "on" at the project level. I turned it off and now the job is working fine. :-)
But I have a lot of jobs in the project that have row buffering "on" at the job level, and I am not sure whether I should turn it off for jobs that are working fine.
Can you explain what the advantage of row buffering is, and why I got duplicates with row buffering "on"?
Again, thanks for your efforts.
The advantages are spelled out pretty well in the documentation, primarily the Server Job Developer's Guide. In short: performance. Between Row Buffering and IPC stages, the performance gains can be substantial, especially on multi-processor systems.
As to the "why"... it just seems to be a bug.
I've seen it introduce duplicates like yours, false duplicates when writing to Oracle, and other interesting "problems" that go away when it is turned off. So if you use it, I'd suggest testing thoroughly to make sure it is actually working as planned and doesn't just seem to be working fine. ;-)
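One way to do that testing: run the job once with row buffering on and once with it off, write both outputs to sequential files, and compare them as multisets of rows, so that extra or missing copies of a row show up even if row order differs between the runs. This is a minimal sketch, assuming one record per line; the script and file names are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable


def diff_runs(buffered: Iterable[str],
              unbuffered: Iterable[str]) -> tuple[dict[str, int], dict[str, int]]:
    """Compare two runs' outputs as multisets of rows, ignoring row order.

    Returns (extra, missing): rows that occur more often in the buffered run,
    and rows that occur more often in the unbuffered run, with the count difference.
    """
    a = Counter(line.rstrip("\r\n") for line in buffered)
    b = Counter(line.rstrip("\r\n") for line in unbuffered)
    # Counter subtraction keeps only positive differences.
    return dict(a - b), dict(b - a)


if __name__ == "__main__":
    # Usage (file names are hypothetical):
    #   python diff_runs.py out_buffering_on.txt out_buffering_off.txt
    with open(sys.argv[1], newline="") as on, open(sys.argv[2], newline="") as off:
        extra, missing = diff_runs(on, off)
    print("extra rows with buffering on:", extra)
    print("rows missing with buffering on:", missing)
```

Two empty dictionaries mean both runs produced the same rows the same number of times; anything else is worth investigating before trusting the buffered job.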
-craig
"You can never have too many knives" -- Logan Nine Fingers