Lookup stage outputs duplicate rows
Posted: Sun Mar 16, 2014 6:24 pm
Hello,
I am transitioning from the server to the parallel world so bear with me...
I have two oracle tables: one is used as stream and the other as reference. I need to find out which rows from the stream are NOT in the reference and output them to a target file.
I am using the Oracle connector for both stream and reference (row count on each is under 400) and I am using a Lookup stage with a reject link to capture the failed lookups. I coded for a sparse lookup in the reference Oracle connector.
Since the Lookup stage does not allow me to code only the reject link I have also an output stream link to a flat file for the lookup hits.
What I am seeing is that the job outputs 200k+ rows to the output stream link. In looking at this data I find that there are lots of duplicates (500+) for each input stream row.
Can you please explain this behavior of the Lookup stage?
Thanks.
Marco
I am transitioning from the server to the parallel world so bear with me...
I have two oracle tables: one is used as stream and the other as reference. I need to find out which rows from the stream are NOT in the reference and output them to a target file.
I am using the Oracle connector for both stream and reference (row count on each is under 400) and I am using a Lookup stage with a reject link to capture the failed lookups. I coded for a sparse lookup in the reference Oracle connector.
Since the Lookup stage does not allow me to code only the reject link I have also an output stream link to a flat file for the lookup hits.
What I am seeing is that the job outputs 200k+ rows to the output stream link. In looking at this data I find that there are lots of duplicates (500+) for each input stream row.
Can you please explain this behavior of the Lookup stage?
Thanks.
Marco