Page 1 of 1

Join stage only working running in Sequential mode

Posted: Tue Mar 17, 2009 10:18 am
by manuel.gomez
Hello all,

I am facing a problem with Join stage.

I do have two input links, being joined in left outer join mode. Subsequent to Join stage, there is a transformer implementing the logic to detect unmatched records, and place them in a file. In the records matched, they move on in the job flow.

To detect this unmatching, I use a dummy column.
The right link in the Join is comming with a constant column, got like this:

SELECT ......, '1' AS DUMMY
FROM....

So, in transformer, I only need to do this

NullToEmpty(DUMMY) <> '1'

But when executed the job, all records were considered unmatched.

I am running the job with a two node config file, and the two input links were defined, for Join stage, as hashed partitioning method on the column used for join, and performing sort.

I defined the join stage to run in Sequential mode, and the problem was fixed.

Can anybody please give me an idea on what could be happening?

Thanks a lot

Posted: Tue Mar 17, 2009 3:43 pm
by ray.wurlod
Overkill. Why not just allow relevant fields to be nullable and test for null? (Tip: test a key column from the outer input for null - it can never genuinely be null in its table because it's a key.)

Re: Join stage only working running in Sequential mode

Posted: Tue Mar 17, 2009 11:37 pm
by sureshreddy2009
manuel.gomez wrote:Hello all,

I am facing a problem with Join stage.

I do have two input links, being joined in left outer join mode. Subsequent to Join stage, there is a transformer implementing the logic to detect unmatched records, and place them in a file. In the records matched, they move on in the job flow.

To detect this unmatching, I use a dummy column.
The right link in the Join is comming with a constant column, got like this:

SELECT ......, '1' AS DUMMY
FROM....

So, in transformer, I only need to do this

NullToEmpty(DUMMY) <> '1'

But when executed the job, all records were considered unmatched.

I am running the job with a two node config file, and the two input links were defined, for Join stage, as hashed partitioning method on the column used for join, and performing sort.

I defined the join stage to run in Sequential mode, and the problem was fixed.

Can anybody please give me an idea on what could be happening?

Thanks a lot
:D Hi,

Based on your sentences i can say one thing based on my understanding
step1:
from query you are fetching constant column dummy which always contain the value as '1'
step2:
in the transformer you write the constraint as
NullToEmpty(DUMMY) <> '1', first this function on lefthand side always gives 1,because there is no null value in dummy column, if it is null it makes empty but here no null value only the values is 1 and when we compare 1<>1, it fails and all records which checked this condition becomes mismatched.is this useful or i understand your question in another way

Re: Join stage only working running in Sequential mode

Posted: Wed Mar 18, 2009 12:10 am
by Kryt0n
sureshreddy2009 wrote: Based on your sentences i can say one thing based on my understanding
step1:
from query you are fetching constant column dummy which always contain the value as '1'
What you are missing is that the dummy column is coming from the right hand side of a left outer join. So when no match is found, this column will be null.

As for the actual problem, can't you use the lookup stage instead and define a reject link for when no match is found?

Re: Join stage only working running in Sequential mode

Posted: Wed Mar 18, 2009 2:40 am
by manuel.gomez
Kryt0n wrote:
sureshreddy2009 wrote: Based on your sentences i can say one thing based on my understanding
step1:
from query you are fetching constant column dummy which always contain the value as '1'
What you are missing is that the dummy column is coming from the right hand side of a left outer join. So when no match is found, this column will be null.

As for the actual problem, can't you use the lookup stage instead and define a reject link for when no match is found?
you are right Kryt0n, the dummy column is the right side

No, we cant use lookup, we are joining big volumes of data, and the use of memory would be huge

Re: Join stage only working running in Sequential mode

Posted: Wed Mar 18, 2009 4:41 am
by sureshreddy2009
manuel.gomez wrote:
Kryt0n wrote:
sureshreddy2009 wrote: Based on your sentences i can say one thing based on my understanding
step1:
from query you are fetching constant column dummy which always contain the value as '1'
What you are missing is that the dummy column is coming from the right hand side of a left outer join. So when no match is found, this column will be null.

As for the actual problem, can't you use the lookup stage instead and define a reject link for when no match is found?
you are right Kryt0n, the dummy column is the right side

No, we cant use lookup, we are joining big volumes of data, and the use of memory would be huge
Hi
I checked your query at my side, my conclusion is
i dont bother about the data input to join,but after join the field dummy contains value '1' for some records and null for some records,if this data will go to transformer then the function NullToEmpty(DUMMY) <> '1' results like this way,
step1:NullToEmpty(DUMMY) gives 1 for some records and empty(no value but not equal to null) for some records

step2:NullToEmpty(DUMMY) <> '1' constraint in transformer is giving output records which are containing null only and discarding the records containing value as '1'
let check at your side

Re: Join stage only working running in Sequential mode

Posted: Wed Mar 18, 2009 4:59 am
by manuel.gomez
sureshreddy2009 wrote:
manuel.gomez wrote:
Kryt0n wrote: What you are missing is that the dummy column is coming from the right hand side of a left outer join. So when no match is found, this column will be null.

As for the actual problem, can't you use the lookup stage instead and define a reject link for when no match is found?
you are right Kryt0n, the dummy column is the right side

No, we cant use lookup, we are joining big volumes of data, and the use of memory would be huge
Hi
I checked your query at my side, my conclusion is
i dont bother about the data input to join,but after join the field dummy contains value '1' for some records and null for some records,if this data will go to transformer then the function NullToEmpty(DUMMY) <> '1' results like this way,
step1:NullToEmpty(DUMMY) gives 1 for some records and empty(no value but not equal to null) for some records

step2:NullToEmpty(DUMMY) <> '1' constraint in transformer is giving output records which are containing null only and discarding the records containing value as '1'
let check at your side
Thanks for answering sureshreddy2009

I am not sure if I properly understood you

What I said in first post was

So, in transformer, I only need to do this

NullToEmpty(DUMMY) <> '1'
The problem is this constraing was not working. All records were returning null value for DUMMY, when actually a few records should match data

Setting sequential mode for Join, the issue was fixed, but this is not desirable solution to this.......