Page 1 of 2

Weired Join Stage problem.

Posted: Mon Aug 20, 2007 11:27 pm
by xinhuang66
I found a very weired things in Join stage, I couldn't understand why DS works like that .

Sequencial file stage 1
Varchar(3)
001
002
003

Sequencial file stage 2
varchar(5), varchar(5)
0010, apple
001, pear

Inner join 1 and 2 will get
001, apple
001, pear

However, if I increase sequencial file1 from varchar(3) to varchar(5), DS will give me right answer..

Anybody can help me explain why this can happen ? or I need to set some environment variables to get rid of such things.

thanks

Posted: Mon Aug 20, 2007 11:31 pm
by ArndW
My guess is that it takes the minimum definition of the 2 varchar inputs and uses that for the join. If you change your 2nd sequential file first column to varchar(3) your join is correct.

Posted: Mon Aug 20, 2007 11:32 pm
by bcarlson
As I understand it, datatype matters in a join, as does data format. For example, '003 ' is different from '003'.

I know this is the case in a filter. If the field is 5 characters and has a 'A', then the filter clause needs to be 'A '. I would assume the same rule applies in a join stage.

... course that may be specific to 'char' and not varchar. We don't use many varchars, especially for fields that are keys or often used in where clauses, so to be honest we have rarely run into this.

Maybe the bigger issue is matching on datatype...

Brad.

Posted: Mon Aug 20, 2007 11:36 pm
by Maveric
Join stage will compare the field names as well as the meta data. Since the lengths are different (3 and 5) the field with length 5 is truncated to 3. hence the output. Out of curiosity what happens with RCP off in this case with different lengths? Can u try that out. should not make a difference, but can u post the result here.

Posted: Mon Aug 20, 2007 11:39 pm
by Maveric
ArndW wrote: If you change your 2nd sequential file first column to varchar(3) your join is correct. ...
But the result would be the same and possibly a warning telling that the data is truncated in the source stage. wouldn't it?

Posted: Mon Aug 20, 2007 11:45 pm
by ArndW
But if "0010" gets truncated to 3 character it would be "001", hence the initial result.

Posted: Mon Aug 20, 2007 11:49 pm
by xinhuang66
We didn't Enable RCP.

How come DS will automatically truncate the column length to the smaller one ? it will generate heaps of wrong results ...

I just can't believe it....

Posted: Tue Aug 21, 2007 12:00 am
by Maveric
The simple and sure shot work around is increase the length from 3 to 5 in a copy or transformer stage and it will work. I have had this problem and this is working fine.

Posted: Tue Aug 21, 2007 12:06 am
by JoshGeorge
Is there a chance that there is something wrong with your job design? I just ran the above example and it gives me the right answer.

Posted: Tue Aug 21, 2007 1:10 am
by Rubu
Right!

I agree to Josh. Even I could not really recreate the problem. Whereas I got the following warning which may put some light in understanding why its happening.

When checking operator: When binding output schema variable "leftRec": When binding output interface field "id1" to field "Id1": Implicit conversion from source type "string[max=5]" to result type "string[max=3]": Possible truncation of variable length string.

I feel that the VARCHAR(5) column is getting truncated to VarChar(3). And hence the wierd result.

having said that I accept the fact that I have failed to generate the wrong output :)

Posted: Tue Aug 21, 2007 1:26 am
by JoshGeorge
If you change the link odering then below error also will disappear.
This is because your "leftRec" of Inner join is with VarChar(3) and you have mapped that to the output result column, which is VarChar(5).
Rubu wrote: Whereas I got the following warning which may put some light in understanding why its happening.

When checking operator: When binding output schema variable "leftRec": When binding output interface field "id1" to field "Id1": Implicit conversion from source type "string[max=5]" to result type "string[max=3]": Possible truncation of variable length string.

I feel that the VARCHAR(5) column is getting truncated to VarChar(3). And hence the wierd result.

Posted: Wed Aug 22, 2007 9:33 pm
by xinhuang66
JoshGeorge wrote:Is there a chance that there is something wrong with your job design? I just ran the above example and it gives me the right answer.
Hi Josh,

I guess that is possible because of the input link sequences, Can you just swap your input link to see what will happen?

Posted: Thu Aug 23, 2007 12:10 am
by JoshGeorge
You are the OP , pls. test and see for yourself. We have already posted our findings to help you on this regard.
xinhuang66 wrote:
JoshGeorge wrote:Is there a chance that there is something wrong with your job design? I just ran the above example and it gives me the right answer.
Hi Josh,

I guess that is possible because of the input link sequences, Can you just swap your input link to see what will happen?

Posted: Thu Aug 23, 2007 3:20 am
by harshada
We had a similar kind of problem , but our join keys were Char(8) and Varchar(8) . THe result as has laready been mentioned is weird.
So we made both the join keys data type same and have followed the same principle in every job that has a JOIN Stage , to keep join keys data type same ( even lenght)

Posted: Thu Aug 23, 2007 3:21 am
by harshada
sorry about the previous post

'' We had a similar kind of problem , but our join keys were Char8 and Varchar8 . THe result as has already been mentioned is weird. "