Page 1 of 1

Sequential File to Seq File join behaviour...

Posted: Thu Oct 04, 2012 10:31 am
by kaps
I noticed something and want to make sure that my understanding is correct.

Job design is :

Code: Select all

                SeqFile2
	         	|
	         	|
SeqFile1---Join Stage----Transformer---Remove Dup----File
I join a Seq file with another Seq file using join stage and in transformer I check IsNull(SeqFile2.Col1) to find out if
the lookup is found or not and it's not working. Do we need to do this in a different way when we join a seq file with
another seq file ? All columns in my lookup seq file is Not Nullable.

Please advise.

Posted: Thu Oct 04, 2012 10:36 am
by PhilHibbs
Numeric values can't be NULL. If the join fails then the numeric values from SeqFile2 will be 0, not NULL. I think string values can be NULL.

Posted: Thu Oct 04, 2012 10:49 am
by kaps
All my lookup columns are Varchar and funny thing about this is that the field is defined as Not Nullable in Database and the job did not fail but there are no values in the column when you query the table.

Correction in my job design that final stage is DB2 connector not a file.

Posted: Thu Oct 04, 2012 10:56 am
by PhilHibbs
In a database, an empty string is not a null. So if your job is writing empty strings when the join fails, and to be honest I can't remember if that is the case for string types, then you won't get "is not nullable" errors.

If a varchar from a failed join is blank rather than NULL, then you need to pick... or or introduce... a column that cannot be blank or zero in order to check whether the join worked or not. I was lucky on the last project in that all internal data records had a surrogate key that could never be zero so we just used that, and I always introduced a surrogate rownumber key on supplier input data that had not been loaded into the data yet. I used the Sequental File automatic row number column, but had to add 1 to it because it numbered rows starting at 0 (bad idea, IBM!)

Posted: Thu Oct 04, 2012 11:35 am
by kaps
Thanks for the reply. I understand the difference between empty string and null but my question is that when we don't find a matching record in the right file what do we get ? As I mentioned in my original post all columns are not nullable so I don't think introducing another column with some default value is going to make a difference. When I query the database I don't find any empty space. Could be some unprintable character but how come when I don't find a look up it returns un printable characters ? Am I misiing something here ?

Posted: Thu Oct 04, 2012 4:16 pm
by FranklinE
There's nothing better than using some debugging techniques. One I've used is to have failed joins go to a reject link and file, so I can examine the data for reasons.

You didn't mention what type of join you are using, and you use the term "lookup" in your posts. Is it possible that you might have success with using a Lookup stage instead of Join?

Posted: Thu Oct 04, 2012 5:23 pm
by SURA
Hi

1. What type of join you used?

2. Let us know the linking order?

Re: Sequential File to Seq File join behaviour...

Posted: Thu Oct 04, 2012 10:45 pm
by mouthou
try the old fashiobned method of sorting the source and reference data before Join, and see the correctness of the lookup.

Posted: Fri Oct 05, 2012 3:55 am
by ArndW
If you declare your numeric columns coming from the sequential files as being nullable (even though they aren't), then after your left outer join you will get null values for the joined columns when the join doesn't find a match.

Posted: Fri Oct 05, 2012 3:58 am
by PhilHibbs
I don't think that's correct. I've always got 0 for a failed join even when the column is defined as nullable. What you describe may be correct for Server jobs, I'm a bit rusty on them, but not for Parallel.

Posted: Fri Oct 05, 2012 4:21 am
by ArndW
Phil - it is correct; I just wrote a test job to make sure that it is so.

The SQL logic of a left outer states that for each left input row there will be an output row, those fields coming from the right side of the join when unmatched are set to <null>. If, in DataStage, this field is not nullable then the default value is used, i.e. 0 for numbers, "" for strings. If the fields are nullable then null values are used. The same logic and methodology applies to lookup stages as well.

Posted: Fri Oct 05, 2012 4:39 am
by PhilHibbs
Then there is something very strange going on here because I am sure that that is not how it behaved on my last project. Could some configuration option affect this behaviour?

Posted: Fri Oct 05, 2012 5:11 am
by ArndW
No, not as far as I know. I've used this behaviour for years with joins/lookups to detect if there was a match -- I'd use a non-nullable column, declare it as nullable on the input link to the join stage and, of course, in the output link and later test with IsNull() in a transform stage. The only time it will be a null value is if the join/lookup didn't find a match.

Posted: Tue Oct 09, 2012 5:34 pm
by kaps
I could not figure out the explanation for the behaviour so as a workaround I checked the column from the right side file in stage variable using NullToZero function and then checked the stage variable in the column derivation using IsAlpha function. It worked :-)