full outer join (returning defaults)

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

full outer join (returning defaults)

Post by mystuff »

Hi,

I am using a Full Outer Join in my job. All of its inputs, from the start of the stream (this is the first job in the stream), have been kept as nullable (Yes). When I join on key fields, say KeyField_1 and KeyField_2, I am getting default values instead of nulls where no match is found.

I managed to implement this once so that nulls were returned instead of default values when no match was found, but I cannot recall how I did it. Can anyone help?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Remove the default values from the record schema. I assume you've used different names to receive the key values from left and right inputs.
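In schema terms, the aim is a record definition whose fields are nullable and carry no default subproperty. A minimal osh schema sketch of that shape (field names are illustrative, not taken from the job):

```
record (
    KeyField_1: nullable int32;
    KeyField_2: nullable int32;
    SomeValue:  nullable string[max=30];
)
```

If a field instead read something like `int32 {default = 0}`, that default is what would appear on the unmatched side of the join.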
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

I have checked whether the record schema has default values, but it doesn't have any (nor any provision for specifying default values; I right-clicked the column definition and selected Edit Row).

For the key fields I have used the same names. The Full Outer Join is creating two fields in the output for each key field (leftRec_KeyField_1 & rightRec_KeyField_1).
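For comparison, the behaviour being sought here (a full outer join that yields nulls, not defaults, on the unmatched side, with separate leftRec_/rightRec_ key columns) can be sketched in plain Python. This is only an illustration of the semantics, not DataStage code; the column names mirror the post:

```python
def full_outer_join(left, right, key):
    """Full outer join on `key`; unmatched sides yield None (modelling a
    SQL NULL), never a type default such as 0 or ''."""
    lidx = {row[key]: row for row in left}
    ridx = {row[key]: row for row in right}
    out = []
    for k in sorted(lidx.keys() | ridx.keys()):
        l, r = lidx.get(k), ridx.get(k)
        out.append({
            "leftRec_" + key: l[key] if l else None,
            "rightRec_" + key: r[key] if r else None,
            "left_val": l["val"] if l else None,
            "right_val": r["val"] if r else None,
        })
    return out

rows = full_outer_join(
    [{"KeyField_1": 1, "val": "a"}, {"KeyField_1": 2, "val": "b"}],
    [{"KeyField_1": 2, "val": "x"}, {"KeyField_1": 3, "val": "y"}],
    key="KeyField_1",
)
# Key 3 exists only on the right, so its leftRec_ key and left_val are None.
```

Seeing 0 or an empty string instead of None in the unmatched positions is exactly the "defaults instead of nulls" symptom described above.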
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Those default values have to be coming from somewhere! Check all upstream stages.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
madhukar
Participant
Posts: 86
Joined: Fri May 20, 2005 4:05 pm

Post by madhukar »

Check all stages from the input through to the Join for defaults and nullability.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

I hope you are not just using View Data, which can sometimes trick you into seeing default values based on the stage properties.
If you are sure the defaults are not coming from the source stages, add a Peek right after the Join.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

Hi,

ray.wurlod wrote: Those default values have to be coming from somewhere! Check all upstream stages.

When I go into the record schema definition for all the upstream stages, they show:

Code: Select all

No schema properties can be specified at this point

kumar_s wrote: I hope you are not just using View Data, which can sometimes trick you into seeing default values based on the stage properties.
If you are sure the defaults are not coming from the source stages, add a Peek ...

Those default values really are being created, as they are causing issues in my downstream jobs.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try setting the environment variable that prints the record schema used by each operator into the log. From memory it's OSH_PRINT_SCHEMAS (but please do check).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

ray.wurlod wrote:Try setting the environment variable that prints the record schema used by each operator into the log. From memory it's OSH_PRINT_SCHEMAS (but please do check). ...
Hi Ray, I tried enabling OSH_PRINT_SCHEMAS by setting it to True, but I can't find the output in the Director log. I have gone through the logs but wasn't able to find the metadata definitions.

Is there anywhere in particular I need to look?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's an environment variable that exists in the Administrator client, in the Reporting group. You can add it to a job as a parameter using "Add Environment Variable" in the Parameters page of the job properties dialog. Once it's there leave its default value as False and only set it to True on those runs where you need it.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

Hi Ray,

I have enabled OSH_PRINT_SCHEMAS and looked in the logs, but the log doesn't give schema information for each link. It only gives information about one of the inputs (a dataset) and not the other inputs (Teradata Enterprise).

I am looking at the OSH script in the Director log. Please let me know if I am not looking in the right place.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you activated $OSH_PRINT_SCHEMAS then you will have a log entry entitled "main_program: Schemas: (...)", and that will contain the schema descriptions for each and every runtime link (n.b. the optimizer might combine some stages).
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

Hi,

I have looked at the schemas. When unloading from Teradata, the schema defines a few columns as not nullable, even though I specified them as nullable in the metadata. This gets carried forward through most of the stages until a Transformer is used.

Is there a better way to change this without using a Transformer? The Modify stage is also not letting me change them to nullable.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The Modify stage should do this. What happens when you change the nullability from not-nullable to nullable on the output link in a Modify stage?
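A sketch of what that conversion might look like as a Modify stage specification, based on the underlying modify operator's `output = input` form. The exact nullability syntax here is an assumption and worth checking against the Parallel Job Developer's Guide; the output column must also be defined as nullable on the stage's output link:

```
KeyField_1:nullable int32 = KeyField_1
```

(The reverse direction, nullable to not-nullable, is the one that needs an explicit `handle_null` conversion to supply a replacement value.)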