Parallel JOIN Stage - LEFT Outer join

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rcanaran
Premium Member
Premium Member
Posts: 64
Joined: Wed Jun 14, 2006 3:51 pm
Location: CANADA

Parallel JOIN Stage - LEFT Outer join

Post by rcanaran »

I have 3 files. (A, B and C). A is th driver and I do two outer joins in the join stage. The JOIN stage requires that the join columns be identically named. From the JOIN STAGE, STAG tab, PROPERTIES tab,PROPERTIESIn the columns available for target for the key , the help is

"Type: Input Column
Name of input column you want to join on. Columns with the same name must appear in both input data sets and have compatible data types."

In the OUTPUT tab there is only one source column for the join key and I cannot edit the derivation, only drag source to target.
the following transformer stage can only access the output link from the join, so only has ONE key and ONE streamed input. Can it reference the INPUT links of the JOIN stage?

How do I determine the the "NULL" values for the missing rows? I tried ISNULL on a numeric column from the right side, but it looks like Paralle is interpreting this as zeroes, not null.
panchusrao2656
Charter Member
Charter Member
Posts: 64
Joined: Sat Sep 17, 2005 10:42 am

Post by panchusrao2656 »

change the metadata of the field you are extracting from Jion to be NULLABLE YES, then you get NULL whenever match not found. If the metadata nullability NO then it will return a default value based on the type of the field. In your case it might be returning zero as it is a decimal field
rcanaran
Premium Member
Premium Member
Posts: 64
Joined: Wed Jun 14, 2006 3:51 pm
Location: CANADA

Post by rcanaran »

Thanks. I tried setting all of the fields to NULLable for the tables on the right of the join. Compiled OK, but on execute complained that I hadn't set a default value for all the NULLable fields (one log entry per field).

Set deafult value (SeqFile Stage/Output/Format/FieldDefaults/NullFieldValue) to a 255 byte string.

Compiled OK, but on execute complains for each NULLable field that the value must be the exact length -- again one log entry per field. Cannot find a way to specify NULL value for each individual field.

Workaround1 ? - Null only a field that is normally NOT NULL, so null value is only the result of a failed join.

Workaround2 ? - directly load all data to staging tables and do the ETL to the real table with SQL.

Any other suggestions?
panchusrao2656
Charter Member
Charter Member
Posts: 64
Joined: Sat Sep 17, 2005 10:42 am

Post by panchusrao2656 »

We cannot write a NULL to a sequential file, we have to set a default value for NULL.

Case 1:Set Null Field Value to a single space ( which will result in a warning, we can ignore it)

Case 2: Set default value to each and every field, this is tricky.
a.Varachr --need to put the same number of spaces
b.Decimal --put two more spces to the actual lngth of the field
c. Timestamp -- 20 spaces ( i am not sure whether it is 20 or 22) -- as Timestamp is 18 character length, i believe it should be 18+2=20
You can avoid the warning also in the second case.
rcanaran
Premium Member
Premium Member
Posts: 64
Joined: Wed Jun 14, 2006 3:51 pm
Location: CANADA

Post by rcanaran »

In parallel, on a SF stage, I cannot seem to find where I can set the default value for a NULLable column. Only at the record/file level. When I get warnings only, I then get messages that a target field is NULL and the output record is being dropped. Effectively changes my left outer join to an inner join.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Click on the column name to activate it, then right mouse click and choose the "edit row" option, that brings up the window where you can set the specific column attributes.
Post Reply