Complex join problem

phillip.small · Post by **phillip.small** » Tue Aug 17, 2004 9:56 am

Input File (8 keys) -------> Hash file (link to 6 keys) -------pull Position Nbr based on key joins --------> TARGET ODBC DB2 table.

What is happening is that the join from the other columns are not displaying in the HASH file object unless I disable the primary keys, however, when that is done then the join will not work. The log file gives an error stating that the "Position Number is not valid in the context in which it is used." I have trimmed everything. I changed the date fields to Char to avoid formatting issues. The position number has no link from the Input File and is first being generated from the Hash. Is that a problem? How can I format the job to display all records and return the Position Nbr based on the join from the input file (sequential) stage to the Hash file?

aaronej · Post by **aaronej** » Tue Aug 17, 2004 10:12 am

Check the position column on your hash file table definition. The key columns i nthe hash file cannot have a value other than 0. This may be the cause of your issue.

Good luck!

Aaron

chucksmith · Post by **chucksmith** » Tue Aug 17, 2004 10:13 am

Is the primary key of your hash file composed of all six columns, no more and no less?

In your job, all six lookup key columns should have derivations. Key columns should not be disabled.

You noted the string Position Number in your error message. Is this a column name in your hash file? Does it include the space between the two words?

phillip.small · Post by **phillip.small** » Tue Aug 17, 2004 10:25 am

chucksmith wrote:Is the primary key of your hash file composed of all six columns, no more and no less?

In your job, all six lookup key columns should have derivations. Key columns should not be disabled.

You noted the string Position Number in your error message. Is this a column name in your hash file? Does it include the space between the two words?

The column name is POSITION_NBR. I have no derivations for the POSITION_NBR column because it is first being generated there and not from the data file. I am using the other fields from the data file to join to the hash to send the correct POSITION_NBR to the target table for each record.

sumitgulati · Post by **sumitgulati** » Tue Aug 17, 2004 10:59 am

How many key columns did you define while creating the hash file. If it is 8 then you have to give a join for all the 8 key columns.
If while creating the hash file you gave 8 key columns and now you are using 6 of them for a join condition you will get very unexpected and random results. What you need to do is either recreate (Delete and create again) the hash file with 6 key columns or create a new hash file with 6 keys and use it in the lookup.

Regards,
Sumit

phillip.small · Post by **phillip.small** » Tue Aug 17, 2004 12:38 pm

This is the query that I am trying to emulate in DB2. The problem is the date columns. The 2nd query in the union uses the DOC_TERM_DT column but it will only use that one if the DOC_TERM_DT is the MAX(DOC_TERM_DT) and equal to the other other key columns as seen below. How do I do this in the transformer or will two transformers need to be created? Also how do you pull the MAX date in DataStage. I looked in the Help and there is no MAX function.

select
LTRIM(RTRIM(TO_CHAR(DOC_EFF_DT, 'YYYYMMDD')))||'|'|| -- effdt
'A' ||'|'|| -- eff_status
to_char(DOC_TERM_DT, 'YYYYMMDD')||'|'||
RTRIM(AUDC_PARA_NR)||'|'||
RTRIM(AUDC_LINE_NR)||'|'||
TOTAL_ARMY_CLAS_CD||'|'||
RQMT_RNK_NR||'|'||
RTRIM(UIC_CD)||'|'||
UNIT_STRUC_SRCE_CD||'|'||
to_char(DOC_EFF_DT, 'YYYYMMDD')||'|'||
POSN_RQMT_CD||'|'||
RQRD_STR_QY||'|'||
AUTH_STR_QY
from taads_detail_rmks_t
UNION
select
LTRIM(RTRIM(TO_CHAR(DOC_TERM_DT, 'YYYYMMDD')))||'|'|| -- effdt
'I' ||'|'|| -- eff_status
to_char(DOC_TERM_DT, 'YYYYMMDD')||'|'||
RTRIM(AUDC_PARA_NR)||'|'||
RTRIM(AUDC_LINE_NR)||'|'||
TOTAL_ARMY_CLAS_CD||'|'||
RQMT_RNK_NR||'|'||
RTRIM(UIC_CD)||'|'||
UNIT_STRUC_SRCE_CD||'|'||
to_char(DOC_EFF_DT, 'YYYYMMDD')||'|'||
POSN_RQMT_CD||'|'||
RQRD_STR_QY||'|'||
AUTH_STR_QY
from taads_detail_rmks_t T
where T.DOC_EFF_DT = (select MAX(X.DOC_EFF_DT) from taads_detail_rmks_t x
where x.uic_cd = t.uic_cd and
x.audc_para_nr = t.audc_para_nr and
x.audc_line_nr = t.audc_line_nr and
x.rqmt_rnk_nr = t.rqmt_rnk_nr)

chucksmith · Post by **chucksmith** » Tue Aug 17, 2004 3:47 pm

Please help me with the relationship between tables T and t. T seems to be all of the rows that are effective on the maximum effective date of the current row from table t.

A MAX is implemented in DataStage in Basic:

Code: Select all

If a > b Then b Else a

Perhaps when I am clear on your union, I can offer some design suggestions, but there is no reason you could not create a hash file to hold the values associated with the maximum effective date. You may have to create a job that does a reference lookup and a write to the same hash file, using a constraint similar to the MAX implementation, above.

chucksmith · Post by **chucksmith** » Tue Aug 17, 2004 3:48 pm

Oops,

Code: Select all

If a > b Then a Else b

chucksmith · Post by **chucksmith** » Tue Aug 17, 2004 3:48 pm

Oops,

Code: Select all

If a > b Then a Else b

vmcburney · Post by **vmcburney** » Wed Aug 18, 2004 1:19 am

Unless I am very much mistaken, and it wouldn't be the first time today, you are trying to retrieve every row from taads_detail_rmks_t in the first union and you are trying to retrieve the most recent row from taads_detail_rmks_t in the second half of the union. This gives you duplicate rows where the first instance of the row has status set to 'A' and in the second instance it is set to I.

What I suggest you do is run a job that dumps all maximum dates into a hash file using one DataStage job. The select statement against the database will group by the primary key fields and take the maximum value of DOC_EFF_DT.

A second job selects all rows from the table, it performs a lookup of the maximum date hash file to identify whether that row matches a maximum date lookup row, the eff_status field is then set to "A" when a lookup is not found or "I" when a lookup is found.

I think this is more maintainable then your long custom SQL with the duplicate union rows.

chulett · Post by **chulett** » Wed Aug 18, 2004 6:47 am

Vincent, a 'union' by definition removes all duplicates. You have to specifically say 'union all' to retain them.

vmcburney · Post by **vmcburney** » Wed Aug 18, 2004 6:19 pm

That is true, however in this case the duplicate rows are not removed by the union due to the status field, it has been set to A for one and I for the other so these maximum rows get returned down both paths of the union creating a row duplication that may not be intended.

chulett · Post by **chulett** » Wed Aug 18, 2004 7:48 pm

Ahhh... then they're not really duplicates, are they?

I agree with and understand your basic premise, just having some fun with ya, V.