Complex join problem

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
phillip.small
Participant
Posts: 29
Joined: Mon Aug 16, 2004 8:12 am

Complex join problem

Post by phillip.small »

Input File (8 keys) -------> Hash file (link to 6 keys) -------pull Position Nbr based on key joins --------> TARGET ODBC DB2 table.

What is happening is that the join from the other columns are not displaying in the HASH file object unless I disable the primary keys, however, when that is done then the join will not work. The log file gives an error stating that the "Position Number is not valid in the context in which it is used." I have trimmed everything. I changed the date fields to Char to avoid formatting issues. The position number has no link from the Input File and is first being generated from the Hash. Is that a problem? How can I format the job to display all records and return the Position Nbr based on the join from the input file (sequential) stage to the Hash file?
aaronej
Participant
Posts: 31
Joined: Mon Aug 18, 2003 9:25 am

Post by aaronej »

Check the position column on your hash file table definition. The key columns i nthe hash file cannot have a value other than 0. This may be the cause of your issue.

Good luck!

Aaron
Last edited by aaronej on Tue Aug 17, 2004 10:13 am, edited 1 time in total.
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

Is the primary key of your hash file composed of all six columns, no more and no less?

In your job, all six lookup key columns should have derivations. Key columns should not be disabled.

You noted the string Position Number in your error message. Is this a column name in your hash file? Does it include the space between the two words?
phillip.small
Participant
Posts: 29
Joined: Mon Aug 16, 2004 8:12 am

Post by phillip.small »

chucksmith wrote:Is the primary key of your hash file composed of all six columns, no more and no less?

In your job, all six lookup key columns should have derivations. Key columns should not be disabled.

You noted the string Position Number in your error message. Is this a column name in your hash file? Does it include the space between the two words?
The column name is POSITION_NBR. I have no derivations for the POSITION_NBR column because it is first being generated there and not from the data file. I am using the other fields from the data file to join to the hash to send the correct POSITION_NBR to the target table for each record.
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

How many key columns did you define while creating the hash file. If it is 8 then you have to give a join for all the 8 key columns.
If while creating the hash file you gave 8 key columns and now you are using 6 of them for a join condition you will get very unexpected and random results. What you need to do is either recreate (Delete and create again) the hash file with 6 key columns or create a new hash file with 6 keys and use it in the lookup.

Regards,
Sumit
phillip.small
Participant
Posts: 29
Joined: Mon Aug 16, 2004 8:12 am

Post by phillip.small »

This is the query that I am trying to emulate in DB2. The problem is the date columns. The 2nd query in the union uses the DOC_TERM_DT column but it will only use that one if the DOC_TERM_DT is the MAX(DOC_TERM_DT) and equal to the other other key columns as seen below. How do I do this in the transformer or will two transformers need to be created? Also how do you pull the MAX date in DataStage. I looked in the Help and there is no MAX function.

select
LTRIM(RTRIM(TO_CHAR(DOC_EFF_DT, 'YYYYMMDD')))||'|'|| -- effdt
'A' ||'|'|| -- eff_status
to_char(DOC_TERM_DT, 'YYYYMMDD')||'|'||
RTRIM(AUDC_PARA_NR)||'|'||
RTRIM(AUDC_LINE_NR)||'|'||
TOTAL_ARMY_CLAS_CD||'|'||
RQMT_RNK_NR||'|'||
RTRIM(UIC_CD)||'|'||
UNIT_STRUC_SRCE_CD||'|'||
to_char(DOC_EFF_DT, 'YYYYMMDD')||'|'||
POSN_RQMT_CD||'|'||
RQRD_STR_QY||'|'||
AUTH_STR_QY
from taads_detail_rmks_t
UNION
select
LTRIM(RTRIM(TO_CHAR(DOC_TERM_DT, 'YYYYMMDD')))||'|'|| -- effdt
'I' ||'|'|| -- eff_status
to_char(DOC_TERM_DT, 'YYYYMMDD')||'|'||
RTRIM(AUDC_PARA_NR)||'|'||
RTRIM(AUDC_LINE_NR)||'|'||
TOTAL_ARMY_CLAS_CD||'|'||
RQMT_RNK_NR||'|'||
RTRIM(UIC_CD)||'|'||
UNIT_STRUC_SRCE_CD||'|'||
to_char(DOC_EFF_DT, 'YYYYMMDD')||'|'||
POSN_RQMT_CD||'|'||
RQRD_STR_QY||'|'||
AUTH_STR_QY
from taads_detail_rmks_t T
where T.DOC_EFF_DT = (select MAX(X.DOC_EFF_DT) from taads_detail_rmks_t x
where x.uic_cd = t.uic_cd and
x.audc_para_nr = t.audc_para_nr and
x.audc_line_nr = t.audc_line_nr and
x.rqmt_rnk_nr = t.rqmt_rnk_nr)
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

Please help me with the relationship between tables T and t. T seems to be all of the rows that are effective on the maximum effective date of the current row from table t.

A MAX is implemented in DataStage in Basic:

Code: Select all

If a > b Then b Else a
Perhaps when I am clear on your union, I can offer some design suggestions, but there is no reason you could not create a hash file to hold the values associated with the maximum effective date. You may have to create a job that does a reference lookup and a write to the same hash file, using a constraint similar to the MAX implementation, above.
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

Oops,

Code: Select all

If a > b Then a Else b
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

Oops,

Code: Select all

If a > b Then a Else b
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Unless I am very much mistaken, and it wouldn't be the first time today, you are trying to retrieve every row from taads_detail_rmks_t in the first union and you are trying to retrieve the most recent row from taads_detail_rmks_t in the second half of the union. This gives you duplicate rows where the first instance of the row has status set to 'A' and in the second instance it is set to I.

What I suggest you do is run a job that dumps all maximum dates into a hash file using one DataStage job. The select statement against the database will group by the primary key fields and take the maximum value of DOC_EFF_DT.

A second job selects all rows from the table, it performs a lookup of the maximum date hash file to identify whether that row matches a maximum date lookup row, the eff_status field is then set to "A" when a lookup is not found or "I" when a lookup is found.

I think this is more maintainable then your long custom SQL with the duplicate union rows.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Vincent, a 'union' by definition removes all duplicates. You have to specifically say 'union all' to retain them. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

That is true, however in this case the duplicate rows are not removed by the union due to the status field, it has been set to A for one and I for the other so these maximum rows get returned down both paths of the union creating a row duplication that may not be intended.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ahhh... then they're not really duplicates, are they? :P

I agree with and understand your basic premise, just having some fun with ya, V. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply