An ODD Lookup, please help

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

An ODD Lookup, please help

Post by kumarjit »

My job's input data consists of the following six mandatory columns:
NAME
LOCATION
RCT_CODE
TXN_DT
BANNER_NAME
POOL_NUMBER

The input data needs to be validated( exact word should have been LOOKED UP) against two reference data sources :

REF1 :
P1_NAME(column against which the input NAME column of the input field is to be matched)
P1_LOC(column against which the input LOCATION column of the input field is to be matched)
R_CODE_NUM(column against which the input RCT_CODE column of the input field is to be matched)

REF1 will also contain two DATE columns say, SRC1_D1 and SRC1_D2 against which the TXN_DT input column will be RANGE-look up'd.

A similar scenario is also applicable for the other Reference Data source REF2 , like :
P2_BANNAME(column against which the input BANNER_NAME column of the input field is to be matched)
P2_POOL(column against which the input POOL_NUMBER column of the input field is to be matched)

REF2 like REF1 , will also contain two DATE columns say, SRC2_D1 and SRC2_D2 against which the TXN_DT input column will be RANGE-look up'd.


Where I am stuck :
REF1 and REF2 can have multiple records where the respective EQUALITY conditions holds TRUE, but the TXN_DT value may either be out of the concerned
Date Range , or it may be within .

So , from both the reference links , I must be receiving records which will be satifying the lookup condistion , and also those which dont , and that too
for a single combination of EQUALITY keys.

Example :
For the same NAME+LOCATION+RCT_CODE combination , there can be mutiple records in the REF1, some having TXN_DT
within SRC1_D1 and SRC2_D2, and some wont.

Similarly , for the same BANNER_NAME+POOL_NUMBER , there can be mutiple records in the REF2 , some having TXN_DT
within SRC2_D1 and SRC2_D2 , and some wont.

Now , If I use a lookup, and in its constraint window , set Allow Multiple records for REF1 link , then I wont be getting the duplicates from REF2,
and vice-versa .

What I need ?
To get all records apssing through the lookup stage , if the record matches the lookup criterion , fair enough , if it does not it should not be dropped .



Which approach to follow ?


Thanks
Kumarjit.
Pain is the best teacher, but very few attend his class..
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Did you select look-up failure mode to continue?

If it doesn't wok you can anyways just look for key matches and apply range in transformer.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

You need to more clearly define what the end results you need are for certain input conditions:

1) You have an input record which has no match for either REF 1 or REF 2.
2) You have an input record which has a match for REF 1 but not on REF 2.
3) You have an input record which has a match for REF 2 but not for REF 1.
4) You have an input record which has a match for both REF 1 and REF 2.

If a single input record matches multiple rows in a reference input, do you need multiple output records created or not?

You may be better served using two separate lookup stages rather than just one, primarily so that you can more easily focus on a single reference input at a time.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

I remember reading somewhere that only one Multi row lookup was allowed PER lookup stage - so you may need 2 cascaded lookup stages with a Mandatory Copy operator ( FORCE=TRUE) in between them so that both lookup stages will not undergo operator combination. Draw an output link from copy to see your intermediate dataset.
That being said , maybe my source of information is outdated.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

There is a Combinabilty Mode option under the Advanced tab, no need to force a copy stage in between.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

What we need to think of is what we will be doing with the multiple records after the lookup.

Are you going to take the maximum of date. If there are overlapping timelines in the lookup data, guess it will not be meaningful and giving proper results to us. (It is totally a invalid data, at some point of time this will be junk if you continue to allow overlapping timeline)

Before doing the lookup, avoid the overlapping time line. try to acheive the one that you are going to do after lookup(with multiple records), at before lookup
Thanks,
Prasanna
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

@priyadarshikunal : The lookup failuer options has been set to CONTINUE since start.

@jwiles : To asnwer your different permutations and combinations , I would take just a few steps to walk you though my problem.
As it has been mentioned earlier in my post , a mandatory column TXN_DT need to be validated against TWO DIFFERENT sources , namely REF1 and REF2
The keys on which REF1 is looked up by the stream data are NAME and LOCATION .
Lets say the Input stream a record containing NAME=N1 , LOCATION=L1 , RCT_CODE=R1 and TXN_DT=D1. Now if you lookup the REF1 source using keys N1, L1 and R1 ,
there may be thousands of records in it where P1_NAME=N1 , P1_LOC=L1 and R_CODE=R1 ,
but out of these there will be only ONE record for which the value of TXN_DT from the input stream (D1) will lie in the range
specified by values corresponding SRC_D1 and SRC_D2 columns of REF1.
I want ALL the recods corresponding to keys L1 , N1 and R1 to be propogated out of the lookup irrespective of whether the TXN_DT lies within the
range of the corresponding SRC_D1 and SRC_D2 columns values .
The only difference would be , the ones where TXN_DT fall within the range , TXN_DT value will be propogated , else NULL .

The same scenario is applicable while looking up the REF2 source , here the keys change , and so does the date range columns , but the overall logic
remians the same though.

I am seeking to implement this via a SINGLE lookup .

Any further help is truly appreciated.


Thanks
Kumarjit.
Pain is the best teacher, but very few attend his class..
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why are you "seeking to implement this as a SINGLE lookup"?

To me it does not make sense to do so; the sense of the logic is not present in such a design.

Therefore consider two Lookup stages or, perhaps, a single Lookup stage with two separate reference inputs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Thank you for the explanation...it does clear up the questions regarding the individual reference sets, but now leaves some more detailed requirements open:

1) Do you need to keep any records which don't match at all to Ref1 or Ref2? If yes, what is the output value for TXN_DT? Will it be nulled, or will it retain its value?
2) Do you need to know the level of match to a reference in order to determine the fate of TXN_DT? That is: Do you need to know that a Ref1 match was matde to NAME, LOCATION and RCT_CODE but not TXN_DT? OR, do you simply need to know that all four keys matched or not? (same questions for Ref2)
3) Say you match to all keys but the date on one reference, and all keys including the date on the other reference: What will happen to TXN_DT? Will it retain its value, or will it be nulled? Is there a preference depending upon which reference had the full match?
4) Assuming you match on all four keys on both references, TXN_DT should keep its value. Is this a correct understanding?

If the answer to #2 is that you need to know the level of match, then I think that you will likely need 2 lookups per reference: 1 to just the first three keys, and 1 to all four keys (you should be able to do this in a single lookup stage per reference).

I have to second Ray's question: WHY do you want to do this in a single lookup? I believe that the simpler solution is actually two separate lookups (one per reference), with each lookup simply setting one or two indicator bytes (depending on the answer to #2 above) to indicate whether or not a match was encountered. Allow all records, match or not, to be kept in the lookup outputs. Following the lookups, a transformer can then examine the match results and create the appropriate output result. Using the indicator byte approach, you may be able to do all in a single lookup followed with a transformer.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

Lets say the input has only one record for which :

Code: Select all

NAME 		: N1
LOCATION 	: L1
RCT_CODE 	: R1
TXN_DT 		: D1
BANNER_NAME 	: B2
POOL_NUMBER 	: P2
and the reference data sources have data like :

Code: Select all

REF1 :
P1_NAME	P1_LOC	R_CODE	SRC1_D1	SRC1_D2
N1	L1     	R1	<Some value>	<Some value>
N1	L1     	R1	NULL	NULL
N1	L1     	R1	<Some value>	<Some value>
N1	L1     	R1	D0	D2                  <---- For this particular record all the keys match and TXN_DT=D1 lies between D0 and D2 as well, hence this record needs to be fetched from looking up the REF1 source
The same scenario is applicable for lookup of REF2 source .
How can the above be acheived ?


Thanks
Kumarjit
Pain is the best teacher, but very few attend his class..
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

I assume that you're not going to answer the questions you have been asked? :?

In each of your reference data streams, add a column containing an integer with the value 1 (for TRUE). A Column Generator or Transformer can do this. This column will be used as an indicator to show the result of a lookup (1 means a match, 0 or Null means no match...I recommend making this column non-nullable so that you end up with just 1 and 0 values). Make sure that you have one uniquely-named indicator column for each lookup you perform (for example, REF1_MatchInd, REF2_MatchInd and so on). In your lookups, add this column to the output by dragging it from the reference input. Make sure that the lookups keep all records.

After the lookups, use a transformer which will examine the value of the indicator columns. Example: If REF1_MatchInd = 1 and REF2_MatchInd = 0, you know you have a match to REF1 but not REF2 and can execute the logic to create the output you require.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Post Reply