lookup on variable keys

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
r
Participant
Posts: 13
Joined: Tue May 17, 2005 6:16 am

lookup on variable keys

Post by r »

hi,

Please help me out.
i have a reference file of the following type.

1st_char f1 f2 f3 f4 f5 f6 f7 prod_code
D - - - - - N - RB
D - - - - - - - RD
A - - - - - - Y RJ
A N Y - N - - - RH
A - Y - - N Y - RF
W - - - N - N Y RK
.
.
.

there is no unique key. based on the 1st char & the values of the flags, the prod_code is to be populated. "-" implies that the flag's value does not matter. it can be anything Y/N.

I have a primary file which contains all the above fields & in which prod_code is to be populated.

Both files are large.
Currently we are hardcoding the flag values to populate the prod_code using if-else statements.

i want to make this configurable to avoid code changes when new prod_codes are added.
a simple lookup wont work because we dont know which keys to lookup on.

if join is performed then there will be n.m iterations where n=no.of rows in primary file & m=no. of rows in reference file which will turn into millions of records.

Can anyone suggest an efficient solution for this.
To simplify-the problem is to perform lookup on variable keys.


Thanks & Regards
r
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Re: lookup on variable keys

Post by kumar_s »

based on the 1st char & the values of the flags, the prod_code is to be populated.

I have a primary file which contains all the above fields & in which prod_code is to be populated.

Both files are large.

Currently we are hardcoding the flag values to populate the prod_code using if-else statements.
From all above points, i guess you can make the join efficiently. Since the duplicates is only based on the first char, you can make the join based on the first char along with the flag. Just do a pre process to convert all '-' as Y if necessary.

-Kumar
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: lookup on variable keys

Post by kwwilliams »

I think that this problem is deceptively difficult. With D the only set "key value" that you would have for a lookup you aren't going to be able to use a lookup or a join to produce the output for this. With seven different flags you would need to create too many variations of the file to be very effecient in producing a recordset that you could then use for a lookup or join.

The only real solution that I see is to perform a join based upon the first character. This would create a cartesian product.

You file has the following lines:

D -----N- RB
D ------- RD

If you had a data stream with a D in it and joined to this file your output
would be both of these lines for the one input (the Y and N represent actual flags from your data source) or

D -----N- RB YYYYYNN
D ------- RD YYYYYNN

Then in a transformer you could have a constraint on the output that imbeds all of the logic that you need.

(prod_flag1 = 'Y' and (f1 = 'Y' or '-'))
or (prod_flag1 = 'N' and (f1 = 'N' or '-'))

Emulate that for all seven pro flags. I do see a problem with the logic though. If you look at the example that I used above, you will see that both of these lines would pass the constraint logic. So would kind of business rule do you have in place to ensure that the correct prod_code is used when multiple entries in this file satisfy the logical requirements?
Post Reply