DSXchange

Posted: **Wed Jul 28, 2010 2:52 pm**

Hi folks, there are many range lookup problems posted here, so I realize I am adding to a pile, but I haven't seen this particular issue resolved so I am posting it before submitting a ticket.

I have a Varchar Key and a date on the stream:

StreamKey
StreamDate

I have the same key, and a pair of dates, and a desired data column on the reference:
RefKey
RefDate1
RefDate2
RefData

I want the stream to get the RefData value when StreamKey = RefKey and StreamDate is between RefDate1 and RefDate2.

FYI, RefKey and RefDate1 uniquely define a row in the ref data, and none of the date ranges overlap. So for each StreamKey/StreamDate pair, only one match is possible in the reference. I have the reference link partitioned "entire."

I have tried the following:
1) Match the StreamKey to the RefKey on the reference lookup link, and do a range lookup on StreamDate on the stream link. I got the error.

2) I tried a range lookup on both the StreamKey and the StreamDate on the stream link using the strategy suggested by vskr72 here viewtopic.php?t=133588
(by duplicating RefKey in the reference link and treating both lookups as ranges on the stream link.)

I got the error. I got the error either way.

Any help is appreciated. At this point, I suppose the most general question is, does range lookup work at all when the range is specified on the stream link? Is there a known bug on Solaris?

Thanks,
Doug

Posted: **Wed Jul 28, 2010 3:26 pm**

You can join just the StreamKey and RefKey and bring all the columns downstream into a Transformer or Filter Stage and then apply the logic

StreamDate is between RefDate1 and RefDate2

That should work fine.

Posted: **Wed Jul 28, 2010 4:24 pm**

kris007 wrote:You can join just the StreamKey and RefKey and bring all the columns downstream into a Transformer or Filter Stage and then apply the logic
StreamDate is between RefDate1 and RefDate2
That should work fine.

Hi thanks. Joining the StreamKey and the RefKey requires sorting and repartitioning the stream. Also, it would create an explosion in row count. I would like to use a lookup. Is it correct to infer from your response that range lookups on the stream link don't work?

Posted: **Wed Jul 28, 2010 5:44 pm**

Sorry for misleading you. What I meant was to use LookUp Stage to join just the StreamKey and RefKey and then continue. Range lookups in Parallel jobs are done differently than server jobs and I couldn't figure out a way so far to do the range lookups on the stream link. I don't think it is possible.

Posted: **Wed Jul 28, 2010 11:23 pm**

Thanks for the clarification. I'm submitting a ticket to see what IBM has to say. In the meantime I think I'll try your idea of using the lookup to match the keys. Up until now, I've always had unique rows in the lookup, so I guess you're saying multiple rows from the reference link just lead to multiple rows on the output like a join would. Speaking of which, something else I tried was to go into the "multiple rows returned from link" area and place a condition in there on the date range. Of course that didn't work either as reference link items are not available in the expression there. Seems like that would have been a nice, intuitive answer. As it is, I can't think of a case where the expression there would be useful. Only being able to apply a constraint to the stream while multiple rows are being returned on the reference makes no sense to me.

Posted: **Fri Jul 30, 2010 12:07 pm**

Hi guys, I am trying this idea of doing a lookup on the key, and then using a filter to apply the date logic, but I don't think the lookup is working. If the reference returns more than one row, does it operate like a join, or does it ignore the duplicates? My rowcount going in is the same as the rowcount coming out. It should be much larger.

Thanks,
Doug

Posted: **Fri Jul 30, 2010 12:25 pm**

dougcl wrote:If the reference returns more than one row, does it operate like a join, or does it ignore the duplicates? Doug

If you select the option "multiple rows returned from link" it will work like a join stage. If you did not select that option the lookup stage will drop duplicate records based on key which in your case RefKey

My rowcount going in is the same as the rowcount coming out. It should be much larger.

Which link are you talking about here. The reference vs Output or Stream vs Output?

Posted: **Fri Jul 30, 2010 2:14 pm**

kris007 wrote: If you select the option "multiple rows returned from link" it will work like a join stage. If you did not select that option the lookup stage will drop duplicate records based on key which in your case RefKey

Oh duh. I see that. I had the drop down blank. It seems to be working now. thanks,
Doug

Posted: **Thu Aug 05, 2010 1:42 pm**

Hi folks, I think I have this solved. Apparently the error message is correct. If you are doing a range lookup, the reference link must be sorted on the lookup keys. Partition entire first, then sort.

Doug

DSXchange

Fatal Error: Updatable lookup requires sorted keys.

Fatal Error: Updatable lookup requires sorted keys.

Re: Fatal Error: Updatable lookup requires sorted keys.

Re: Fatal Error: Updatable lookup requires sorted keys.

Re: Fatal Error: Updatable lookup requires sorted keys.