Page 1 of 1

Range lookup

Posted: Mon Jun 28, 2010 4:28 pm
by vskr72
I am trying to build a range lookup. This is done in the stream. I have 3 different range key expressions specified. 2 of them are based on dates and another is based on an integer number.

Having a weird issues. When I try to perform the lookup with just few records (like less than 50), the lookup performs well and I get proper results. But, if I do not have a filter and I let all the records load up the lookup (approx 500K), then the results are weird and wrong. Is there anything that I need to set or specify in the job? Thanks,

Satish

Posted: Mon Jun 28, 2010 5:33 pm
by chulett
We'd need much more in the way of specifics before anyone could help, there's not much to work with when you just say the "results are weird and wrong". :?

Posted: Mon Jun 28, 2010 7:15 pm
by ray.wurlod
As you're providing the greater detail that Craig requested, please also advised how the data are partitioned on every input link to the Lookup stage.

Posted: Mon Jun 28, 2010 9:28 pm
by vskr72
I should have provided this earlier:

Code: Select all

 Stream Input: A
Case, Emp, Action_DT
-------------------------
A100, 100, 1/1/2010
A100, 101, 2/1/2010

Reference Input: B
SK, EMP,   E_ST_DT, E_END_DT, O_ST_DT, O_END_DT  
-------------------------------------------------------------
1,  100,  6/1/2009, 4/1/2010, 4/1/2008, 6/1/2010
2,  100,  6/1/2009, 4/1/2010, 4/1/2006, 4/1/2008

For the above datasets, the following are the conditions that I apply:

- A.EMP=B.EMP
- ACTION_DT>= E_ST_DT AND ACTION_DT<= E_END_DT
- ACTION_DT>= O_ST_DT AND ACTION_DT<= O_END_DT

When I just filter the lookup data to EMP=100 (for my testing), I get the SK value as 1 which is correct.

Code: Select all

Reference Input: B
SK, EMP,   E_ST_DT, E_END_DT, O_ST_DT, O_END_DT  
-------------------------------------------------------------
1,  100,  6/1/2009, 4/1/2010, 4/1/2008, 6/1/2010
2,  100,  6/1/2009, 4/1/2010, 4/1/2006, 4/1/2008
3,  200,  6/1/2009, 4/1/2010, 4/1/2008, 6/1/2010
4,  200,  6/1/2009, 4/1/2010, 4/1/2006, 4/1/2008

But, when I include, 2 more records that belong to a differnt EMP like 200, then for the same stream input, I am getting the SK value as 3. I am completely lost in this range lookup deal.

I am using the default 1 node config file and all other partitioning settings are standard. Thanks.

Posted: Tue Jun 29, 2010 2:00 am
by ArndW
That is a much better explanation, but you are missing the values for employee 200 in the reference stream - you can omit the employee 100 values since you stated that those work correctly. At least by running with 1 node you eliminated partitioning as the source of you problem.

Posted: Tue Jun 29, 2010 7:01 am
by chulett
I don't know, I got the impression that the input stream only provides the correct information for EMP 100 when there were only records for EMP 100 in the lookup data. Once more rows are introduced into the lookup source for EMP 200, the results for EMP 100 are now wrong. :?

Is that what I'm reading?

Posted: Tue Jun 29, 2010 7:48 am
by vskr72
Yes Craig. Thats exactly what is hapenning.

Posted: Tue Jun 29, 2010 8:44 am
by priyadarshikunal
I think you haven't got any boundry condition for emp = 100 for which you are getting the correct results. and when ever it encounters any such condition its giving you incorrect results. Test it for the emp for which you are getting incorrect values.

Posted: Tue Jun 29, 2010 9:36 am
by vskr72
Kunal - Can you pls clarify what you said? I did not understand u.

Posted: Tue Jun 29, 2010 10:00 am
by priyadarshikunal
what if you get a value for emp 100 like


A100, 100, 4/1/2008

Posted: Sat Oct 16, 2010 5:59 am
by agpt
Hi,

you have marked this topic resolved. Could you please update us how exactly you solved this?

Posted: Sat Oct 16, 2010 5:17 pm
by ray.wurlod
vskr72 wrote:Kunal - Can you pls clarify what you said? I did not understand u.
Please strive for a professional standard of written English on DSXchange. English is difficult enough for people whose first language is something else. SMS-style abbreviations are completely unnecessary - DSXchange is not a mobile telephone and there is no character limit on the length of post.