Page 1 of 1

Range lookup functionaility

Posted: Sun Jul 05, 2009 8:27 pm
by Kryt0n
We have a requirement to retrieve data for a record based on what was active at the given time. For example

Our input record has the fields Col1 and Timestamp as such

Code: Select all

Col1    Timestamp
A         2009-06-01 12:00:00
Now we need to do a lookup against another table with Col1, StartTS and EndTS

Code: Select all

Col1     StartTS                           EndTS
A         2009-05-12 09:12:12       2009-05-18 03:12:53
A         2009-05-18 03:12:54       2009-06-02 10:10:23
A         2009-06-02 10:10:24       <null> (we set this to 2999-12-31... for the sake of the lookup)
Now for the given input, we would expect to take the second row. To implement this, we have set the Col1 on reference link to a Key and the Timestamp on primary link to a range lookup (>= StartTs, <= EndTs)

Job runs, but we get the error that duplicates are found and they will be ignored. I have confirmed there are no duplicates for the Col1, StartTS combination, as such, these duplicates must be on the Col1 and therefore it is only taking one row for each Col1 value.

This is confirmed with output not always setting the reference link columns even when there is an entry with the correct timestamp range for the given Col1.

Anyone dealt with such a requirement and got it working in DataStage? If so, were you working with DB stages or did you have it working from flatfiles/datasets? Plan to re-try on a DB stage but still awaiting the DB setup (things get done pretty quick over here!)

Re: Range lookup functionaility

Posted: Mon Jul 06, 2009 12:30 am
by vkhandel
The warning is thrown because we have duplicates in reference link for "Col1". I have also faced the same problem, and as a workaround, I have made an inner join of the two links, thereafter filtering them on the condition

Timestamp >= StartTS and Timestamp <= EndTS.

Posted: Mon Jul 06, 2009 3:04 pm
by Kryt0n
Yeah, that is what I figured, doing an inner join on Col1 won't be practical for us as we are talking millions of records in each table...

I spent a lot of time yesterday trying to figure out how I could make the range lookup work with no success. By stating the reference link has multiple rows does remove the duplicate reporting but throws a new error - Fatal Error: Updatable lookup requires sorted keys - whatever that means (tried sorting on the keys but didn't make a difference to the error). For now I will take it all in to the database.

Find it hard to believe they didn't think somone might want to use the range lookup in this way so hope it is just something I am missing!

Posted: Mon Jul 06, 2009 10:18 pm
by vkhandel
"By stating the reference link has multiple rows does remove the duplicate reporting" ... You achieve that by selecting the reference link in the property "Multiple Rows returned from Link" , right?

I tried that and warning is removed in my case too .. and strangely I have not encountered any Fatal Error too ....

Have you partitioned the reference link as "Entire"? I am using that partition method, and its working fine for me.

Posted: Mon Jul 06, 2009 11:17 pm
by Kryt0n
Hmmm, lucky you!!

Yes, I was referring to the "Multiple Rows returned from Link"... gave Entire a crack but still no luck.

What UNIX system are you using? And what version of the C++ compiler? I'm on Solaris... would try find out about what C++ compiler but thanks to the super-system they have implemented here, can't access the box at present...

I've implemented the Oracle way around as the project can't wait but would love to know where this is going wrong!