Range lookup functionaility

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Range lookup functionaility

Post by Kryt0n »

We have a requirement to retrieve data for a record based on what was active at the given time. For example

Our input record has the fields Col1 and Timestamp as such

Code: Select all

Col1    Timestamp
A         2009-06-01 12:00:00
Now we need to do a lookup against another table with Col1, StartTS and EndTS

Code: Select all

Col1     StartTS                           EndTS
A         2009-05-12 09:12:12       2009-05-18 03:12:53
A         2009-05-18 03:12:54       2009-06-02 10:10:23
A         2009-06-02 10:10:24       <null> (we set this to 2999-12-31... for the sake of the lookup)
Now for the given input, we would expect to take the second row. To implement this, we have set the Col1 on reference link to a Key and the Timestamp on primary link to a range lookup (>= StartTs, <= EndTs)

Job runs, but we get the error that duplicates are found and they will be ignored. I have confirmed there are no duplicates for the Col1, StartTS combination, as such, these duplicates must be on the Col1 and therefore it is only taking one row for each Col1 value.

This is confirmed with output not always setting the reference link columns even when there is an entry with the correct timestamp range for the given Col1.

Anyone dealt with such a requirement and got it working in DataStage? If so, were you working with DB stages or did you have it working from flatfiles/datasets? Plan to re-try on a DB stage but still awaiting the DB setup (things get done pretty quick over here!)
vkhandel
Participant
Posts: 35
Joined: Wed Oct 04, 2006 12:12 am
Location: Pune

Re: Range lookup functionaility

Post by vkhandel »

The warning is thrown because we have duplicates in reference link for "Col1". I have also faced the same problem, and as a workaround, I have made an inner join of the two links, thereafter filtering them on the condition

Timestamp >= StartTS and Timestamp <= EndTS.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Yeah, that is what I figured, doing an inner join on Col1 won't be practical for us as we are talking millions of records in each table...

I spent a lot of time yesterday trying to figure out how I could make the range lookup work with no success. By stating the reference link has multiple rows does remove the duplicate reporting but throws a new error - Fatal Error: Updatable lookup requires sorted keys - whatever that means (tried sorting on the keys but didn't make a difference to the error). For now I will take it all in to the database.

Find it hard to believe they didn't think somone might want to use the range lookup in this way so hope it is just something I am missing!
vkhandel
Participant
Posts: 35
Joined: Wed Oct 04, 2006 12:12 am
Location: Pune

Post by vkhandel »

"By stating the reference link has multiple rows does remove the duplicate reporting" ... You achieve that by selecting the reference link in the property "Multiple Rows returned from Link" , right?

I tried that and warning is removed in my case too .. and strangely I have not encountered any Fatal Error too ....

Have you partitioned the reference link as "Entire"? I am using that partition method, and its working fine for me.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Hmmm, lucky you!!

Yes, I was referring to the "Multiple Rows returned from Link"... gave Entire a crack but still no luck.

What UNIX system are you using? And what version of the C++ compiler? I'm on Solaris... would try find out about what C++ compiler but thanks to the super-system they have implemented here, can't access the box at present...

I've implemented the Oracle way around as the project can't wait but would love to know where this is going wrong!
Post Reply