Page 1 of 1

How to join data with a BETWEEN function

Posted: Tue Jul 14, 2015 12:29 am
by dveltman1
Hi all,

I would like to left outer join data from a 12million+ rows table onto my stream data, with a DATE from the stream into START_DATE and END_DATE in the 12m+ table.

select A.*, B.*
from A, B
left outer join B on A.key = B.key
where A.DATE >= B.START_DATE
and A.DATE < B.END_DATE

In a Join and Merge stage I cannot add any more constraints.
In a Lookup stage I could not join the data, only lookup.

1. What is the best way to get the 12m+ data? Connector/Lookup_file/...
2. How to join with a point in between 2 dates? Except SQL overwrite.

Thanks!

Posted: Tue Jul 14, 2015 1:27 am
by ArndW
The LOOKUP stage will do the range lookup. How many bytes per row on the 12Mio reference?

Posted: Tue Jul 14, 2015 2:14 am
by dveltman1
Columns: 24
Bytes Used: 2.199 GB
Bytes Allocated: 2.227 GB
Skew: 0.010000
Row Count: 18,023,892
% Organized: -
Distribution Keys:
Distribution Key Count:

Posted: Tue Jul 14, 2015 2:22 am
by ArndW
That should be no problem at all for a lookup. Are you using all 24 columns as keys or data in the lookup? If not, remove any unused columns in the SELECT to reduce your data volume. Sort the data in the SELECT and use the range lookup options in the lookup stage.

Posted: Thu Jul 16, 2015 3:52 am
by dveltman1
I was trying to get the 12mio rows with a Lookup_file. But I can't use it for a range lookup when I'm joining with a Connector stage.
I cannot get the right input link on stream so I can define the range.
Any idea's?

Posted: Thu Jul 16, 2015 6:29 am
by ArndW
I have to admit I don't understand quite what you are saying.

Use a LOOKUP stage in order to specify range lookups.

Posted: Fri Jul 17, 2015 8:22 am
by dveltman1
Source 1 is connector stage and stream.
Source 2 is Lookup_File and reference.

The stream needs to do a range lookup to the Lookup_File, with the lookup stage.

But I can't perform the range lookup with that combination.
It does work with 2 connector stages, but I would like to use the Lookup_file because I would like to load it once and re-use it in other jobs.

Posted: Fri Jul 17, 2015 8:51 am
by ArndW
Do you mean a database connector stage? And a lookup fileset stage for the lookup? If not, could you post what the data source and exact stage type for the reference link to the lookup stage is?

Posted: Fri Jul 17, 2015 7:42 pm
by ray.wurlod
dveltman1 wrote:But I can't perform the range lookup with that combination.
You should be able to; the range lookup is performed against a virtual data set loaded into memory no matter what the supporting stage type is (except, possibly, Lookup File Set stage).