Hi everybody,
I have a big dimension ( 5.000.000 rows)(2 fields: A_SurrogateKey, A_ID) that I have to lookup to get foreign keys for 3 fields( B_ID, C_ID, D_ID) from the stream data (300.000 rows).
The lookup keys are (A_ID, B_ID), (A_ID, B_ID), (A_ID, C_ID).
I am looking for the best solution.
Sparse Lookup?
Three Merge stages? ( with three reparations of data?)
Thanks for any suggestion!
Ovidiu
big dimension ...if comparing with stream data
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
At first glance one Merge stage with three Update links ought to do the job. Sparse lookup might be indicated if you are certain that the number of distinct key values in the stream input will be relatively small.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
OK, misunderstood your requirement. The big table is the reference data, is that correct? It contains columns A_ID, B_ID, C_ID. What happened to data set D? Does it have a key? Your requirement really isn't clear. Let me take a guess.
- Lookup a row from stream B against A.
Lookup a row from stream C against A.
Lookup a row from stream D against A. What is the key here?
Then what happens? Merge stage or Join stage will let you capture the failed lookups (use outer join in Join stage with nullable output columns derived from the right link). So will Lookup stages if A will fit in memory. Five million isn't necessarily a lot of memory, if they're short rows.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.