big dimension ...if comparing with stream data

ovidiu · Post by **ovidiu** » Thu Oct 26, 2006 4:30 pm

Hi everybody,

I have a big dimension ( 5.000.000 rows)(2 fields: A_SurrogateKey, A_ID) that I have to lookup to get foreign keys for 3 fields( B_ID, C_ID, D_ID) from the stream data (300.000 rows).
The lookup keys are (A_ID, B_ID), (A_ID, B_ID), (A_ID, C_ID).
I am looking for the best solution.
Sparse Lookup?
Three Merge stages? ( with three reparations of data?)

Thanks for any suggestion!

Ovidiu

ray.wurlod · Post by **ray.wurlod** » Thu Oct 26, 2006 10:39 pm

At first glance one Merge stage with three Update links ought to do the job. Sparse lookup might be indicated if you are certain that the number of distinct key values in the stream input will be relatively small.

ovidiu · Post by **ovidiu** » Fri Oct 27, 2006 11:03 am

Thanks for your suggestion Ray,

I don't know if I can solve the problem with only one Merge Stage...since the master link should be the stream data ( I need also the rejects). In this case I have three master links (for B, C and D key fields).

Thank you,

Ovidiu

ray.wurlod · Post by **ray.wurlod** » Fri Oct 27, 2006 11:12 am

OK, misunderstood your requirement. The big table is the reference data, is that correct? It contains columns A_ID, B_ID, C_ID. What happened to data set D? Does it have a key? Your requirement really isn't clear. Let me take a guess.

Lookup a row from stream B against A.

Lookup a row from stream C against A.

Lookup a row from stream D against A. What is the key here?

Then what happens? Merge stage or Join stage will let you capture the failed lookups (use outer join in Join stage with nullable output columns derived from the right link). So will Lookup stages if A will fit in memory. Five million isn't necessarily a lot of memory, if they're short rows.

ovidiu · Post by **ovidiu** » Fri Oct 27, 2006 11:23 am

The big table has two fields: A_SurrogateKey, A_ID .
In the stream data I have 3 fields B_ID, C_ID, D_ID . All of them are looking up A_ID in order to get A_SurrogateKey.

Thanks,
Ovidiu

ovidiu · Post by **ovidiu** » Fri Oct 27, 2006 11:27 am

I will try join stage...you are right.

Thanks,

Ovidiu