big dimension ...if comparing with stream data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ovidiu
Premium Member
Premium Member
Posts: 22
Joined: Tue Dec 20, 2005 7:14 pm

big dimension ...if comparing with stream data

Post by ovidiu »

Hi everybody,

I have a big dimension ( 5.000.000 rows)(2 fields: A_SurrogateKey, A_ID) that I have to lookup to get foreign keys for 3 fields( B_ID, C_ID, D_ID) from the stream data (300.000 rows).
The lookup keys are (A_ID, B_ID), (A_ID, B_ID), (A_ID, C_ID).
I am looking for the best solution.
Sparse Lookup?
Three Merge stages? ( with three reparations of data?)


Thanks for any suggestion!

Ovidiu
ovidiu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

At first glance one Merge stage with three Update links ought to do the job. Sparse lookup might be indicated if you are certain that the number of distinct key values in the stream input will be relatively small.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ovidiu
Premium Member
Premium Member
Posts: 22
Joined: Tue Dec 20, 2005 7:14 pm

Post by ovidiu »

Thanks for your suggestion Ray,

I don't know if I can solve the problem with only one Merge Stage...since the master link should be the stream data ( I need also the rejects). In this case I have three master links (for B, C and D key fields).

Thank you,

Ovidiu
ovidiu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

OK, misunderstood your requirement. The big table is the reference data, is that correct? It contains columns A_ID, B_ID, C_ID. What happened to data set D? Does it have a key? Your requirement really isn't clear. Let me take a guess.
  • Lookup a row from stream B against A.

    Lookup a row from stream C against A.

    Lookup a row from stream D against A. What is the key here?

    Then what happens? Merge stage or Join stage will let you capture the failed lookups (use outer join in Join stage with nullable output columns derived from the right link). So will Lookup stages if A will fit in memory. Five million isn't necessarily a lot of memory, if they're short rows.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ovidiu
Premium Member
Premium Member
Posts: 22
Joined: Tue Dec 20, 2005 7:14 pm

Post by ovidiu »

The big table has two fields: A_SurrogateKey, A_ID .
In the stream data I have 3 fields B_ID, C_ID, D_ID . All of them are looking up A_ID in order to get A_SurrogateKey.

Thanks,
Ovidiu
ovidiu
ovidiu
Premium Member
Premium Member
Posts: 22
Joined: Tue Dec 20, 2005 7:14 pm

Post by ovidiu »

I will try join stage...you are right.

Thanks,

Ovidiu
ovidiu
Post Reply