Lookup Stage with Duplicate Dimension data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
AaronVG
Participant
Posts: 13
Joined: Thu Feb 15, 2007 4:02 pm

Lookup Stage with Duplicate Dimension data

Post by AaronVG »

I have been working through many issues with my most recent lookup stage, and now that they are resolved, I have hit a new challenge.

Dimension A has two columns I care about (name, ID#) and several attributes (the rest of the columns). For historical reasons, I would like to keep the old attributes (complete with ID#) when those attributes change. So there will be duplicate Names with Different ID's when the other columns have changed. ID# is an Oracle Sequence, ensuring no duplication of the ID#.

I have a Lookup that populates the Dimension ID based on name (and two other Dimensions based on other columns- a total of 3 reference links).

Knowing that Dimension A can have the same name with multiple ID#s(which can be distinguished by either of two fields - ID# or Create Timestamp), I believe I can Sort and Remove Duplicates prior to the reference link (before it hits the lookup stage).

I have been doing some reading the Lookup Stage Constraints -
"Multiple Rows returned from link" with a Condition. Is there a more elegant way using just the lookup, or is the Ora->Sort->Remove Dups the better option? If sizing matters for this, the lookup will be less than 1000 rows for at least the next year and the job is designed to run daily with less than 15000 input rows.
<<<>>>
My opinions are just that.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Do an order by and max(timestamp) in the Ora and the result of that, will be your lookup. This way you will have only one row per name.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Search for "surrogate key" and "type 2 slowly changing dimension". This is the conventionally used technology in this case. Only one of the dimension records is current at any one time for each particular business key value.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
AaronVG
Participant
Posts: 13
Joined: Thu Feb 15, 2007 4:02 pm

Post by AaronVG »

DSguru2B wrote:Do an order by and max(timestamp) in the Ora and the result of that, will be your lookup. This way you will have only one row per name.
Thanks! I worked through the full sql with a couple of DBAs and it works!
ray.wurlod wrote:Search for "surrogate key" and "type 2 slowly changing dimension".
Great information. I read several forum topics on it here and a couple of articles through a Google search.

Thanks!
<<<>>>
My opinions are just that.
Post Reply