I have been working through many issues with my most recent lookup stage, and now that they are resolved, I have hit a new challenge.
Dimension A has two columns I care about (name, ID#) and several attributes (the rest of the columns). For historical reasons, I would like to keep the old attributes (complete with ID#) when those attributes change. So there will be duplicate Names with Different ID's when the other columns have changed. ID# is an Oracle Sequence, ensuring no duplication of the ID#.
I have a Lookup that populates the Dimension ID based on name (and two other Dimensions based on other columns- a total of 3 reference links).
Knowing that Dimension A can have the same name with multiple ID#s(which can be distinguished by either of two fields - ID# or Create Timestamp), I believe I can Sort and Remove Duplicates prior to the reference link (before it hits the lookup stage).
I have been doing some reading the Lookup Stage Constraints -
"Multiple Rows returned from link" with a Condition. Is there a more elegant way using just the lookup, or is the Ora->Sort->Remove Dups the better option? If sizing matters for this, the lookup will be less than 1000 rows for at least the next year and the job is designed to run daily with less than 15000 input rows.
Lookup Stage with Duplicate Dimension data
Moderators: chulett, rschirm, roy
Lookup Stage with Duplicate Dimension data
<<<>>>
My opinions are just that.
My opinions are just that.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Search for "surrogate key" and "type 2 slowly changing dimension". This is the conventionally used technology in this case. Only one of the dimension records is current at any one time for each particular business key value.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks! I worked through the full sql with a couple of DBAs and it works!DSguru2B wrote:Do an order by and max(timestamp) in the Ora and the result of that, will be your lookup. This way you will have only one row per name.
Great information. I read several forum topics on it here and a couple of articles through a Google search.ray.wurlod wrote:Search for "surrogate key" and "type 2 slowly changing dimension".
Thanks!
<<<>>>
My opinions are just that.
My opinions are just that.