Hi there,
This is a follow-up to my previous questions regarding the TELCO ETL. We are parsing 200 GB of sequential file data, and the daily load into our biggest table is more than 200 million rows.
Now, the scenario is that we are reading a sequential file and we have to:
1. First, look up the Dim table, which has approximately 1.5 million records (a number that will keep growing), to check whether the value is new.
2. If the record is new, generate the key and load it into the Dim table.
3. Next, store the key value for loading into 14 different fact and summary tables.
4. Update the lookup table with this new value.
Which would be the best solution, and why?
1. Use a Change Capture stage, or
2. Do a hashed-file lookup check, or
3. Use a sparse lookup?
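For reference, whichever stage is chosen, the logic in steps 1-4 amounts to the following. This is a hypothetical Python sketch (an in-memory dict stands in for the dimension lookup, and a simple counter for the key generator), not DataStage code:

```python
def process_rows(rows, dim_lookup, next_key):
    """Apply steps 1-4 to a batch of (natural_key, payload) rows.

    dim_lookup maps natural key -> surrogate key and is mutated in place.
    Returns the new dimension rows, the keyed fact rows, and the next free key.
    """
    new_dim_rows = []   # rows to insert into the Dim table (step 2)
    fact_rows = []      # keyed rows destined for the fact/summary tables
    for natural_key, payload in rows:
        key = dim_lookup.get(natural_key)        # step 1: lookup
        if key is None:
            key = next_key                        # step 2: generate the key
            next_key += 1
            new_dim_rows.append((key, natural_key))
            dim_lookup[natural_key] = key         # step 4: refresh the lookup
        fact_rows.append((key, payload))          # step 3: carry the key forward
    return new_dim_rows, fact_rows, next_key
```

The point of refreshing the lookup inside the loop (step 4) is that a repeated new value later in the same batch reuses the key instead of minting a second one.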
Your comments will be highly valuable.
Thanks,
Munish
Updating dimension table and new keys
If you are using surrogate keys, you cannot do it all in a Change Capture stage. That stage cannot find a match, compare fields, and retrieve a surrogate key. You need to keep the Lookup stage and follow it with the Change Capture stage for existing rows.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
Thanks for your advice.
However, I am still not clear on the following:
1. If I use Change Capture, will the latest values inserted into the Dim table be reflected in the Change Capture stage?
2. If not, what would be the best option: hashed-file lookup, sparse lookup, etc.?
3. We are writing to the database and reading from it at the same time. Will that affect efficiency?
Scenario:
Sequential File Read >> Check for new Dims >>
If Yes >> Create key and load to Dim table
>> Replace the values, e.g. Account Num with Account Key, Time with Time Key
>> Load the data into 15 fact and summary tables
If No (the value already exists)
>> Replace the values, e.g. Account Num with Account Key, Time with Time Key
>> Load the data into 15 fact and summary tables
The biggest Dim table has 1.5 million records.
Thanks,
Munish
MK
Think also about Part B of your scenario: a previously non-existent dimension key value appears more than once in your source data. How will you handle the second and subsequent appearances?
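To illustrate the point above: if the job only consults the lookup as it stood at job start, the second appearance of a new natural key mints a second surrogate key and you load a duplicate dimension row. Caching keys generated during the run avoids this. A hypothetical Python sketch (toy data, not DataStage code):

```python
def keys_without_cache(batch, lookup_at_start, next_key):
    """Consult only the lookup snapshot taken at job start."""
    out = []
    for nk in batch:
        if nk in lookup_at_start:
            out.append(lookup_at_start[nk])
        else:
            out.append(next_key)   # every unseen occurrence mints a new key
            next_key += 1
    return out

def keys_with_cache(batch, lookup_at_start, next_key):
    """Grow a local cache as keys are generated during the run."""
    cache = dict(lookup_at_start)
    out = []
    for nk in batch:
        if nk not in cache:
            cache[nk] = next_key   # remember the key we just generated
            next_key += 1
        out.append(cache[nk])
    return out

# A brand-new account "B" appearing twice in the same batch:
# keys_without_cache(["B", "B"], {"A": 1}, 2) assigns two different keys,
# keys_with_cache(["B", "B"], {"A": 1}, 2) reuses the first key.
```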
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.