Hi there,
This is a follow-up to my previous questions regarding the TELCO ETL. We are parsing 200 GB of sequential file data, and the daily load into our biggest table is more than 200 million rows.
Now, the scenario is that we are reading a sequential file and we have to:
1. First, look up the Dim table, which has approximately 1.5 million records (a number that will keep growing), to check whether the value is new.
2. If the record is new, generate the key and load it into the Dim table.
3. Next, store the key value for loading into 14 different fact and summary tables.
4. Update the lookup table with this new value.
Which would be the best solution, and why?
1. Use a Change Capture stage, or
2. Do a hashed-file lookup check, or
3. Use a sparse lookup?
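For reference, whichever stage is chosen, the logic in steps 1-4 amounts to the following. This is a hypothetical Python sketch (an in-memory dict stands in for the dimension lookup, and a simple counter for the key generator), not DataStage code:

```python
def process_rows(rows, dim_lookup, next_key):
    """Apply steps 1-4 to a batch of (natural_key, payload) rows.

    dim_lookup maps natural key -> surrogate key and is mutated in place.
    Returns the new dimension rows, the keyed fact rows, and the next free key.
    """
    new_dim_rows = []   # rows to insert into the Dim table (step 2)
    fact_rows = []      # keyed rows destined for the fact/summary tables
    for natural_key, payload in rows:
        key = dim_lookup.get(natural_key)        # step 1: lookup
        if key is None:
            key = next_key                        # step 2: generate the key
            next_key += 1
            new_dim_rows.append((key, natural_key))
            dim_lookup[natural_key] = key         # step 4: refresh the lookup
        fact_rows.append((key, payload))          # step 3: carry the key forward
    return new_dim_rows, fact_rows, next_key
```

The point of refreshing the lookup inside the loop (step 4) is that a repeated new value later in the same batch reuses the key instead of minting a second one.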
Your comments will be highly valuable.
Thanks,
Munish
Updating dimension table and new keys
If you are using surrogate keys, you cannot do it all in a Change Capture stage. That stage cannot find a match, compare fields, and retrieve a surrogate key. You need to keep the Lookup stage and follow it with the Change Capture stage for existing rows.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
Thanks for your advice.
However, I am still not clear on the following:
1. If I use Change Capture, will the latest values inserted into the Dim table be reflected in the Change Capture stage?
2. If not, what would be the best option: hashed-file lookup, sparse lookup, etc.?
3. We are writing to the database and reading from it at the same time. Will that affect efficiency?
Scenario:
Sequential File Read >> Check for new Dims >>
If Yes >> Create key and load to Dim table
>> Replace the values, e.g. Account Num with Account Key, Time with Time Key
>> Load the data into 15 fact and summary tables
If No (the value already exists)
>> Replace the values, e.g. Account Num with Account Key, Time with Time Key
>> Load the data into 15 fact and summary tables
The biggest Dim table has 1.5 million records.
Thanks,
Munish
MK
Think also about Part B of your scenario: a previously non-existent dimension key value appears more than once in your source data. How will you handle the second and subsequent appearances?
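To illustrate the point above: if the job only consults the lookup as it stood at job start, the second appearance of a new natural key mints a second surrogate key and you load a duplicate dimension row. Caching keys generated during the run avoids this. A hypothetical Python sketch (toy data, not DataStage code):

```python
def keys_without_cache(batch, lookup_at_start, next_key):
    """Consult only the lookup snapshot taken at job start."""
    out = []
    for nk in batch:
        if nk in lookup_at_start:
            out.append(lookup_at_start[nk])
        else:
            out.append(next_key)   # every unseen occurrence mints a new key
            next_key += 1
    return out

def keys_with_cache(batch, lookup_at_start, next_key):
    """Grow a local cache as keys are generated during the run."""
    cache = dict(lookup_at_start)
    out = []
    for nk in batch:
        if nk not in cache:
            cache[nk] = next_key   # remember the key we just generated
            next_key += 1
        out.append(cache[nk])
    return out

# A brand-new account "B" appearing twice in the same batch:
# keys_without_cache(["B", "B"], {"A": 1}, 2) assigns two different keys,
# keys_with_cache(["B", "B"], {"A": 1}, 2) reuses the first key.
```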
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.