Updating dimension table and new keys

Post questions here related to DataStage Enterprise/PX Edition, for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Updating dimension table and new keys

Post by Munish »

Hi there,
This is in continuation of my previous doubts regarding the TELCO ETL. We are parsing 200 GB of sequential files, and the data load into our biggest table is to the tune of more than 200 million records every day.

Now,
the scenario is that we are reading a sequential file and we have to:
1. First, do a lookup against the dim table, which has approx. 1.5 million records (a number that will keep increasing), to check whether the value is new.
2. If the record is new, generate the key and load it into the dim.
3. Next, carry the key value forward for loading into 14 different fact and summary tables.
4. The lookup itself should be refreshed with this new value.

Which would be the best solution, and why? (A sketch of the basic flow follows the options below.)
1. Use of the Change Capture stage,
or
2. a Hashfile lookup check,
or
3. a sparse Lookup?
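
A minimal sketch of steps 1-4 above (Python; the in-memory map and key counter are hypothetical stand-ins for the real dimension table and surrogate key source, not DataStage stages):

Code:

# Hypothetical sketch of the flow: lookup, key generation for new values,
# and carrying the key forward to the downstream loads.
dim_lookup = {"ACC001": 1, "ACC002": 2}   # natural key -> surrogate key (~1.5M rows in reality)
next_key = 3                              # stand-in for the real surrogate key source
new_dim_rows = []                         # rows destined for the dim table insert

def resolve(row):
    global next_key
    key = dim_lookup.get(row["account_num"])            # 1. look up the dim
    if key is None:                                     # the value is new
        key = next_key                                  # 2. generate the key
        next_key += 1
        dim_lookup[row["account_num"]] = key            # 4. refresh the lookup
        new_dim_rows.append((key, row["account_num"]))  # queue the dim insert
    row["account_key"] = key                            # 3. carry the key to the 14 fact/summary loads
    return row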

Your comments will be highly valuable.

Thanks,
Munish
MK
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

I am using the Change Capture stage for a similar amount of data and a similar requirement.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

If you are using surrogate keys you cannot do it all in a Change Data Capture stage. That stage cannot find a match, compare fields, and retrieve a surrogate key. You need to keep the Lookup stage and follow it with the Change Capture stage for existing rows.
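
A rough illustration of that ordering (a Python sketch; the dictionaries and field names are invented stand-ins for the Lookup and Change Capture stages):

Code:

# Hypothetical sketch: resolve the surrogate key first, then detect changes.
dim_keys = {"ACC001": 1, "ACC002": 2}                    # Lookup reference: natural -> surrogate key
dim_attrs = {1: {"plan": "basic"}, 2: {"plan": "gold"}}  # current dim attributes by surrogate key

def classify(row):
    sk = dim_keys.get(row["account_num"])        # Lookup stage: find a match, fetch the key
    if sk is None:
        return ("insert", None, row)             # new member; nothing to compare against yet
    if dim_attrs[sk] != {"plan": row["plan"]}:   # Change Capture: compare non-key fields
        return ("update", sk, row)               # existing row whose attributes changed
    return ("copy", sk, row)                     # unchanged; the key still flows to the facts

The point is the sequencing: the key retrieval has to happen in the Lookup step, because the change comparison alone cannot produce it.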
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

Thanks for your advice.
However, I am still not clear on the following:

1. If I use Change Capture, will the latest values inserted into the dim table be reflected in the Change Capture reference data?
2. If not, what might be the best option: a Hashfile lookup, a sparse Lookup, etc.?
3. We are writing to the database and reading from it at the same time; will that affect efficiency?

Scenario (a sketch of the key substitution step follows below):

Sequential file read >> check for new dims >>

If yes (the value is new):
>> create the key and load it into the dim table
>> replace values such as Account Num with Account Key and Time with Time Key
>> load the data into 15 fact and summary tables

If no (the value already exists):
>> replace values such as Account Num with Account Key and Time with Time Key
>> load the data into 15 fact and summary tables

The biggest dim table has 1.5 million records.
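
A minimal sketch of that substitution step (Python; the table and column names are invented for illustration):

Code:

# Hypothetical sketch: swap natural values for surrogate keys before the fact loads.
account_keys = {"ACC001": 101}     # resolved upstream by the dim lookup
time_keys = {"2007-01-15": 2387}   # e.g. a date dimension keyed by calendar date

def to_fact_row(row):
    return {
        "account_key": account_keys[row["account_num"]],  # Account Num -> Account Key
        "time_key": time_keys[row["call_date"]],          # Time -> Time Key
        "duration": row["duration"],                      # measures pass through unchanged
    }

fact_row = to_fact_row({"account_num": "ACC001", "call_date": "2007-01-15", "duration": 42})
# The same keyed row then feeds all 15 fact and summary table loads.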

Thanks,
Munish
MK
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Think also about Part B of your scenario: a previously non-existent dimension key value appears more than once in your source data. How will you manage the second and subsequent appearances?
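
One common way to handle this, sketched in Python (the in-memory cache is an assumption, standing in for however the job shares newly generated keys):

Code:

# Hypothetical sketch: cache newly generated keys so a repeated "new" value
# is assigned a surrogate key only once per load.
dim_keys = {"ACC001": 1}   # keys already in the dimension table
next_key = 2

def get_or_create_key(natural_key):
    global next_key
    sk = dim_keys.get(natural_key)
    if sk is None:                    # first appearance of this value in the load
        sk = next_key
        next_key += 1
        dim_keys[natural_key] = sk    # later appearances find it here, not "new" again
        # ...queue exactly one dim insert for this value here...
    return sk

assert get_or_create_key("ACC999") == get_or_create_key("ACC999")  # one key, one insert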
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

You are right.
That is exactly the problem.
I am just wondering how to refresh my lookup.
If I get that particular new value again, I do not want to insert it.

Any thoughts??

regards,
Munish
MK