Key generator for large volume

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Key generator for large volume

Post by Munish »

Hi There,
We have to create key during ETL.
This can be done by surrogate key generator.

However,
Fact table will be loaded with more than 100 millions of rows everyday.

Please advice (TELCO project) what might be the better solution and why:
1. Will the simple key like incremtal key is fine enough for such a large data load.
or
2. Should I go for some other Key generation mechanism ie Timestamp + (BillID + ItemID)

What is the better solution and why?
(Considering that all the load is in one batch.)

I look forward for your answers.

Thanks,
Munish
MK
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I would stick with a simple meaningless counter where the seed value is the current maximum value in the target table. I wouldn't get fancy concatenating intelligent fields to derive a dumb surrogate, it would just take longer. You can use the surrogate key stage or you can also generate a key in a transformer stage as shown in the FAQ forum.
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

Thanks for the reply.

It definitely makes sense.
However,
What are your thoughts regarding scenario
like such a key might be strange/ very big say after 5-10 years.

The doubt might be silly but I thought of expressing it.
MK
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Munish,

even though the key might be quite large in years to come it will still be more "compact" than using a derived key with some other mechanism.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

By the time you are ready for very large keys to be generated you will have version 8 or later, in which the Surrogate Key Generator stage is equipped to generate uint64 values.

That's 18446744073709551616 distinct key values. Enough for you?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

Thanks for the reply.
Please advice:
1. How to shift this to Parallel Forum.
Cheers,
Munish
MK
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You have to be a Moderator. Roy is the moderator for this forum.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply