surrogate key stage doubt

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

legendkiller
Participant
Posts: 60
Joined: Sun Nov 21, 2004 2:24 am

Post by legendkiller »

The documentation says that incoming data should be non-partitioned. So if I have a stage before the Surrogate Key stage (an active stage, which means the data will be partitioned or repartitioned), then I have to run this stage in sequential mode so that the data is non-partitioned before it reaches the Surrogate Key stage. Is my understanding correct?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Whose documentation? Where? That's just crazy! The whole point of using partitioned data is to make use of partition parallelism. I think that you will find that any such assertion is subject to some kind of conditional clause.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

LK, go to the Parallel Job Developers Guide, find the Surrogate Key Stage chapter, and look at the first section, titled "Key Space". It has an example diagram showing how keys are allocated across four nodes. It's also a good example of what happens when the partitions are not balanced: Node C has two rows and Node B has four, leading to skipped numbers in the key sequence. To guarantee no holes in the sequence, the partitions need to be perfectly balanced, which you can get from round robin partitioning.

This section goes on to talk about what happens when you round robin partition already partitioned data. This leads to some crazy repartitioning and key holes. You should only round robin partition sequential data, not partitioned data.
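
To make the skipped numbers concrete, here is a rough Python sketch of the idea. It is not DataStage code, and it assumes an interleaved allocation scheme (partition p of N gets start + p, then steps by N), which may not match the exact layout in the guide's diagram. The function names and row counts are made up for illustration.

```python
def allocate_keys(rows_per_partition, start=1):
    """Simulate per-partition key allocation (assumed scheme, not DataStage's API).

    Partition p of n takes keys start + p, start + p + n, start + p + 2n, ...
    so no two partitions can ever produce the same key.
    """
    n = len(rows_per_partition)
    keys = []
    for p, count in enumerate(rows_per_partition):
        keys.extend(start + p + i * n for i in range(count))
    return sorted(keys)


def find_gaps(keys):
    """Return the key values skipped between the smallest and largest key."""
    if not keys:
        return []
    present = set(keys)
    return [k for k in range(min(keys), max(keys) + 1) if k not in present]


# Four nodes, 12 rows in both cases.
balanced = allocate_keys([3, 3, 3, 3])    # e.g. after round robin on sequential data
unbalanced = allocate_keys([3, 4, 2, 3])  # Node B has four rows, Node C has two

print(find_gaps(balanced))    # []
print(find_gaps(unbalanced))  # [11, 13]
```

In both runs every key is unique; only the unbalanced run leaves holes in the sequence.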
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

It doesn't appear that you are saying that there is anything wrong with key holes. To me, as long as the surrogate key is unique, I don't care if there are gaps in my surrogate keys. Your surrogate key is just used for relationships; it should not be something that you or your end users rely on. So if you have gaps in your keys, who really cares?