surrogate key stage doubt

Posted: Mon Dec 19, 2005 5:17 am
by legendkiller
The documentation says that incoming data should be non-partitioned. So if I have a stage before the Surrogate Key stage (an active stage, which means the data will be partitioned/repartitioned), then I have to run this stage in sequential mode so that the data is non-partitioned before it reaches the Surrogate Key stage. Is my understanding correct?

Posted: Mon Dec 19, 2005 1:52 pm
by ray.wurlod
Whose documentation? Where? That's just crazy! The whole point of using partitioned data is to make use of partition parallelism. I think that you will find that any such assertion is subject to some kind of conditional clause.

Posted: Mon Dec 19, 2005 4:29 pm
by vmcburney
LK, go to the Parallel Job Developer's Guide, open the Surrogate Key Stage chapter, and look at the first section, titled "Key Space". Its example diagram shows how keys are allocated across four nodes. It's also a good example of what happens when the partitions are not balanced: Node C has two rows and Node B has four, which leads to skipped numbers in the key sequence. To guarantee no holes in the sequence, the partitions need to be perfectly balanced, which you can get from round robin partitioning.

This section goes on to talk about what happens when you round robin partition already partitioned data. This leads to some crazy repartitioning and key holes. You should only round robin partition sequential data, not partitioned data.
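
To make the gap behaviour concrete, here is a rough Python simulation. It is not DataStage code, and it assumes an interleaved allocation scheme in which node p of N hands out start + p, start + p + N, start + p + 2N, and so on; check the "Key Space" diagram for the stage's actual scheme. The function names and the row counts for Nodes A and D are made up for illustration; only "Node C has two rows and Node B has four" comes from the example above.

```python
# Sketch only: simulates an ASSUMED interleaved key allocation,
# not the real Surrogate Key stage implementation.

def allocate_keys(rows_per_node, start=1):
    """Return the sorted keys generated when node p (0-based) of N
    assigns start + p + k*N to its k-th row."""
    n = len(rows_per_node)
    keys = []
    for p, count in enumerate(rows_per_node):
        keys.extend(start + p + k * n for k in range(count))
    return sorted(keys)


def find_gaps(keys):
    """Return the key values skipped between the lowest and highest key."""
    if not keys:
        return []
    present = set(keys)
    return [k for k in range(keys[0], keys[-1] + 1) if k not in present]


def round_robin_counts(total_rows, nodes):
    """Rows per node when sequential input is dealt out round robin."""
    base, extra = divmod(total_rows, nodes)
    return [base + (1 if p < extra else 0) for p in range(nodes)]


# Unbalanced partitions: A=3 (assumed), B=4, C=2, D=3 (assumed).
unbalanced = allocate_keys([3, 4, 2, 3])
print(find_gaps(unbalanced))                  # [11, 13]
print(len(unbalanced) == len(set(unbalanced)))  # True: keys are still unique

# The same 12 rows dealt round robin from a sequential source.
balanced = allocate_keys(round_robin_counts(12, 4))
print(find_gaps(balanced))                    # []
```

Under these assumptions the unbalanced run skips 11 and 13 but never issues a duplicate, while the round robin run produces an unbroken 1 to 12.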

Posted: Mon Dec 19, 2005 10:17 pm
by kwwilliams
It doesn't appear that you are saying that there is anything wrong with key holes. To me, as long as the surrogate key is unique, I don't care if there are gaps in my surrogate keys. Your surrogate key is just used for relationships; it should not be something that you or your end users rely on. So if you have gaps in your keys, who really cares?