surrogate key stage doubt
-
- Participant
- Posts: 60
- Joined: Sun Nov 21, 2004 2:24 am
The documentation says that incoming data should be non-partitioned. So if I have a stage before the Surrogate Key stage (an active stage, which means the data will be partitioned/repartitioned), do I have to run that stage in sequential mode so that the data is non-partitioned before it reaches the Surrogate Key stage? Is my understanding correct?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Whose documentation? Where? That's just crazy! The whole point of using partitioned data is to make use of partition parallelism. I think that you will find that any such assertion is subject to some kind of conditional clause.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
LK, go to the Parallel Job Developers Guide, open the Surrogate Key Stage chapter, and look at the first section, titled "Key Space", in particular the example diagram showing how keys are allocated across four nodes. It's also a good example of what happens when the partitions are not balanced: Node C has two rows and Node B has four, leading to skipped numbers in the key sequence. To guarantee no holes in the sequence, the partitions need to be perfectly balanced, which you can get from round robin partitioning.
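To make the skipped numbers concrete, here is a rough Python sketch. It is not DataStage code. It assumes an interleaved scheme in which each of N nodes hands out every Nth key (node 0 issues 1, 5, 9, ...; node 1 issues 2, 6, 10, ...), which is one plausible reading of the diagram. The function names and the row counts for nodes A and D are made up for illustration; only the counts for B (four) and C (two) come from the example above.

```python
def allocate_keys(rows_per_node, start=1):
    """Assumed interleaved allocation: node i issues start+i, start+i+N, ..."""
    n = len(rows_per_node)
    return [
        [start + node + i * n for i in range(count)]
        for node, count in enumerate(rows_per_node)
    ]


def find_holes(keys_by_node):
    """Return the key values skipped between the lowest and highest key used."""
    used = sorted(k for keys in keys_by_node for k in keys)
    if not used:
        return []
    return sorted(set(range(used[0], used[-1] + 1)) - set(used))


# Nodes A, B, C, D. B has four rows and C has two; A and D are assumed to have three.
unbalanced = allocate_keys([3, 4, 2, 3])
balanced = allocate_keys([3, 3, 3, 3])

print(find_holes(unbalanced))  # [11, 13]
print(find_holes(balanced))    # []
```

Under this assumption the holes come purely from the uneven row counts: Node B keeps issuing keys after the other nodes have run out of rows.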
This section goes on to talk about what happens when you round robin partition already partitioned data. This leads to some crazy repartitioning and key holes. You should only round robin partition sequential data, not partitioned data.
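A small Python sketch of why that matters, again under stated assumptions rather than as a description of the engine's internals: it assumes each upstream partition deals its own rows round robin starting from the first downstream node, independently of the other partitions. The function names and the skewed upstream layout are hypothetical.

```python
def round_robin(rows, nodes):
    """Deal a single sequential stream of rows across the nodes in turn."""
    out = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        out[i % nodes].append(row)
    return out


def round_robin_partitioned(partitions, nodes):
    """Assumption: every upstream partition restarts its dealing at node 0."""
    out = [[] for _ in range(nodes)]
    for part in partitions:
        for i, row in enumerate(part):
            out[i % nodes].append(row)
    return out


rows = list(range(10))

# Sequential input: partition sizes differ by at most one row.
print([len(p) for p in round_robin(rows, 4)])  # [3, 3, 2, 2]

# Already partitioned (and skewed) input: the first node is overloaded.
upstream = [[0, 1, 2, 3, 4], [5, 6], [7, 8], [9]]
print([len(p) for p in round_robin_partitioned(upstream, 4)])  # [5, 3, 1, 1]
```

With sequential input the partitions stay within one row of each other. With input that is already partitioned, the downstream partitions can end up badly unbalanced, which is what produces key holes.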
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
It doesn't appear that you are saying that there is anything wrong with key holes. To me, as long as the surrogate key is unique, I don't care if there are gaps in my surrogate keys. Your surrogate key is just used for relationships; it should not be something that you or your end users rely on. So if you have gaps in your keys, who really cares?
Keith Williams
keith@peacefieldinc.com