How Entire Partitioning works
Dear Experts
In parallel jobs, how does the Entire partitioning property work? Suppose the data is divided into 5 partitions: all 5 partitions will have the same data as the original dataset.
Isn't copying the original dataset into 5 identical datasets an overhead in itself? Instead of keeping 5 copies of the same data, wouldn't it be better to keep just the original dataset? How exactly does it work?
I have seen a few posts on this site, but none of them explains it clearly, and I am not convinced.
Thanks
-
- Premium Member
- Posts: 892
- Joined: Thu Oct 16, 2003 5:18 am
I will try to be a bit more verbose and see if that helps.
Entire is typically used for the reference link to a Lookup stage, and typically only for non-sparse lookups with a small to medium-sized number of values in the lookup set (hundreds or thousands).
When you select "Entire", a full copy of all of the data values is made available for each partition in the lookup stage. This means the incoming data does not need to be partitioned in any particular manner, as all partitions have access to all lookup values.
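As a rough illustration of that point, here is a minimal Python sketch. It is not DataStage code; the function names (round_robin, entire, lookup) and the sample data are hypothetical. It simulates three partitions and shows that when the reference data is copied in full to every partition, the input rows can be distributed in any way and still find their matches, whereas splitting the reference data without regard to the input partitioning loses matches.

```python
def round_robin(rows, n):
    """Distribute rows across n partitions in turn, ignoring key values."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def entire(rows, n):
    """Give each of the n partitions its own full copy of the rows."""
    return [list(rows) for _ in range(n)]


def lookup(input_parts, ref_parts):
    """Each partition looks up its input keys only in its own reference partition."""
    out = []
    for inp, ref in zip(input_parts, ref_parts):
        table = dict(ref)  # in-memory lookup table local to this partition
        for key, value in inp:
            out.append((key, value, table.get(key)))  # None means no match
    return sorted(out)


# Hypothetical sample data: (key, value) pairs.
reference = [(1, "AU"), (2, "US"), (3, "IN"), (4, "UK")]
stream = [(3, "c"), (1, "a"), (4, "d"), (2, "b"), (1, "e")]

# Reference copied to every partition: every input row finds its match.
matched = lookup(round_robin(stream, 3), entire(reference, 3))

# Reference split round-robin: input rows land in partitions that
# do not hold their reference key, so the lookups fail.
missed = lookup(round_robin(stream, 3), round_robin(reference, 3))

print(matched)
print(missed)
```

The cost of this convenience is memory: in the sketch, entire() holds three copies of the reference data, which is why it suits small to medium-sized lookup sets.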
The other option is to ensure that the reference link and the input link are partitioned identically, so that each partition has access to the subset of reference values that could possibly match its subset of incoming data values.
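A minimal Python sketch of this alternative follows. Again, this is not DataStage code: the helper names and sample data are hypothetical, and a simple modulus on an integer key stands in for whatever key-based partitioning the job actually uses. Both links are partitioned with the same function on the same key, so each partition holds only the reference rows its input rows can match.

```python
def key_partition(rows, n):
    """Send each (key, value) row to partition key % n (integer keys assumed)."""
    parts = [[] for _ in range(n)]
    for key, value in rows:
        parts[key % n].append((key, value))
    return parts


def lookup(input_parts, ref_parts):
    """Each partition looks up its input keys only in its own reference partition."""
    out = []
    for inp, ref in zip(input_parts, ref_parts):
        table = dict(ref)  # holds only this partition's subset of reference rows
        for key, value in inp:
            out.append((key, value, table.get(key)))  # None means no match
    return sorted(out)


# Hypothetical sample data: (key, value) pairs.
reference = [(1, "AU"), (2, "US"), (3, "IN"), (4, "UK")]
stream = [(3, "c"), (1, "a"), (4, "d"), (2, "b"), (1, "e")]

ref_parts = key_partition(reference, 3)
in_parts = key_partition(stream, 3)

result = lookup(in_parts, ref_parts)
total_ref_rows = sum(len(p) for p in ref_parts)

print(result)
print(total_ref_rows)
```

Here the reference data is held once in total across all partitions rather than once per partition, at the price of having to repartition the input link on the lookup key.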
Side note: At release 8.0, if you are on an SMP (single box) configuration you can reduce memory usage by selecting "Auto". By default it will select "Entire" and use a single copy stored in shared memory that can be accessed by all partitions.
asorrell wrote: Side note: At release 8.0, if you are on an SMP (single box) configuration you can reduce memory usage by selecting "Auto". By default it will select "Entire" and use a single copy stored in shared memory that can be accessed by all partitions.
This happens in 7.x as well, from what I've seen.
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: