Write Range Map

just4u_sharath · Post by **just4u_sharath** » Wed Apr 30, 2008 3:41 pm

I have searched for information about this stage, and even after going through the explanations I still feel I don't understand it. What actually is a range map, and where should we use range partitioning? Can't we use range partitioning on data without the Write Range Map stage? Is this stage used in real time?

ray.wurlod · Post by **ray.wurlod** » Wed Apr 30, 2008 4:06 pm

Range partitioning requires that the data be pre-processed so that the ranges can be determined. The algorithm attempts to distribute rows as equally as possible over the number of partitions, and to identify those values in the key column(s) that will yield that best distribution. Those values are written into a range map that can subsequently be used for range partitioning of those data.

You cannot use range partitioning without a range map.

Because of the pre-processing requirement, range partitioning is not appropriate for "real time" processing.
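To make the pre-processing step concrete, here is a minimal Python sketch of the idea. It is not DataStage code and does not reproduce the stage's actual algorithm; the function names `build_range_map` and `range_partition` are hypothetical, and the sketch assumes a single numeric key column whose values (or a sample of them) fit in memory. A first pass picks boundary values that split the sorted keys into roughly equal groups, and a second pass uses those boundaries to route each row.

```python
from bisect import bisect_right


def build_range_map(sample_keys, num_partitions):
    """Pre-processing pass: pick num_partitions - 1 boundary values.

    The boundaries are taken at evenly spaced positions in the sorted
    keys, so each range should receive roughly the same number of rows.
    (Illustrative only; hypothetical helper, not a DataStage API.)
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    ordered = sorted(sample_keys)
    if not ordered:
        raise ValueError("need at least one key to build a range map")
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def range_partition(key, boundaries):
    """Partitioning pass: return the partition number for one key.

    Keys below the first boundary go to partition 0, keys from the first
    boundary up to (but not including) the second go to partition 1, etc.
    """
    return bisect_right(boundaries, key)


keys = list(range(1, 101))              # stand-in for the key column
range_map = build_range_map(keys, 4)    # first pass over the data
counts = [0, 0, 0, 0]
for k in keys:                          # second pass: actual partitioning
    counts[range_partition(k, range_map)] += 1
```

The two passes are the reason the map has to exist before partitioning can start: the boundaries cannot be chosen until the distribution of key values is known.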

just4u_sharath · Post by **just4u_sharath** » Wed Apr 30, 2008 5:01 pm

Why do we need to use this stage (range partitioning)? We can get the same functionality using hash partitioning.
Also, can't we force the job to put specified key values in one partition and the rest in another? With hash partitioning, related data stays on the same partition, but what I want is for specified key values to stay on one partition and the other key values on another partition. Can we get this functionality?

ray.wurlod · Post by **ray.wurlod** » Wed Apr 30, 2008 5:10 pm

No.

just4u_sharath · Post by **just4u_sharath** » Wed Apr 30, 2008 5:35 pm

ray.wurlod wrote: No.

What is the distinguishing feature of range partitioning compared to hash partitioning? Both offer the same functionality, and moreover range partitioning needs pre-processed data. What would be the best scenario for using range partitioning over hash?

ray.wurlod · Post by **ray.wurlod** » Wed Apr 30, 2008 5:41 pm

I don't know that there's an advantage one over the other. It's really driven by the business requirement - do you need to keep ranges of keys contiguous? If not, prefer hash or modulus for a key-based partitioning algorithm; there's no pre-processing required for either of these.
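The "contiguous ranges" point can be illustrated with a small Python sketch. This is only an illustration under assumptions, not DataStage behaviour: the keys are small integers, the boundaries `[5, 9]` are a made-up range map for three partitions, and `modulus_partition` and `range_partition` are hypothetical helper names.

```python
from bisect import bisect_right


def modulus_partition(key, num_partitions):
    """Key-based partitioning with no pre-processing: key mod partitions."""
    return key % num_partitions


def range_partition(key, boundaries):
    """Range partitioning against a previously built list of boundaries."""
    return bisect_right(boundaries, key)


keys = list(range(1, 13))
boundaries = [5, 9]  # hypothetical range map for 3 partitions

by_modulus = {p: [k for k in keys if modulus_partition(k, 3) == p] for p in range(3)}
by_range = {p: [k for k in keys if range_partition(k, boundaries) == p] for p in range(3)}

print(by_modulus)  # neighbouring keys are scattered across partitions
print(by_range)    # each partition holds one contiguous run of keys
```

Both schemes keep equal key values together. Only the range scheme also keeps neighbouring key values together, and that is what the range map and its extra pass over the data buy you.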

just4u_sharath · Post by **just4u_sharath** » Wed Apr 30, 2008 5:56 pm

ray.wurlod wrote: I don't know that there's an advantage one over the other. It's really driven by the business requirement - do you need to keep ranges of keys contiguous? If not, prefer hash or modulus for a key-based partitioning algorithm; there's no pre-processing required for either of these.

Thanks a lot.
I got that.

ray.wurlod · Post by **ray.wurlod** » Wed Apr 30, 2008 6:25 pm

Please mark the thread as Resolved.