
How can I distribute data equally using a partitioning technique?

Posted: Wed Nov 02, 2011 11:52 pm
by Arun Reddy
Hi

Can anyone help me out?

I have data like this: deptno = 10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60.
I want to use a partitioning technique to distribute the data equally. Suppose I am using a 4-node configuration and my stage is an Aggregator.

Hash Partition

Posted: Thu Nov 03, 2011 12:43 am
by jpraveen
Arun,

You can use a key-based partitioning method such as Hash and specify the key. You can also add a Sort stage before the Aggregator stage.
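Why the Sort stage matters can be shown with a small Python analogy (not DataStage code): `itertools.groupby` only merges *adjacent* rows with equal keys, much like a sort-based aggregation expects sorted input. The rows below are hypothetical (deptno, salary) pairs that one hash partition might receive; they contain every row for their keys, but not in key order.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (deptno, salary) rows received by ONE partition, in arrival order.
# All rows for each deptno are present, but they are not adjacent.
partition_rows = [(10, 100), (30, 300), (10, 150), (50, 500), (30, 350), (50, 550)]


def group_sum(rows):
    """Sum salary per deptno; groupby only merges ADJACENT rows with equal keys."""
    return [
        (dept, sum(sal for _, sal in grp))
        for dept, grp in groupby(rows, key=itemgetter(0))
    ]


unsorted_result = group_sum(partition_rows)  # every row becomes its own "group"
sorted_result = group_sum(sorted(partition_rows, key=itemgetter(0)))

print(unsorted_result)
print(sorted_result)  # [(10, 250), (30, 650), (50, 1050)]
```

Key-based partitioning guarantees that all rows of a group are in the same partition; sorting within the partition then makes them adjacent so a sort-based grouping sees each group exactly once.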

Posted: Thu Nov 03, 2011 2:13 am
by BI-RMA
Your request is ambiguous:

From the subject one might guess that your main aim is to distribute the rows in equal numbers to all nodes. With a non-unique key like your deptno, this would be best achieved by a non-key-based partitioning method like round-robin, because it guarantees that all partitions get equal shares of the input data.
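A minimal Python sketch of round-robin partitioning, assuming rows are dealt out in arrival order (row `i` goes to partition `i % num_partitions`). The function name is hypothetical; it only illustrates the method, not DataStage's implementation.

```python
def round_robin_partition(rows, num_partitions):
    """Deal rows to partitions in turn: row i goes to partition i % num_partitions."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


deptno = [10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60]
parts = round_robin_partition(deptno, 4)

print([len(p) for p in parts])  # [3, 3, 3, 3]
print(parts)  # [[10, 30, 50], [10, 30, 50], [20, 40, 60], [20, 40, 60]]
```

The row counts are perfectly even, but every deptno ends up split across two partitions, which is exactly what a grouped aggregation cannot tolerate.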

If you want to perform aggregations using deptno as the group, however, you need to have all values of the same group (deptno) in the same partition. So you absolutely need a key-based partitioning method to get a correct result. This may lead to an unequal distribution of your input data (skewing). In your example (four nodes, six values appearing twice each) you will get, at best, two nodes with 2 rows each and two nodes with 4 rows each. Note that your example has equal numbers of members for each deptno. With unequal numbers, say one very large group and some smaller ones, skewing may become really significant.
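The "at best" figure can be checked with a small Python sketch. It is an idealized placement, not what hash partitioning does: whole groups are kept together and assigned largest-first to the least-loaded partition (a greedy heuristic that happens to be optimal for these inputs, but is not guaranteed optimal in general). The function name and the skewed sample data are hypothetical.

```python
from collections import Counter


def best_case_key_partition(keys, num_partitions):
    """Idealized key-based placement: all rows of a key stay together, and
    groups are assigned largest-first to the least-loaded partition."""
    group_sizes = Counter(keys)
    loads = [0] * num_partitions
    placement = {}
    # Largest groups first; ties keep their original order (sorted is stable).
    for key, size in sorted(group_sizes.items(), key=lambda kv: -kv[1]):
        target = loads.index(min(loads))  # first least-loaded partition
        placement[key] = target
        loads[target] += size
    return placement, loads


deptno = [10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60]
placement, loads = best_case_key_partition(deptno, 4)
print(loads)  # [4, 4, 2, 2]

# Hypothetical skewed input: one large department and four small ones.
skewed = [10] * 8 + [20, 30, 40, 50]
_, skewed_loads = best_case_key_partition(skewed, 4)
print(skewed_loads)  # [8, 2, 1, 1]
```

Even with perfect knowledge of the group sizes, one large group caps how even the partitions can be, because that group cannot be split.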

In this case you might consider using combined keys for aggregation, if applicable.

As for partitioning: Auto will recognize the partitioning requirements of the Aggregator, so you should be all right with that. Switching manually to round-robin will leave you with wrong aggregation results and duplicate keys in your output. So choosing a partitioning method manually carries a certain risk.
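The effect of manual round-robin on an aggregation can be reproduced in a few lines of Python. This is an analogy, assuming each partition aggregates independently and the outputs are simply collected; it is not DataStage code.

```python
from collections import Counter

deptno = [10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60]
num_partitions = 4

# Round-robin: partition i receives rows i, i + 4, i + 8, ...
parts = [deptno[i::num_partitions] for i in range(num_partitions)]

# Each partition counts rows per deptno without seeing the other partitions.
result = []
for part in parts:
    result.extend(Counter(part).items())

print(len(result))  # 12 output rows instead of 6
print(Counter(dept for dept, _ in result))  # every deptno appears twice
```

Each deptno is reported twice with a count of 1 instead of once with a count of 2: the "wrong aggregation results and duplicate keys" described above.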

To check for skewing, set $APT_RECORD_COUNTS to true. You can then see the distribution of your records across partitions in the job log in Director.

DataStage 8.7 will have the ability to override incorrect partitioning methods that are unsuitable for a defined task (producing warnings in the job logs when this happens). I am quite curious what results this will produce once 8.7 is rolled out to more sites.

Re: Hash Partition

Posted: Fri Nov 04, 2011 9:32 pm
by Arun Reddy
jpraveen wrote:you can go for Key based partition like HASH, and specify the Key and also you can use sort stage before Aggregation stage.
Thanks for your reply, Praveen.

But all rows with the same key will go to one partition. If 10, 20, 30 and 40 each take one of the four partitions, where will the remaining deptno 50 and 60 data go?

Re: Hash Partition

Posted: Sat Nov 05, 2011 2:59 am
by BI-RMA
As I already stated: each key value results in a hash value that determines on which node the data is processed. If you consider a distribution by modulo, your values would all be processed by nodes 0 and 2 in a four-node configuration (10, 30 and 50 leave remainder 2 when divided by 4; 20, 40 and 60 leave remainder 0). With hash it depends on the hashing algorithm used, which probably behaves much the same with a single integer key column.
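The modulo case can be checked directly with a short Python sketch, assuming partition number = deptno % number of partitions (an illustration of modulus-style placement, not DataStage's code):

```python
num_partitions = 4
deptnos = [10, 20, 30, 40, 50, 60]

# Start with every partition empty so that unused partitions show up too.
placement = {p: [] for p in range(num_partitions)}
for d in deptnos:
    placement[d % num_partitions].append(d)

print(placement)  # {0: [20, 40, 60], 1: [], 2: [10, 30, 50], 3: []}
```

Because every deptno is a multiple of 10, the remainder mod 4 can only be 0 or 2, so partitions 1 and 3 receive no rows at all.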

Re: Hash Partition

Posted: Sat Nov 05, 2011 8:58 am
by Arun Reddy
Thank you, Ronald, for answering the question.

So in this case the data is not distributed equally with the hash technique.

Posted: Sat Nov 05, 2011 9:44 am
by jwiles
Arun,

The goal of partitioning is this: Distribute the data as evenly as possible while at the same time meeting the requirements of the processing logic.

Because you must keep like-keyed records together in the same partition, you have no guarantee that you can evenly distribute your records: it is entirely dependent upon the distribution of key values and the number of partitions your job is using. The only exception is running your job with only one partition.

Hash partitioning uses an algorithm that determines which partition a record goes to, based upon the number of partitions and the physical value of the data in your partition columns. The guarantee is that every record will go to some partition, and all records with identical key values will go to the same partition, NOT that they will be evenly distributed.
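A minimal Python sketch of that guarantee. It uses `zlib.crc32` as a stand-in hash function; this is an assumption, since the thread does not say which hash DataStage uses. The rows are hypothetical (deptno, emp_id) pairs built from the example data.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, num_partitions):
    """Send each (deptno, emp_id) row to crc32(deptno) % num_partitions.
    crc32 is only a stand-in for DataStage's (unspecified) hash function."""
    parts = defaultdict(list)
    for deptno, emp_id in rows:
        key_bytes = str(deptno).encode("utf-8")
        parts[zlib.crc32(key_bytes) % num_partitions].append((deptno, emp_id))
    return dict(parts)


# Hypothetical rows: twelve employees, two per deptno.
deptno_col = [10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60]
rows = [(d, emp_id) for emp_id, d in enumerate(deptno_col)]
parts = hash_partition(rows, 4)

# Guaranteed: each deptno lands in exactly one partition.
partitions_per_key = defaultdict(set)
for p, part in parts.items():
    for deptno, _ in part:
        partitions_per_key[deptno].add(p)
print(all(len(ps) == 1 for ps in partitions_per_key.values()))  # True

# Not guaranteed: the row counts depend on where the six keys happen to hash.
print({p: len(part) for p, part in sorted(parts.items())})
```

With `num_partitions=1` every row lands in partition 0, which is the single-partition exception mentioned above.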

Regards,