Page 2 of 2

Posted: Wed Jun 29, 2011 2:48 pm
by kollurianu
Thanks Pravin for your response.
The group field and sort key are the same and that is only one field.So not sure reason for warning.

Any inputs greatly appreciated.

Posted: Wed Jun 29, 2011 5:51 pm
by jwiles
The issue causing the warning is not the number of key columns, it is the number of distinct values contained in those key columns. When using the Hash method, each distinct value creates an entry in the internal hash table. 16384 distinct values = 16384 entries in the hash table.

How many distinct values are contained in your key columns??? Remember, distinct values != number of rows.

Regards,

Posted: Thu Jun 30, 2011 4:19 pm
by kollurianu
Now the Warnings are different than above..Please see the below warning after sorting the input and and the same sort key is used in the group field in the aggregator with Method = Sort.
Any inputs greatly appreciated.

Aggregator_33: When checking operator: User inserted sort "Sort_22" does not fulfill the sort requirements of the downstream operator "APT_SortedGroup2Operator in Aggregator_33"

Posted: Thu Jun 30, 2011 10:09 pm
by chulett
Search. I found 53 matches when I did an exact search for "does not fulfill the sort requirements of the downstream operator".

Posted: Sun Jul 10, 2011 4:56 am
by pandeesh
jwiles wrote:. For a large number of groups, pre-sorting the data and using the Sort method in Aggregator may likely be more efficient.
What will be the impact if we use Sort method in Aggregator for a small number of groups(where hash method is more preferable)?.

Thanks

Posted: Sun Jul 10, 2011 5:44 am
by zulfi123786
pandeesh wrote:What will be the impact if we use Sort method in Aggregator for a small number of groups(where hash method is more preferable)?.

Thanks
you will have an additional tsort operator applied over your data which could have been avoided by choosing hash meathod.

Posted: Mon Jul 11, 2011 12:27 am
by pandeesh
finally, i have decided to go with sort method in aggregator stage.
Daily we are getting warnings that "hash table has grown upto 16384 entries". When this warnings cross 50(the max warnigs allowed), the job gets aborted (very rarely).

But as a permanent fix, i would like to go with sort method.
But my question is currently in agregator stage, hash partitioning is used and there are 4 fields as key,

Now i am going to place a sort stage before the aggregator stage and sort in ascending .
After that i ll change the method to sort in aggregator stage.
besides that should i need to change the current partitioning method(hash) or shall i leave as it is?

From the posts, it's mentioned that data should be sorted in all the grouping key fields and partioned at least in one key field.

Wil there be any impact if i leave the current partitioning method(hash) as it is?

thanks

Posted: Mon Jul 11, 2011 5:22 am
by zulfi123786
pandeesh wrote:Now i am going to place a sort stage before the aggregator stage and sort in ascending .
After that i ll change the method to sort in aggregator stage.
besides that should i need to change the current partitioning method(hash) or shall i leave as it is?
you dont have to put a sort stage instead perform an inline sort and functionally you can go ahead with the partiotion as currently exists

Posted: Mon Jul 11, 2011 5:42 am
by pandeesh
inline sort implies?

The input records to aggregator stage are from join stage?

Where i need to perform inline sort?

Thanks

Posted: Mon Jul 11, 2011 8:07 am
by jwiles
Inline sort and the sort stage are effectively the same thing and for most uses it makes no difference which you choose. Both add an Orchestrate tsort operator to the job, only one shows it as an additional stage in the Designer GUI (i.e. Sort stage). Using the Sort stage instead of an inline sort allows you some additional flexibility in sorting (such as the "Don't Sort Previously Sorted" option) and also to perform some additional sort tuning should the need arise. (sort tuning has been discussed elsewhere in the forum so search for it if you are interested).

Inline sort is selected on the Input tab of a stage.

Regards,