Aggregator Stage Warning.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kollurianu
Premium Member
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

Post by kollurianu »

Thanks Pravin for your response.
The group field and sort key are the same and that is only one field.So not sure reason for warning.

Any inputs greatly appreciated.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

The issue causing the warning is not the number of key columns, it is the number of distinct values contained in those key columns. When using the Hash method, each distinct value creates an entry in the internal hash table. 16384 distinct values = 16384 entries in the hash table.

How many distinct values are contained in your key columns??? Remember, distinct values != number of rows.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
kollurianu
Premium Member
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

Post by kollurianu »

Now the Warnings are different than above..Please see the below warning after sorting the input and and the same sort key is used in the group field in the aggregator with Method = Sort.
Any inputs greatly appreciated.

Aggregator_33: When checking operator: User inserted sort "Sort_22" does not fulfill the sort requirements of the downstream operator "APT_SortedGroup2Operator in Aggregator_33"
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Search. I found 53 matches when I did an exact search for "does not fulfill the sort requirements of the downstream operator".
-craig

"You can never have too many knives" -- Logan Nine Fingers
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

jwiles wrote:. For a large number of groups, pre-sorting the data and using the Sort method in Aggregator may likely be more efficient.
What will be the impact if we use Sort method in Aggregator for a small number of groups(where hash method is more preferable)?.

Thanks
pandeeswaran
zulfi123786
Premium Member
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

pandeesh wrote:What will be the impact if we use Sort method in Aggregator for a small number of groups(where hash method is more preferable)?.

Thanks
you will have an additional tsort operator applied over your data which could have been avoided by choosing hash meathod.
- Zulfi
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

finally, i have decided to go with sort method in aggregator stage.
Daily we are getting warnings that "hash table has grown upto 16384 entries". When this warnings cross 50(the max warnigs allowed), the job gets aborted (very rarely).

But as a permanent fix, i would like to go with sort method.
But my question is currently in agregator stage, hash partitioning is used and there are 4 fields as key,

Now i am going to place a sort stage before the aggregator stage and sort in ascending .
After that i ll change the method to sort in aggregator stage.
besides that should i need to change the current partitioning method(hash) or shall i leave as it is?

From the posts, it's mentioned that data should be sorted in all the grouping key fields and partioned at least in one key field.

Wil there be any impact if i leave the current partitioning method(hash) as it is?

thanks
pandeeswaran
zulfi123786
Premium Member
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

pandeesh wrote:Now i am going to place a sort stage before the aggregator stage and sort in ascending .
After that i ll change the method to sort in aggregator stage.
besides that should i need to change the current partitioning method(hash) or shall i leave as it is?
you dont have to put a sort stage instead perform an inline sort and functionally you can go ahead with the partiotion as currently exists
- Zulfi
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

inline sort implies?

The input records to aggregator stage are from join stage?

Where i need to perform inline sort?

Thanks
pandeeswaran
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Inline sort and the sort stage are effectively the same thing and for most uses it makes no difference which you choose. Both add an Orchestrate tsort operator to the job, only one shows it as an additional stage in the Designer GUI (i.e. Sort stage). Using the Sort stage instead of an inline sort allows you some additional flexibility in sorting (such as the "Don't Sort Previously Sorted" option) and also to perform some additional sort tuning should the need arise. (sort tuning has been discussed elsewhere in the forum so search for it if you are interested).

Inline sort is selected on the Input tab of a stage.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Post Reply