Aggregator - Comparison with & without asserting Sort or

vnspn
Participant
Posts: 165
Joined: Mon Feb 12, 2007 11:42 am

Post by vnspn »

Hi,

I ran a sample test and got a somewhat strange result.

I compared runs of a job using the Aggregator stage with and without asserting that the incoming rows are sorted. I tested with about 1 million incoming rows, with the data in sorted order.

I expected the job to perform better when we assert to the Aggregator that the data is sorted. Contrary to my expectation, the run that asserted the sort order took a little more time than the run that did not.

Is there a reason why asserting the sort order would take more time?

Thanks.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, it should take significantly less time - if done properly. You need to sort in a manner that supports the aggregation being done. Otherwise, your 'sort' time is wasted as the stage cannot utilize the data in the order you've sent it.

You can tell you've got an issue if you sort the data, assert that sort order, and the stage is still a 'bottleneck' - all rows flow in before any flow out. In that case you haven't sorted the incoming data properly. Proper sorting will allow output rows to flow from the stage at every 'sort break'.
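To illustrate the 'sort break' behaviour described above, here is a minimal Python sketch. It is not DataStage code and not how the Aggregator stage is actually implemented; the function names and the sample rows are made up for illustration. It only contrasts an aggregation that must read every row before emitting anything with one that can emit a finished group as soon as the grouping key changes.

```python
def aggregate_unsorted(rows):
    """Hash-style aggregation: every input row is read before any output."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    # Only now, with the input exhausted, can rows flow out.
    yield from totals.items()


def aggregate_sorted(rows):
    """Streaming aggregation: emit a group as soon as the key changes."""
    current_key, total, have_group = None, 0, False
    for key, value in rows:
        if have_group and key != current_key:
            yield current_key, total  # sort break: flush the finished group
            total = 0
        current_key, have_group = key, True
        total += value
    if have_group:
        yield current_key, total  # flush the last group


def tracked(rows, log):
    """Record each row as it is pulled, so we can see how much input was read."""
    for row in rows:
        log.append(row)
        yield row


# Hypothetical sample input, already grouped on the key.
rows = [("A", 1), ("A", 2), ("B", 5), ("C", 7), ("C", 1)]

sorted_log, unsorted_log = [], []
first_sorted = next(aggregate_sorted(tracked(rows, sorted_log)))
first_unsorted = next(aggregate_unsorted(tracked(rows, unsorted_log)))

print(first_sorted, len(sorted_log))      # first group out after 3 rows read
print(first_unsorted, len(unsorted_log))  # first group out after all 5 rows read
```

In the sorted version the first output row is available after three input rows; in the unsorted version nothing comes out until all five have been read, which is the 'bottleneck' symptom.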
-craig

"You can never have too many knives" -- Logan Nine Fingers
vnspn

Post by vnspn »

The incoming data is already sorted on the column that I want it to be aggregated. The source is from a flat file.

The column on which it is sorted contains alphanumeric values. So it is not in the ascending or descending order that DataStage expects, but it is grouped in a particular order.

So, I specified the option 'Ignore' as the Sort Order in the inputs tab of the Aggregator. It works as expected; output rows flow from the stage whenever there is a 'sort break'.

But still it takes more time than running without asserting that it is sorted.
chulett

Post by chulett »

vnspn wrote: So, I specified the option 'Ignore' as the Sort Order in the inputs tab of the Aggregator.
Ignore is a dangerous option. Are you sure you truly have something more complex than can be declared as ascending or descending? Could it not be properly sorted so that it can be? Sure seems like it could be with a single column but you'd have to post examples of your data to confirm.

And are you sure you get identical output in both cases?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

vnspn wrote: The incoming data is already sorted on the column that I want it to be aggregated. The source is from a flat file.
Plus, the sorting must be done, and specified, on the grouping key and not on the columns being aggregated.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
vnspn

Post by vnspn »

Yes, I do get identical output in both cases.

OK, here is a sample of the column values, in their particular sorted order:

001ZQ
002ZQ
001ZR
001ZT
E4400
E4400

If I specify the sort order as 'Ascending', I get an error because DataStage expects the 3rd row to be in the 2nd position. I hope that is clear. The rows are in a particular grouped order.
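The sample values above can be checked with a short Python sketch. It assumes plain character-by-character string comparison, which may not match the collation DataStage actually uses, and the helper names are hypothetical. It shows that the values are grouped (each key forms one contiguous run) without being in ascending order.

```python
keys = ["001ZQ", "002ZQ", "001ZR", "001ZT", "E4400", "E4400"]


def first_out_of_order(keys):
    """Return the 1-based position of the first row that breaks ascending
    order, or None if the whole list is ascending."""
    for i in range(1, len(keys)):
        if keys[i] < keys[i - 1]:
            return i + 1
    return None


def is_grouped(keys):
    """True if every distinct key appears in exactly one contiguous run."""
    seen = set()
    previous = object()  # sentinel that equals no real key
    for key in keys:
        if key != previous:
            if key in seen:
                return False  # key came back after a different key
            seen.add(key)
            previous = key
    return True


print(first_out_of_order(keys))  # 3: "001ZR" sorts before "002ZQ"
print(is_grouped(keys))          # True: no key reappears after a break
```

Under that assumption the 3rd row is the one that fails an ascending check, while the grouping needed to emit one output row per key is still intact.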

DSguru2B, sorry for making a mistake while typing. It is sorted on the grouping key only.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

When you assert that the data are sorted, the Aggregator stage must include a check for "row out of order" - and abort if one is found.

It is probably this check that accounts for the difference.
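As a rough picture of what such a check involves, here is a small Python sketch. It is purely illustrative, with a hypothetical function name, and says nothing about how the stage is really coded or how much time the check costs. It performs one extra comparison per row against the previous key and raises an error when a row is out of order.

```python
def checked_ascending(rows):
    """Pass rows through unchanged, raising if a key is lower than the
    key of the row before it (an assumed ascending sort order)."""
    previous = None
    for position, (key, value) in enumerate(rows, start=1):
        if previous is not None and key < previous:
            raise ValueError(
                f"row out of order at position {position}: "
                f"{key!r} after {previous!r}"
            )
        previous = key
        yield key, value


good = [("A", 1), ("A", 2), ("B", 5)]
bad = [("A", 1), ("C", 2), ("B", 5)]

print(list(checked_ascending(good)))

try:
    list(checked_ascending(bad))
except ValueError as err:
    print("aborted:", err)
```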
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.