Row out of sequence

Billyqing
Participant
Posts: 44
Joined: Thu May 13, 2004 12:00 pm
Location: Canada

Post by Billyqing »

I have hit this error in the Aggregator stage twice, in different cases.

The first time, I set the Sort columns on the input link and the job ran OK. The second time, with another load (a slightly smaller file than the first), the error occurred again. After I removed the Sort setting, it ran OK.


The question is:

Loading the first file (a big file) needs the column to be specified as sorted, but
loading the second file (a small file) does not.

Why does this happen?
How do I handle this problem?

My DataStage version is 5.2

I'd appreciate any suggestions.
Bill
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Unsorted data requires the aggregator to work much harder, and there are limitations as to how much unsorted data it can aggregate. If you have 1 million rows that group to 1 million rows, the aggregator will have performance issues. If you have 1 million rows that group to 10 thousand rows, the aggregator can handle it.

Now, if you sort the data first and then tell the aggregator the sort order, it can rely on the data being pre-grouped and simply output its results as each grouping changes. With unsorted data it has to accumulate ALL rows before output, because it can't know when a group is finished.
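To make the difference concrete, here is a minimal Python sketch of the two approaches. It is only an illustration of the general idea, not how the Aggregator stage is actually implemented; the `(key, value)` row shape and both function names are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (group key, value to sum)


def aggregate_unsorted(rows: Iterable[Row]) -> Dict[str, int]:
    """Keep a running total for every group until the input ends.

    Memory grows with the number of distinct groups, because any
    group could still receive another row later in the input.
    """
    totals: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def aggregate_sorted(rows: Iterable[Row]) -> Iterator[Row]:
    """Emit each group's total as soon as the key changes.

    Only one group is held in memory at a time, but the result is
    correct only if rows for the same key arrive together.
    """
    current_key = None
    total = 0
    seen_any = False
    for key, value in rows:
        if seen_any and key != current_key:
            yield current_key, total  # group finished, release it
            total = 0
        current_key = key
        total += value
        seen_any = True
    if seen_any:
        yield current_key, total  # flush the last group
```

With 1 million rows that group to 1 million keys, the first function ends up holding 1 million totals at once, while the second never holds more than one.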

So, always sort your data if you're going to have volume concerns and give the aggregator that assistance.
Kenneth Bland

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Indicating on the Input link to the Aggregator that the data are sorted does NOT sort the data! This has caught others in the past.

What's happening if you indicate that data are sorted is that you are asserting to the Aggregator stage that the data are indeed already sorted as indicated, allowing it to use a much more efficient algorithm.

You are not allowed to lie; either you were lucky on your first run or its data were sorted as indicated.

The efficient algorithm fails if the data are not sorted as indicated, so the Aggregator stage keeps a check on whether the expected sorted order is being adhered to, and aborts if it is not.
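As a rough illustration of that check, the Python sketch below sums values per key on input that is asserted to be sorted ascending, and raises as soon as a key arrives out of order. This is an assumption-laden sketch of the general technique, not the Aggregator stage's real code; `RowOutOfSequence` and `checked_sorted_sum` are hypothetical names, and keys are assumed to be non-None strings.

```python
from typing import Iterable, Iterator, Tuple


class RowOutOfSequence(Exception):
    """Hypothetical error raised when the asserted sort order is violated."""


def checked_sorted_sum(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Sum values per key, trusting but verifying ascending key order."""
    previous_key = None
    total = 0
    for key, value in rows:
        if previous_key is not None and key < previous_key:
            # The caller asserted sorted input, but this key sorts before
            # the one already seen, so the streaming result would be wrong.
            raise RowOutOfSequence(
                f"key {key!r} arrived after {previous_key!r}"
            )
        if previous_key is not None and key != previous_key:
            yield previous_key, total  # previous group is complete
            total = 0
        previous_key = key
        total += value
    if previous_key is not None:
        yield previous_key, total  # flush the last group
```

Feeding it `[("b", 1), ("a", 2)]` raises `RowOutOfSequence`, whereas the same rows sorted by key first pass through cleanly. That matches the behaviour described above: a file that happens to arrive in order works, and one that does not aborts.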
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.