
Sorting Issues...

Posted: Fri Aug 22, 2003 11:42 pm
by rasi
Hi,

My job creates 12 million records, which need to be aggregated and then inserted into the table. I am pre-sorting the data and setting the sort order for the sorted columns in the Aggregator stage. But when I run it on 12 million records, it aborts with a "row out of sequence" error, whereas with a few hundred thousand records it works fine.

Can anyone help?
Rasi

Posted: Sat Aug 23, 2003 7:58 am
by kcbland
Your sorted data does not match the information you put into the Aggregator. Either you set up the Aggregator stage incorrectly, or your sorting is not correct.

Kenneth Bland

Posted: Sat Aug 23, 2003 5:10 pm
by ray.wurlod
I second Ken's diagnosis. Can you post your sorting criteria and the contents of the Inputs grid of the Aggregator stage?

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

Posted: Sun Aug 24, 2003 11:35 pm
by rasi
Hi Ray,

I have 11 columns in total in the input grid and 11 in the output grid. Of the 11 columns, 10 are pre-sorted before the Aggregator stage. All 10 sorted columns are set with the Group button, and the remaining column is summed for each group.
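To see why the Aggregator cares about the declared sort order, here is a minimal Python sketch of sorted-input aggregation. It is not DataStage's implementation; `sorted_sum` and `RowOutOfSequence` are hypothetical names, and it assumes ascending order on all group columns (in Rasi's job `n_keys` would be 10, with the sum column last). A streaming aggregator keeps only one group open, so it has to stop on any row whose key sorts before the group it is currently summing.

```python
from typing import Iterable, Iterator, Sequence


class RowOutOfSequence(Exception):
    """Raised when a row's group key sorts before the key of the open group."""


def sorted_sum(rows: Iterable[Sequence], n_keys: int) -> Iterator[tuple]:
    """Sum column n_keys per group, assuming rows arrive sorted ascending
    on the first n_keys columns. Only one group is held at a time."""
    current_key = None
    total = 0
    for i, row in enumerate(rows):
        key = tuple(row[:n_keys])
        value = row[n_keys]
        if current_key is None:        # first row opens the first group
            current_key, total = key, value
        elif key == current_key:       # same group: keep summing
            total += value
        elif key > current_key:        # next group: emit the finished one
            yield current_key + (total,)
            current_key, total = key, value
        else:                          # key went backwards: input is not sorted
            raise RowOutOfSequence(f"row {i}: key {key} follows {current_key}")
    if current_key is not None:
        yield current_key + (total,)


rows = [("a", 1, 10), ("a", 1, 5), ("b", 2, 7)]
print(list(sorted_sum(rows, n_keys=2)))  # [('a', 1, 15), ('b', 2, 7)]
```

In this sketch memory use does not grow with the number of groups, which is what makes a sorted-input mode cheap, and it is also why the mode must fail at the first row that breaks the order instead of quietly regrouping.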

I checked the input order and the output order, and both are the same. One more thing: the job runs when I remove the sort order from the Aggregator stage. But it should still work with the sort order enabled.

Cheers
Rasi

Posted: Mon Aug 25, 2003 12:41 am
by degraciavg
Quote, originally posted by rasi:

Of the 11 columns, 10 are pre-sorted before the Aggregator stage. All 10 sorted columns are set with the Group button, and the remaining column is summed for each group.

How was the pre-sorting done? Was the data sorted from source or by Sort stage?

The key is to make sure the column sequence in the ORDER BY clause or Sort stage is the same as in the Aggregator stage. Also make sure that the sort order (ascending or descending) of each field is the same.
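The same point as a small Python sketch. It assumes each row is a tuple and a sort spec is a list of `(column_index, descending)` pairs; that representation and the helper names are hypothetical, not a DataStage structure. Data sorted under one spec can fail a sequence check under a spec that lists the same columns in a different order or with a different direction.

```python
from typing import Sequence, Tuple

# A sort spec: (column_index, descending) pairs, most significant key first.
SortSpec = Sequence[Tuple[int, bool]]


def compare(a: Sequence, b: Sequence, spec: SortSpec) -> int:
    """Return -1, 0 or 1 comparing rows a and b under the given spec."""
    for col, descending in spec:
        if a[col] != b[col]:
            result = -1 if a[col] < b[col] else 1
            return -result if descending else result
    return 0


def first_out_of_sequence(rows: Sequence[Sequence], spec: SortSpec) -> int:
    """Index of the first row that sorts before its predecessor, or -1 if none."""
    for i in range(1, len(rows)):
        if compare(rows[i - 1], rows[i], spec) > 0:
            return i
    return -1


rows = [(1, "a"), (2, "a"), (1, "x")]  # sorted by column 1, then column 0
print(first_out_of_sequence(rows, [(1, False), (0, False)]))  # -1: matches
print(first_out_of_sequence(rows, [(0, False), (1, False)]))  # 2: columns swapped
print(first_out_of_sequence(rows, [(1, False), (0, True)]))   # 1: direction differs
```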

If you have checked the job thoroughly and the problem still persists, you will have to check your resources, especially swap space.

If you have limited resources, it might be wiser to partition your data and do the aggregation for each partition. What is your DataStage version?
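A minimal Python sketch of the partitioning idea, assuming the partition is chosen by hashing the group-by columns (one possible scheme; the thread does not say how to split the data). Because every row of a group lands in the same partition, aggregating each partition separately and combining the results gives the same totals as one big aggregation. The function names are hypothetical, and partitions are in-memory lists only for brevity; in a real job each would be a separate file or run.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def partition_rows(rows: Sequence[Sequence], n_keys: int, n_parts: int) -> List[List[Sequence]]:
    """Split rows into n_parts buckets by hashing the group-by columns."""
    parts: List[List[Sequence]] = [[] for _ in range(n_parts)]
    for row in rows:
        key = tuple(row[:n_keys])
        parts[hash(key) % n_parts].append(row)
    return parts


def aggregate(rows: Sequence[Sequence], n_keys: int) -> Dict[tuple, int]:
    """Sum the column after the keys, per group (all groups kept in memory)."""
    totals: Dict[tuple, int] = defaultdict(int)
    for row in rows:
        totals[tuple(row[:n_keys])] += row[n_keys]
    return dict(totals)


def aggregate_partitioned(rows: Sequence[Sequence], n_keys: int, n_parts: int) -> Dict[tuple, int]:
    """Aggregate each partition on its own; a group never spans two partitions."""
    result: Dict[tuple, int] = {}
    for part in partition_rows(rows, n_keys, n_parts):
        result.update(aggregate(part, n_keys))
    return result


rows = [("a", 1, 10), ("b", 2, 7), ("a", 1, 5), ("c", 3, 1)]
print(aggregate_partitioned(rows, n_keys=2, n_parts=3) == aggregate(rows, n_keys=2))  # True
```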

regards,
vladimir

Posted: Mon Aug 25, 2003 8:39 am
by inter5566
Rasi,

Vladimir pretty well covered the answer. But to put it in shorter terms, the sort column in the Aggregator stage only indicates that the incoming data is already sorted; it does not sort the data itself.

Steve

Posted: Mon Aug 25, 2003 9:30 pm
by rasi
Hi Vladimir,

Pre-sorting was done in Unix. The DataStage version is 6. As I mentioned, the column sequence and sort order are correct in the Aggregator stage's input and output. I also checked the resources, and they are fine. The thing is, if I remove the sort order, the job runs fine and I get the result, even though running without the sort order should take more resources than running with it.
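One possibility the thread does not confirm: the Unix sort and the Aggregator may be comparing a column differently. Without `-n`, Unix `sort` compares keys as text, so "10" sorts before "9"; if the Aggregator treats that column as a number, text-sorted data can look out of sequence, and a smaller data set may simply not contain values that expose the difference. The Python sketch below uses made-up values to show the two orderings disagreeing; `is_sorted` is a hypothetical helper.

```python
from typing import Callable, Sequence


def is_sorted(values: Sequence[str], key: Callable = str) -> bool:
    """True if values are in non-decreasing order when compared via key."""
    return all(key(a) <= key(b) for a, b in zip(values, values[1:]))


# Made-up key values, sorted as plain text (as a text sort without -n would order them).
text_sorted = sorted(["9", "10", "100", "2"])
print(text_sorted)                       # ['10', '100', '2', '9']
print(is_sorted(text_sorted))            # True: in order as text
print(is_sorted(text_sorted, key=int))   # False: out of sequence as numbers
```

If this turns out to be the cause, sorting that field numerically in the Unix step (an `n` modifier on the relevant `-k` key) would make the two comparisons agree.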

Cheers
Rasi

Posted: Mon Aug 25, 2003 9:58 pm
by degraciavg
Hi Rasi,

The Aggregator stage performs better when the input data is sorted. It doesn't consume more resources than when the input data is not sorted. If you don't get the error when you remove the sort order, then you don't have a resource problem. The "out of sequence" error in this case means that your input data is definitely not sorted.
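Before rerunning the full job, it can help to locate the first row that breaks the order. A minimal Python sketch, assuming a pipe-delimited sequential file, ascending order and text comparison on the group columns (all assumptions; adjust them to match how the Aggregator is set up). `first_unsorted_line` is a hypothetical helper; it streams the file, so it does not need to hold 12 million rows in memory.

```python
import csv
import io
from typing import Optional, Sequence, TextIO


def first_unsorted_line(f: TextIO, key_cols: Sequence[int], delimiter: str = "|") -> Optional[int]:
    """Return the line number of the first row whose key is smaller than the
    previous row's key (ascending, text comparison), or None if all are in order."""
    reader = csv.reader(f, delimiter=delimiter)
    prev = None
    for row in reader:
        key = tuple(row[c] for c in key_cols)
        if prev is not None and key < prev:
            return reader.line_num  # physical line just read
        prev = key
    return None


sample = io.StringIO("a|1|5\na|2|3\nb|1|4\na|9|1\n")
print(first_unsorted_line(sample, key_cols=[0, 1]))  # 4
```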

Do you do any lookup before you aggregate? Is your lookup data sorted?

You may try this experiment...
1. Create another job that sorts your input data and stages it into a new sequential file (use a Sort stage).
2. Then use this sequential file as the input to your Aggregator stage (with the same sort order).
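The point of the experiment is that the sort step and the order the Aggregator expects come from one definition. A minimal Python sketch of the two steps on a sample file, assuming a pipe delimiter, hypothetical group columns in `KEY_COLS`, and text comparison. It loads the whole file into memory, so it is only for checking a sample; for the full 12 million rows the Sort stage is the tool.

```python
import csv
from typing import Sequence

KEY_COLS = [0, 1]  # hypothetical group-by column positions, ascending


def sort_key(row: Sequence[str]) -> tuple:
    # One key definition shared by the sort step and the check step.
    return tuple(row[c] for c in KEY_COLS)


def stage_sorted(src_path: str, dst_path: str, delimiter: str = "|") -> None:
    """Step 1: sort the input and write it to a new sequential file."""
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src, delimiter=delimiter))
    rows.sort(key=sort_key)
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst, delimiter=delimiter).writerows(rows)


def is_staged_sorted(path: str, delimiter: str = "|") -> bool:
    """Step 2 precondition: the staged file is in the order the Aggregator expects."""
    with open(path, newline="") as f:
        keys = [sort_key(r) for r in csv.reader(f, delimiter=delimiter)]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```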

Let us know the results...

Regards,
vladimir