sort order

tostay2003 · Post by **tostay2003** » Mon Jun 19, 2006 12:32 pm

Hi,

A small question.

If I have got two fields, Field1 takes upto 10000 values and Field2 takes upto 4 values.

which would be the better way (performance) to specify the properties in the sort stage.

a) Field1 ASC, Field2 ASC
b) Field2 ASC, Field1 ASC
c) Doesnt matter

My Hinch is (a)

Thank You

chulett · Post by **chulett** » Mon Jun 19, 2006 12:37 pm

Neither one is 'better', the answer depends on what order you need them sorted in downstream from there.

tostay2003 · Post by **tostay2003** » Mon Jun 19, 2006 12:41 pm

I need to check duplicates based on Field1 and Field2

DSguru2B · Post by **DSguru2B** » Mon Jun 19, 2006 12:58 pm

As Craig said, it really depends how you want them grouped. Does Field2 come under Field1 or viceversa. The sort order will be specific to your requirement.

Ultramundane · Post by **Ultramundane** » Mon Jun 19, 2006 2:46 pm

I bumped up Field2 and ran this test. Field 1 with a cycle of 10000 values for every distinct value in Field2 where Field2 had cycle of 40 values. Both columns defined as int not null.

Field1, Field2 Average 10 Runs: 65 seconds
Field2, Field1 Average 10 Runs: 88 seconds

I saw an improvement in speed by about 26% by sorting by Field1, Field2.

DSguru2B · Post by **DSguru2B** » Mon Jun 19, 2006 2:50 pm

Again. If Field1 needs is the Main group and Field2 is the sub-group then the sorting should be in that order, regardless. Switching the sort order will mess up the data. If performance does become an issue, resort to the unix sort.

Ultramundane · Post by **Ultramundane** » Mon Jun 19, 2006 2:56 pm

I agree that you should sort how you need that data to be sorted.

It sounded like in this case the author just wanted to sort to find duplicate values. He does not care the actual order. Just sorting to find duplicates on some key values. In this case, I would sort by whatever is the most efficient.

He gave us some values and I tested them.

I found that sorting by field1 then field2 was about 26% than sorting by field2 then field1.

kumar_s · Post by **kumar_s** » Tue Jun 20, 2006 1:51 am

How many time you repeated the test to confirm the effeciency?
Is your server consistantly loaded for each test for all the times?
What is the volume of data you took for this test?

ArndW · Post by **ArndW** » Tue Jun 20, 2006 3:19 am

These speeds also depend greatly on what order the incoming data is already in.

Ultramundane · Post by **Ultramundane** » Tue Jun 20, 2006 7:37 am

How many time you repeated the test to confirm the effeciency?
Field1, Field2 Average 10 Runs: 65 seconds
Field2, Field1 Average 10 Runs: 88 seconds

Is your server consistantly loaded for each test for all the times?
I was the only user on this system and this was only the job running.

What is the volume of data you took for this test?
I bumped up Field2 and ran this test. Field 1 with a cycle of 10000 values for every distinct value in Field2 where Field2 had cycle of 40 values. Both columns defined as int not null. Thus, 10000 * 40 = 400000 records.

These speeds also depend greatly on what order the incoming data is already in. I tried randomizing and sorting descending as well. Got the same results.

DSXchange

sort order

sort order

Re: sort order