Page 1 of 1

sort order

Posted: Mon Jun 19, 2006 12:32 pm
by tostay2003
Hi,

A small question.

If I have got two fields, Field1 takes upto 10000 values and Field2 takes upto 4 values.

which would be the better way (performance) to specify the properties in the sort stage.

a) Field1 ASC, Field2 ASC
b) Field2 ASC, Field1 ASC
c) Doesnt matter

My Hinch is (a)

Thank You

Posted: Mon Jun 19, 2006 12:37 pm
by chulett
Neither one is 'better', the answer depends on what order you need them sorted in downstream from there.

Posted: Mon Jun 19, 2006 12:41 pm
by tostay2003
I need to check duplicates based on Field1 and Field2

Posted: Mon Jun 19, 2006 12:58 pm
by DSguru2B
As Craig said, it really depends how you want them grouped. Does Field2 come under Field1 or viceversa. The sort order will be specific to your requirement.

Re: sort order

Posted: Mon Jun 19, 2006 2:46 pm
by Ultramundane
I bumped up Field2 and ran this test. Field 1 with a cycle of 10000 values for every distinct value in Field2 where Field2 had cycle of 40 values. Both columns defined as int not null.

Field1, Field2 Average 10 Runs: 65 seconds
Field2, Field1 Average 10 Runs: 88 seconds

I saw an improvement in speed by about 26% by sorting by Field1, Field2.

Posted: Mon Jun 19, 2006 2:50 pm
by DSguru2B
Again. If Field1 needs is the Main group and Field2 is the sub-group then the sorting should be in that order, regardless. Switching the sort order will mess up the data. If performance does become an issue, resort to the unix sort.

Posted: Mon Jun 19, 2006 2:56 pm
by Ultramundane
I agree that you should sort how you need that data to be sorted.

It sounded like in this case the author just wanted to sort to find duplicate values. He does not care the actual order. Just sorting to find duplicates on some key values. In this case, I would sort by whatever is the most efficient.

He gave us some values and I tested them.

I found that sorting by field1 then field2 was about 26% than sorting by field2 then field1.

Posted: Tue Jun 20, 2006 1:51 am
by kumar_s
How many time you repeated the test to confirm the effeciency?
Is your server consistantly loaded for each test for all the times?
What is the volume of data you took for this test?

Posted: Tue Jun 20, 2006 3:19 am
by ArndW
These speeds also depend greatly on what order the incoming data is already in.

Posted: Tue Jun 20, 2006 7:37 am
by Ultramundane
How many time you repeated the test to confirm the effeciency?
Field1, Field2 Average 10 Runs: 65 seconds
Field2, Field1 Average 10 Runs: 88 seconds

Is your server consistantly loaded for each test for all the times?
I was the only user on this system and this was only the job running.

What is the volume of data you took for this test?
I bumped up Field2 and ran this test. Field 1 with a cycle of 10000 values for every distinct value in Field2 where Field2 had cycle of 40 values. Both columns defined as int not null. Thus, 10000 * 40 = 400000 records.

These speeds also depend greatly on what order the incoming data is already in. I tried randomizing and sorting descending as well. Got the same results.