Problem with SORT
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 39
- Joined: Mon Feb 13, 2012 1:30 pm
Problem with SORT
Hi,
I am doing the sort on two key columns, but seems like it is doing the sort on the combination of these columns. What i want is - it should sort on first column and in case duplicate records for first column then sort on second column.
Can somebody help please?
Thanks,
Vishal
I am doing the sort on two key columns, but seems like it is doing the sort on the combination of these columns. What i want is - it should sort on first column and in case duplicate records for first column then sort on second column.
Can somebody help please?
Thanks,
Vishal
That's exactly what sorting on two columns does...
or are you trying to say
are both being sorted as 120?
or are you trying to say
Code: Select all
1 20
12 0
Yah, was going to chime in to say that both of your scenarios are equivalent in my mind. Whether it "sorts on a combination" of the two columns or "sorts on the first then the second" the end result should be the same. Now, string sorts can confuse people if that's what you are seeing or doing.
Can you post some examples with real data and let us know the data types involved? For example, how they come in, how they are being sorted and how you'd like them sorted. That would help us help you.
Can you post some examples with real data and let us know the data types involved? For example, how they come in, how they are being sorted and how you'd like them sorted. That would help us help you.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 39
- Joined: Mon Feb 13, 2012 1:30 pm
some example
My data is something like ---
Field1 field2
18834 *
18834 0
18832 *
18832 0
the result is coming as -
18832 *
18834 *
18832 0
18834 0
but i want the result as -
18832 *
18832 0
18834 *
18834 0
the first field is 'Double' and the second one is varchar.
Field1 field2
18834 *
18834 0
18832 *
18832 0
the result is coming as -
18832 *
18834 *
18832 0
18834 0
but i want the result as -
18832 *
18832 0
18834 *
18834 0
the first field is 'Double' and the second one is varchar.
Craig asks
Are you properly partitioning your data?
In which order are you specifying your sort keys in the sort stage?
Is the output from a Peek or a sequential file?
Regards,
Vishu repliesCan you post some examples with real data
My data is something like ---
Are you properly partitioning your data?
In which order are you specifying your sort keys in the sort stage?
Is the output from a Peek or a sequential file?
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.
-
- Participant
- Posts: 39
- Joined: Mon Feb 13, 2012 1:30 pm
Here are some more details -
1. Input is a Data set
2. Then a filter to get only these two fields.
3. Did not change anything in partitioning (Auto)
4. the order in sort stage is 1. field1 2. field2
5. output is going to a data set via a Copy state.
6. I tried to change the input partition to use Hash partition for these two keys in sort stage but no success.
1. Input is a Data set
2. Then a filter to get only these two fields.
3. Did not change anything in partitioning (Auto)
4. the order in sort stage is 1. field1 2. field2
5. output is going to a data set via a Copy state.
6. I tried to change the input partition to use Hash partition for these two keys in sort stage but no success.
-
- Participant
- Posts: 39
- Joined: Mon Feb 13, 2012 1:30 pm
Try run in sequential mode if the entire output needs to be sorted.vishu19aug wrote:Here are some more details -
5. output is going to a data set via a Copy state.
6. I tried to change the input partition to use Hash partition for these two keys in sort stage but no success.
Not sure about viewing order in a dataset but when viewing the dataset, read from only one partition as you may be getting the writes from multiple partitions playing with your display (of course, assuming you have been running on multiple nodes)
-
- Participant
- Posts: 39
- Joined: Mon Feb 13, 2012 1:30 pm
What data type is field 2?
If the data type is varchar() then most collating sequences will sort "*" (ASCII Character 42) after the digits (ASCII Characters 48-57).
If the "*" denotes a NULL value in that field, datastage defaults to sorting that before other values.
Could you be inadvertantly sorting on "Field2" then "Field1"?
If the data type is varchar() then most collating sequences will sort "*" (ASCII Character 42) after the digits (ASCII Characters 48-57).
If the "*" denotes a NULL value in that field, datastage defaults to sorting that before other values.
Could you be inadvertantly sorting on "Field2" then "Field1"?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
In that case it would seem that you are sorting by Field2 then Field1. Do you have other sample data which can help show the problem?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
Reread my question...that is neither what I asked nor implied.There is no problem with viewing the data
Have you run the job with a single-node configuration file rather than the typical default 2-node configuration to see if there are any differences in the results? If so, that usually points to partitioning issues, but could also be the results of a collection operation.
You are writing to a parallel dataset using the Dataset stage, correct? Is the Dataset stage set to run in sequential or parallel mode?
The evidence you've provided so far points to a couple of possibilities:
1) Sort key order is backwards from what you claim
2) Your job is running in parallel with a target file/dataset that is sequential/single node. The collector will by default run in round-robin mode and could cause the issue you see.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.