Problem with SORT

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

vishu19aug
Participant
Posts: 39
Joined: Mon Feb 13, 2012 1:30 pm

Problem with SORT

Post by vishu19aug »

Hi,

I am doing the sort on two key columns, but seems like it is doing the sort on the combination of these columns. What i want is - it should sort on first column and in case duplicate records for first column then sort on second column.

Can somebody help please?

Thanks,
Vishal
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

That's exactly what sorting on two columns does...

or are you trying to say

Code: Select all

1   20
12  0
are both being sorted as 120?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yah, was going to chime in to say that both of your scenarios are equivalent in my mind. Whether it "sorts on a combination" of the two columns or "sorts on the first then the second" the end result should be the same. Now, string sorts can confuse people if that's what you are seeing or doing.

Can you post some examples with real data and let us know the data types involved? For example, how they come in, how they are being sorted and how you'd like them sorted. That would help us help you.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vishu19aug
Participant
Posts: 39
Joined: Mon Feb 13, 2012 1:30 pm

some example

Post by vishu19aug »

My data is something like ---

Field1 field2
18834 *
18834 0
18832 *
18832 0

the result is coming as -
18832 *
18834 *
18832 0
18834 0

but i want the result as -
18832 *
18832 0
18834 *
18834 0

the first field is 'Double' and the second one is varchar.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Craig asks
Can you post some examples with real data
Vishu replies
My data is something like ---
:?: :roll:

Are you properly partitioning your data?
In which order are you specifying your sort keys in the sort stage?
Is the output from a Peek or a sequential file?

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
vishu19aug
Participant
Posts: 39
Joined: Mon Feb 13, 2012 1:30 pm

Post by vishu19aug »

Here are some more details -
1. Input is a Data set
2. Then a filter to get only these two fields.
3. Did not change anything in partitioning (Auto)
4. the order in sort stage is 1. field1 2. field2
5. output is going to a data set via a Copy state.
6. I tried to change the input partition to use Hash partition for these two keys in sort stage but no success.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

jwiles wrote:Is the output from a Peek or a sequential file?

Regards,
Consider this!
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
vishu19aug
Participant
Posts: 39
Joined: Mon Feb 13, 2012 1:30 pm

Post by vishu19aug »

I am new to Datastage.. couldn't quite understand this..
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

vishu19aug wrote:Here are some more details -
5. output is going to a data set via a Copy state.
6. I tried to change the input partition to use Hash partition for these two keys in sort stage but no success.
Try run in sequential mode if the entire output needs to be sorted.

Not sure about viewing order in a dataset but when viewing the dataset, read from only one partition as you may be getting the writes from multiple partitions playing with your display (of course, assuming you have been running on multiple nodes)
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

How are you viewing the data? The View Data button or menu entry, or maybe orchadmin command or the Dataset Management Tool?

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
vishu19aug
Participant
Posts: 39
Joined: Mon Feb 13, 2012 1:30 pm

Post by vishu19aug »

There is no problem with viewing the data.. I can see the problem is with the records where field2 is '*'. Is there any special handling of * in datastage?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What data type is field 2?

If the data type is varchar() then most collating sequences will sort "*" (ASCII Character 42) after the digits (ASCII Characters 48-57).

If the "*" denotes a NULL value in that field, datastage defaults to sorting that before other values.

Could you be inadvertantly sorting on "Field2" then "Field1"?
vishu19aug
Participant
Posts: 39
Joined: Mon Feb 13, 2012 1:30 pm

Post by vishu19aug »

It is varchar()
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

In that case it would seem that you are sorting by Field2 then Field1. Do you have other sample data which can help show the problem?
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

There is no problem with viewing the data
Reread my question...that is neither what I asked nor implied.

Have you run the job with a single-node configuration file rather than the typical default 2-node configuration to see if there are any differences in the results? If so, that usually points to partitioning issues, but could also be the results of a collection operation.

You are writing to a parallel dataset using the Dataset stage, correct? Is the Dataset stage set to run in sequential or parallel mode?

The evidence you've provided so far points to a couple of possibilities:

1) Sort key order is backwards from what you claim
2) Your job is running in parallel with a target file/dataset that is sequential/single node. The collector will by default run in round-robin mode and could cause the issue you see.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Post Reply