Sort String

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

datastager
Participant
Posts: 7
Joined: Wed Oct 10, 2007 1:03 pm

Sort String

Post by datastager »

Hello,

I have a requirement where I need to do a lookup on a field. Before doing the lookup I need to sort the incomming data.

I need to sort it in the ascending order before doing the lookup (The hash file has the values in the sorted manner)

Texas,New York,California --> California,New York,Texas
Vermont,Alaska,New Mexico, New Jersey --> Alaska,New Jersey,New Mexico, Vermont.

Should I write a routine to do this or Can I do it on the transformer?
Please guide..
DeepakCorning
Premium Member
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: Sort String

Post by DeepakCorning »

What is the source of your data? If the data is coming from a database you can put order by in the data source.
Thanks
Deepak Patil

Convince Them Confuse Them .. What's the difference?
datastager
Participant
Posts: 7
Joined: Wed Oct 10, 2007 1:03 pm

Post by datastager »

This field is from a pipe delimited file. The file has around 10 milion rows!
DeepakCorning
Premium Member
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Post by DeepakCorning »

dii you try using the sort stage. It may not respond that well but will get the job done.
Thanks
Deepak Patil

Convince Them Confuse Them .. What's the difference?
datastager
Participant
Posts: 7
Joined: Wed Oct 10, 2007 1:03 pm

Post by datastager »

As you said, it might not respond. I want to try and avoid the sort stage.
It would be great if there are any functions to do it, or do I have to write one.... :(
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Or In the sequential file use the filter option and sort your data beforehand then pass it to hashed file for lookup.
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison(1847-1931)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: Sort String

Post by ray.wurlod »

datastager wrote:Hello,

I have a requirement where I need to do a lookup on a field. Before doing the lookup I need to sort the incomming data.

I need to sort it in the ascending order before doing the lookup (The hash file has the values in the sorted manner)
No it doesn't.

Storage in hashed files ("Hashed", you will notice, not "hash") is entirely governed by the hashed file's hashing algorithm.

The hashing algorithm determines the physical location of a record based upon its key value. You do not need to sort anything. A single I/O operation will retrieve any record from a hashed file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
datastager
Participant
Posts: 7
Joined: Wed Oct 10, 2007 1:03 pm

Post by datastager »

Sorry if i did not put forth my question clearly.

Actually I need to sort data inside a field, not the field itself.

Texas,New York,California Needs to be sorted to California,New York,Texas

and

Vermont,Alaska,New Mexico, New Jersey Needs to be sorted to Alaska,New Jersey,New Mexico, Vermont

and so on....
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You need a routine. It's not hard; an insert sort will do it.

Code: Select all

FUNCTION SortList(TheList, Delimiter)

* Change to undefine to perform unique sort (remove duplicates).
$DEFINE PreserveDuplicates

* Validate input arguments.
If UnAssigned(TheList) Or UnAssigned(TheDelimiter) Then RETURN(@NULL)
If IsNull(TheList) Or IsNull(TheDelimiter) Then RETURN(@NULL)

Ans = ""
Temp = Convert(Delimiter, @FM, (TheList))

Loop

   * Get next element from dynamic array.
   Remove Element From Temp Setting MoreElements

   Locate(Element, Ans; WhereFound; "AL")
$IFDEF PreserveDuplicates
   Then
      Ins Element Before(Ans<WhereFound>)
   End
$ENDIF
   Else
      Ins Element Before(Ans<WhereFound>)
   End

While MoreElements
Repeat

* Reinstate the original delimiter character.
Convert @FM To Delimiter In Ans

RETURN(Ans)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
datastager
Participant
Posts: 7
Joined: Wed Oct 10, 2007 1:03 pm

Post by datastager »

I tried to compile your code, but I get warnings. I am not yet an expert to debug your code.
Can you please check it when you get time.
Appreciate it!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I made it up sitting in an airport lounge. What specific warnings did you get? How did you try to create the routine? Irrespective, if does have some minor errors. Here's a working one.

Code: Select all

FUNCTION SortList(TheList, TheDelimiter)

* Change to undefine to perform unique sort (remove duplicates).
$DEFINE PreserveDuplicates

* Validate input arguments.
      If UnAssigned(TheList) Or UnAssigned(TheDelimiter) Then RETURN(@NULL)
      If IsNull(TheList) Or IsNull(TheDelimiter) Then RETURN(@NULL)

      Ans = ""
      Temp = Convert(TheDelimiter, @FM, (TheList))

      Loop

* Get next element from dynamic array.
         Remove Element From Temp Setting MoreElements

         Locate(Element, Ans; WhereFound; "AL")
         Then
$IFDEF PreserveDuplicates
            Ins Element Before Ans<WhereFound>
$ENDIF
         End
         Else
            Ins Element Before Ans<WhereFound>
         End

      While MoreElements
      Repeat

* Reinstate the original delimiter character.
      Convert @FM To TheDelimiter In Ans

RETURN(Ans)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
datastager
Participant
Posts: 7
Joined: Wed Oct 10, 2007 1:03 pm

Post by datastager »

Ray Rocks!
10 on 10

It works, I did not understand the code very much.
Not to be greedy, can you explain what's happening here.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I assume that the statements that idiot-proof the arguments are OK.

The main logic converts the delimited string to a dynamic array (which in one sense is just a delimited string with a different delimiter, but one for which there are some useful tools). The delimiter I used is a field mark, which may be represented by the system variable @FM.

The Remove statement takes the next element from a dynamic array. This is executed in a loop, processing (in your case) one city name at a time.

As each city name is found, it is compared with the result (which was initialized to an empty list). The result is asserted to be sorted in ascending left-justified order, by the "AL" argument in the Locate statement. The Locate statement returns the field number within the Ans dynamic array where it stopped searching - that is, the next field number past where the city name was found or ought to have been found.

The city name is then inserted into that location, so that the Ans dynamic array remains sorted. (That's why it's called an insertion sort.)

Finally, once all cities have been processed, the field marks in the Ans dynamic array are converted back to the original delimiter.

Consult the DataStage BASIC manual or on-line help for more information on each of the statements used.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Hey Datastager,

Just out of curiosity: how long is it taking the code Ray provided for you to order the information for you? I know I have added this code to my collection of interesting bits and I'm just wondering how it compares to an external call using sed, awk or Perl.

As usual, Ray Rocks!

Bestest!
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

f(kN), where k is some constant and N is the number of items in the delimited list. That is, time taken is a linear function of number of items, for sufficiently small items (such as city names).

Remember that this function is sorting a delimited list of city names for each record processed - it is not sorting the file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply