Comparing columns

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sylvan_rydes
Participant
Posts: 47
Joined: Wed Apr 12, 2006 12:13 pm

Comparing columns

Post by sylvan_rydes »

Hi All,

This is something wierd. I need to compare two columns having character strings. The problems is, If they match 75% or more then change them to same as first column. If they match less than 75% then keep them as they are. I was wondering how will this be possible in Datastage. Any help will be appreciated.

Thanks in advance.

Sylvan
sylvan rydes
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Define "match by 75%". Sounds like the kind of probabilistic match that QualityStage (perhaps invoked through a QualityStage plug-in stage in your DataStage job) is ideally suited to do.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sylvan_rydes
Participant
Posts: 47
Joined: Wed Apr 12, 2006 12:13 pm

Post by sylvan_rydes »

Hi Ray,

Thanks for your reply. Well...I know that it is possible with quality stage. Even I have a PL/SQL function ready to do that. But just want to know if it could be done using Datastage in a simple way(using some built in function).

Now for example we have Examination type and Exam type. Both are same so must be replaced with the first one.

Thanks again.

Sylvan
sylvan rydes
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You can probably write a basic routine. Do a character by character comparison. If they match store it in one variable and if it does not, store it in another. Get the size of both these variables and mainpulate the percentage to get the percentage of match and return the appropriate value.

Code: Select all

      FUNCTION PercentileMatch(Arg1,Arg2)

      in.Col1=trim(Arg1)
      in.Col2=trim(Arg2)

      Matched = ''

      If Len(in.Col1) >= Len(in.Col2)
      Then
         MaxLen=Len(in.Col1)
      End
      Else
         MaxLen=Len(in.Col2)
      End

      For i=1 To MaxLen
         If in.Col1[i,1] = in.Col2[i,1]
         Then
            Matched:=in.Col1[i,1]
         End
      Next i

      Matched.Percentage=(Len(Matched)/MaxLen)*100
      If Matched.Percentage >= 75
      Then
         Ans=@True
      End
      Else
         Ans=@False
      End
Reference this routine in a stage variable. Pass Col1 as Arg1 and Col2 as Arg2. And in the derivation of the second column check if the stage variable is @True put Col1 else Col2. That routine was off of my head. It needs testing. You can also add code to handle null values and empty columns.
Last edited by DSguru2B on Thu Dec 28, 2006 8:48 am, edited 3 times in total.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
sylvan_rydes
Participant
Posts: 47
Joined: Wed Apr 12, 2006 12:13 pm

Post by sylvan_rydes »

Hi DSguru2B,

Thanks for the detailed explanation. Now I will create the routine and then hopefully I will get whaqt I am looking for.

Thanks again.

Sylvan
sylvan rydes
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I edited my previous post and wrote a routine. See if that works. Was just getting bored so sat down to write it out.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Post Reply