Page 1 of 1

Comparing columns

Posted: Wed Dec 27, 2006 9:11 pm
by sylvan_rydes
Hi All,

This is something wierd. I need to compare two columns having character strings. The problems is, If they match 75% or more then change them to same as first column. If they match less than 75% then keep them as they are. I was wondering how will this be possible in Datastage. Any help will be appreciated.

Thanks in advance.

Sylvan

Posted: Wed Dec 27, 2006 9:18 pm
by ray.wurlod
Define "match by 75%". Sounds like the kind of probabilistic match that QualityStage (perhaps invoked through a QualityStage plug-in stage in your DataStage job) is ideally suited to do.

Posted: Wed Dec 27, 2006 9:22 pm
by sylvan_rydes
Hi Ray,

Thanks for your reply. Well...I know that it is possible with quality stage. Even I have a PL/SQL function ready to do that. But just want to know if it could be done using Datastage in a simple way(using some built in function).

Now for example we have Examination type and Exam type. Both are same so must be replaced with the first one.

Thanks again.

Sylvan

Posted: Wed Dec 27, 2006 9:27 pm
by DSguru2B
You can probably write a basic routine. Do a character by character comparison. If they match store it in one variable and if it does not, store it in another. Get the size of both these variables and mainpulate the percentage to get the percentage of match and return the appropriate value.

Code: Select all

      FUNCTION PercentileMatch(Arg1,Arg2)

      in.Col1=trim(Arg1)
      in.Col2=trim(Arg2)

      Matched = ''

      If Len(in.Col1) >= Len(in.Col2)
      Then
         MaxLen=Len(in.Col1)
      End
      Else
         MaxLen=Len(in.Col2)
      End

      For i=1 To MaxLen
         If in.Col1[i,1] = in.Col2[i,1]
         Then
            Matched:=in.Col1[i,1]
         End
      Next i

      Matched.Percentage=(Len(Matched)/MaxLen)*100
      If Matched.Percentage >= 75
      Then
         Ans=@True
      End
      Else
         Ans=@False
      End
Reference this routine in a stage variable. Pass Col1 as Arg1 and Col2 as Arg2. And in the derivation of the second column check if the stage variable is @True put Col1 else Col2. That routine was off of my head. It needs testing. You can also add code to handle null values and empty columns.

Posted: Wed Dec 27, 2006 9:47 pm
by sylvan_rydes
Hi DSguru2B,

Thanks for the detailed explanation. Now I will create the routine and then hopefully I will get whaqt I am looking for.

Thanks again.

Sylvan

Posted: Wed Dec 27, 2006 9:58 pm
by DSguru2B
I edited my previous post and wrote a routine. See if that works. Was just getting bored so sat down to write it out.