Fuzzy Logic!

Last week I spent a bit of time working on a of Fuzzy Logic type problem. This was for a VB app, rather than anything to do with Excel, but I wanted to test the different methods quickly and in a flexibly way. Excel was obviously ideal for doing this.
I have complied about 5 methods into a workbook (very, very rough stuff mind), none of the code is my own work, it’s all just gathered from around the net, I thought it might be useful for other folks (and myself again at some point!) to have it all in one work book let me know if you tied it up. The algorithms are:
Levenshtein distance
SoundEx
SoundEx2
SoundEx – Daitch-Mokotoff
Metaphone
DoubleMetaphone

So that’s 6 ways then not 5 as I may have lead you to believe. The fastest by far is standard SoundEx (it should be, it much simpler than the rest!), the others have various strengths and weakness.

If you have any more, let me know!!!!
Fuzzy tests.xls

Related posts:

  1. More Fuzzy Logic!!
  2. Map Point from Excel
  3. Standard Deviation of Counts

Comments

  1. zach says:

    I’m not in a position to test this out yet, but I look forward to it. I can’t say I’ve much luck with the soundex type alogrithms in the past, though. I’ve spent lots of time trying to match lists where one name is “rosenstein,jonathan” and the other is “rossenstein, jon dr”. I’ve never seen much of a breakthrough in this area. It’s pretty frustrating. You’d think with all the work put into things such as cataloging tools that there’d be something brilliant by now.

    By the way, congrats on being blocked by Websense. You’ve made the exclusive club!

  2. JP says:

    I’ve always found SoundEx hard to understand. This page has some interesting links:

    http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/

    I also have a UDF somewhere (can’t locate it at the moment) that returns a percentage indicating how close two words are to each other.

    –JP

  3. JP says:

    OK here it is:

    Function Str_Comp(st1 As String, st2 As String) As Double
    '
    ' returns a number showing % comparison between two names
    '
    ' i.e. =Str_Comp(A1, B1)
    '
    ' Format cell as Percentage to make it look pretty!!
    '
    '
    Dim MtchTbl(100, 100)
    Dim MyMax As Double, ThisMax As Double
    Dim i As Integer, j As Integer, ii As Integer, jj As Integer

    With WorksheetFunction
        st1$ = Trim$(.Proper(st1$))
        st2$ = Trim$(.Proper(st2$))
    End With

    MyMax# = 0

    For i% = Len(st1$) To 1 Step -1
        For j% = Len(st2$) To 1 Step -1
            If Mid$(st1$, i%, 1) = Mid$(st2$, j%, 1) Then
                ThisMax# = 0
                For ii% = (i% + 1) To Len(st1$)
                    For jj% = (j% + 1) To Len(st2$)
                        If MtchTbl(ii%, jj%)> ThisMax# Then
                            ThisMax# = MtchTbl(ii%, jj%)
                        End If
                    Next jj%
                Next ii%
                MtchTbl(i%, j%) = ThisMax# + 1
                If (ThisMax# + 1)> ThisMax# Then
                    MyMax# = ThisMax# + 1
                End If
            End If
        Next j%
    Next i%
    Str_Comp = MyMax# / ((Len(st1$) + Len(st2$)) / 2)

    End Function

    And here's another link to fuzzy lookup code:

    http://hairyears.livejournal.com/115867.html

  4. Ross says:

    Thanks JP,
    Can see you've done a bit of work in this area too! I've found a few more methods flotting around, so i will update those in to that SS some, I think unless you're an expert with these things then it's all a bit trial and error!

  5. JP says:

    Just FTR, I didn't write that UDF. It's extremely useful, if you happen to need the functions.

    Thx,
    JP

Submit a Comment