Last week I spent a bit of time working on a of Fuzzy Logic type problem. This was for a VB app, rather than anything to do with Excel, but I wanted to test the different methods quickly and in a flexibly way. Excel was obviously ideal for doing this.
I have complied about 5 methods into a workbook (very, very rough stuff mind), none of the code is my own work, it’s all just gathered from around the net, I thought it might be useful for other folks (and myself again at some point!) to have it all in one work book let me know if you tied it up. The algorithms are:
Levenshtein distance
SoundEx
SoundEx2
SoundEx – Daitch-Mokotoff
Metaphone
DoubleMetaphone
So that’s 6 ways then not 5 as I may have lead you to believe. The fastest by far is standard SoundEx (it should be, it much simpler than the rest!), the others have various strengths and weakness.
If you have any more, let me know!!!!
Fuzzy tests.xls
Related posts:
I’m not in a position to test this out yet, but I look forward to it. I can’t say I’ve much luck with the soundex type alogrithms in the past, though. I’ve spent lots of time trying to match lists where one name is “rosenstein,jonathan” and the other is “rossenstein, jon dr”. I’ve never seen much of a breakthrough in this area. It’s pretty frustrating. You’d think with all the work put into things such as cataloging tools that there’d be something brilliant by now.
By the way, congrats on being blocked by Websense. You’ve made the exclusive club!
I’ve always found SoundEx hard to understand. This page has some interesting links:
http://www.dailydoseofexcel.com/archives/2004/06/16/fuzzy-text-match/
I also have a UDF somewhere (can’t locate it at the moment) that returns a percentage indicating how close two words are to each other.
–JP
OK here it is:
'
' returns a number showing % comparison between two names
'
' i.e. =Str_Comp(A1, B1)
'
' Format cell as Percentage to make it look pretty!!
'
'
Dim MtchTbl(100, 100)
Dim MyMax As Double, ThisMax As Double
Dim i As Integer, j As Integer, ii As Integer, jj As Integer
With WorksheetFunction
st1$ = Trim$(.Proper(st1$))
st2$ = Trim$(.Proper(st2$))
End With
MyMax# = 0
For i% = Len(st1$) To 1 Step -1
For j% = Len(st2$) To 1 Step -1
If Mid$(st1$, i%, 1) = Mid$(st2$, j%, 1) Then
ThisMax# = 0
For ii% = (i% + 1) To Len(st1$)
For jj% = (j% + 1) To Len(st2$)
If MtchTbl(ii%, jj%)> ThisMax# Then
ThisMax# = MtchTbl(ii%, jj%)
End If
Next jj%
Next ii%
MtchTbl(i%, j%) = ThisMax# + 1
If (ThisMax# + 1)> ThisMax# Then
MyMax# = ThisMax# + 1
End If
End If
Next j%
Next i%
Str_Comp = MyMax# / ((Len(st1$) + Len(st2$)) / 2)
End Function
And here's another link to fuzzy lookup code:
http://hairyears.livejournal.com/115867.html
Thanks JP,
Can see you've done a bit of work in this area too! I've found a few more methods flotting around, so i will update those in to that SS some, I think unless you're an expert with these things then it's all a bit trial and error!
Just FTR, I didn't write that UDF. It's extremely useful, if you happen to need the functions.
Thx,
JP
Oh and here's another one:
http://chandoo.org/wp/2008/09/25/handling-spelling-mistakes-in-excel-fuzzy-search/