Implement search suggestion using Levenshtein distance

Posted: October 26, 2011 in .Net/C#, Database
Tags: , , , , , ,

Language: C#; SQL Server: Microsoft SQL Server 2008

I am working on a online store with 6M+ products. The search engine is powered by fulltext, which is good enough for normal scenario; however, a search engine without search suggestion is only partially functioned.

Levenshtein distance is a very useful and powerful algorithm for string difference measuring. I found the implementation in this page. http://levenshtein.blogspot.com/

To implement it for search suggestion, the prerequisite is that you already have a pool of valid keywords, which is true in my case.

First, create a class library in visual studio.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class StoredFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, IsPrecise = false)]
    public static SqlDouble Levenshtein(SqlString S1, SqlString S2)
    {
        if (S1.IsNull)
            S1 = new SqlString("");

        if (S2.IsNull)
            S2 = new SqlString("");

        String SC1 = S1.Value.ToUpper();
        String SC2 = S2.Value.ToUpper();

        int n = SC1.Length;
        int m = SC2.Length;

        int[,] d = new int[n + 1, m + 1];
        int cost = 0;

        if (n + m == 0)
        {
            return 100;
        }
        else if (n == 0)
        {
            return 0;
        }
        else if (m == 0)
        {
            return 0;
        }

        for (int i = 0; i <= n; i++)
            d[i, 0] = i;

        for (int j = 0; j <= m; j++)
            d[0, j] = j;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                if (SC1[i - 1] == SC2[j - 1])
                    cost = 0;
                else
                    cost = 1;

                d[i, j] = System.Math.Min(System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
            }
        }

        double percentage = System.Math.Round((1.0 - ((double)d[n, m] / (double)System.Math.Max(n, m))) * 100.0, 2);
        return percentage;
    }
}

Next install the compiled dll to database, under Programmability->Assembly. After that, we will see the newly installed assembly UserFunctions.

We need enable CLR before calling function in the dll.

sp_configure 'clr enabled', 1
GO
reconfigure
GO

Create a sql function to use the function in our dll.

CREATE Function fn_Levenshtein(@S1 nvarchar(4000), @S2 nvarchar(4000))
    RETURNS float as EXTERNAL NAME UserFunctions.StoredFunctions.Levenshtein
GO

Test: select fn_Levenshtein(‘steve jobs’, ‘steve jbs’)

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s