String Matching Algorithms Up Close: Unlocking the Mysteries of the World of Strings!

2023-03-12 03:41:57

String Matching in the World of Text: A Key to the Ocean of Information

String Matching Algorithms: The Cornerstone of Text Processing

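In the vast ocean of text, string matching algorithms act like lighthouses, guiding us quickly and precisely to the information we need. They are a foundation of computer science and are used in fields ranging from everyday document search to software development and bioinformatics.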

A Secret Weapon of String Matching: The Partial Match Table

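Like a seasoned detective, a string matching algorithm examines the pattern (the string we are looking for) and the text (the string being searched) to find matches. Efficient algorithms rely on a clever strategy: before searching, they preprocess the pattern into a lookup table. In KMP this is the partial match table, which records, for each prefix of the pattern, the length of the longest proper prefix that is also a suffix of that prefix. When the text and the pattern mismatch, the algorithm consults the table to decide how far the pattern can shift without missing a match, so characters already known to match are not compared again and a great deal of search time is saved.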
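A minimal sketch of how the partial match table can be computed, assuming the definition above (longest proper prefix that is also a suffix). The function name `partial_match_table` is hypothetical and used only for illustration.

```python
def partial_match_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of pattern[:i + 1]."""
    table = [0] * len(pattern)
    k = 0  # length of the prefix currently matched as a suffix
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until pattern[i] can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


print(partial_match_table("ABABC"))    # [0, 0, 1, 2, 0]
print(partial_match_table("AABAAAB"))  # [0, 1, 0, 1, 2, 2, 3]
```

For "ABABC", the entry 2 at index 3 says that after matching "ABAB", the last two characters "AB" already match the start of the pattern, so a search can resume from there instead of starting over.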

KMP Algorithm: An Elegant Upgrade of Brute-Force Search

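The KMP algorithm, named after its inventors Knuth, Morris, and Pratt, is an efficient string matching algorithm. It builds a failure function (the partial match table) from the pattern. When a mismatch occurs, the algorithm uses this table to shift the pattern forward without ever moving backward in the text, so the search runs in O(n + m) time for a text of length n and a pattern of length m.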
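For comparison, here is a minimal sketch of the brute-force search that KMP improves on. The name `brute_force_search` is hypothetical. It tries every alignment and, after a mismatch, slides the pattern by one position and re-reads characters it has already compared, which is the rework KMP's failure function avoids.

```python
def brute_force_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    Worst case: about len(text) * len(pattern) character comparisons.
    """
    n, m = len(text), len(pattern)
    for s in range(n - m + 1):          # every possible starting position
        j = 0
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters matched
            return s
        # Mismatch: move to s + 1 and compare from pattern[0] again.
    return -1


print(brute_force_search("hello world", "world"))  # 6
print(brute_force_search("aaaa", "ab"))            # -1
```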

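def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n = len(text)
    m = len(pattern)
    if m == 0:
        return 0  # an empty pattern matches at position 0

    # Build the failure function: fail[i] is the length of the longest
    # proper prefix of pattern[:i + 1] that is also a suffix of it.
    fail = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
            fail[i] = j

    # Scan the text; the text index i never moves backward.
    i = 0
    j = 0  # number of pattern characters currently matched
    while i < n:
        if pattern[j] == text[i]:
            i += 1
            j += 1

        if j == m:
            return i - j  # the full match ends at text[i - 1]

        elif i < n and pattern[j] != text[i]:
            if j != 0:
                j = fail[j - 1]  # reuse the matched prefix instead of restarting
            else:
                i += 1

    return -1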
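The search above stops at the first match. A common extension, sketched here under the assumption that overlapping occurrences should all be reported, keeps scanning after a full match by falling back to `fail[m - 1]` rather than resetting to 0. The name `kmp_search_all` is hypothetical.

```python
def kmp_search_all(text, pattern):
    """Return the start index of every (possibly overlapping) occurrence."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))  # empty pattern matches everywhere

    # Failure function, same definition as the partial match table.
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = fail[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            # Keep the longest border so overlapping matches are found.
            j = fail[j - 1]
    return matches


print(kmp_search_all("abababa", "aba"))  # [0, 2, 4]
```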

Boyer-Moore Algorithm: A Breakthrough Through Reverse Thinking

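The Boyer-Moore algorithm, proposed by Robert S. Boyer and J. Strother Moore, takes a distinctive reverse approach. At each alignment it compares the pattern with the text from right to left, starting at the last character of the pattern. When a mismatch occurs, it computes a shift from two precomputed rules, the bad-character rule and the good-suffix rule, and takes the larger of the two, often skipping stretches of text without examining them. It performs especially well with large alphabets and relatively long patterns, where a mismatched character frequently allows a long jump.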
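To make the bad-character rule concrete, here is a small sketch of the shift it suggests on its own. It assumes the basic form of the rule: find the rightmost occurrence of the mismatched text character anywhere in the pattern, align it under that text character, and move at least one position. The helper `bad_character_shift` is hypothetical.

```python
def bad_character_shift(pattern, j, mismatched_char):
    """Shift suggested by the basic bad-character rule when pattern[j]
    failed to match mismatched_char (comparison runs right to left)."""
    # Rightmost index of mismatched_char in the pattern, or -1 if absent.
    last = pattern.rfind(mismatched_char)
    # Align that occurrence under the text character; if it lies to the
    # right of j the rule gives <= 0, so fall back to a shift of 1.
    return max(1, j - last)


# Searching "HERE IS A SIMPLE EXAMPLE" for "EXAMPLE":
print(bad_character_shift("EXAMPLE", 6, "S"))  # 7: 'S' is not in the pattern
print(bad_character_shift("EXAMPLE", 6, "P"))  # 2: align the 'P' at index 4
```

A character absent from the pattern lets the whole pattern jump past it, which is where Boyer-Moore gets its long skips.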

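def boyer_moore_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n = len(text)
    m = len(pattern)
    if m == 0:
        return 0

    # Bad-character rule: rightmost index of each character in the pattern.
    # A dict (instead of a 256-entry list) also handles non-ASCII text.
    last = {}
    for k, ch in enumerate(pattern):
        last[ch] = k

    # Good-suffix rule (strong version). After a mismatch at pattern[j],
    # good_suffix[j + 1] is a safe shift given that pattern[j + 1:] matched.
    good_suffix = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start of the widest border of pattern[i:]
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if good_suffix[j] == 0:
                good_suffix[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Fill the remaining entries from the widest border of the whole pattern.
    j = border[0]
    for i in range(m + 1):
        if good_suffix[i] == 0:
            good_suffix[i] = j
        if i == j:
            j = border[j]

    s = 0  # current alignment: pattern[0] sits under text[s]
    while s <= n - m:
        j = m - 1
        # Compare right to left within the current alignment.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            return s
        # Both rules are safe, so take the larger shift (always >= 1).
        bad_char_shift = j - last.get(text[s + j], -1)
        s += max(good_suffix[j + 1], bad_char_shift)

    return -1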
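A well-known simplification, the Boyer-Moore-Horspool variant, drops the good-suffix rule and always shifts according to the text character aligned with the last position of the pattern. The sketch below assumes that variant; `horspool_search` is a hypothetical name, and the function is not the full Boyer-Moore algorithm described above.

```python
def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: bad-character shifts only, keyed on the text
    character under the last pattern position."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # For each character in pattern[:m - 1]: distance from its rightmost
    # occurrence there to the last pattern position.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    s = 0
    while s <= n - m:
        # Compare right to left within the current alignment.
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            return s
        # Characters absent from pattern[:m - 1] allow a full shift of m.
        s += shift.get(text[s + m - 1], m)
    return -1


print(horspool_search("HERE IS A SIMPLE EXAMPLE", "EXAMPLE"))  # 17
```

Horspool needs only one table and is simpler to get right, at the cost of sometimes smaller shifts than full Boyer-Moore on repetitive patterns.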