复杂字符串比较

Lac*_*cD. 4 php

我正在尝试在PHP中编写一个函数,它接受一个字符串数组(needle)并执行与另一个字符串数组(haystack)的比较.此函数的目的是为AJAX搜索快速提供匹配的字符串,因此需要尽可能快.

这里有一些示例代码来说明这两个数组;

$needle = array('ba','hot','resta');

$haystack = array(
    'Southern Hotel',
    'Grange Restaurant & Hotel',
    'Austral Hotel',
    'Barsmith Hotel',
    'Errestas'
);
Run Code Online (Sandbox Code Playgroud)

虽然这本身很容易,但比较的目的是计算有多少needle字符串出现在haystack.

但是,有三个限制;

  1. 比较不区分大小写
  2. needle必须只在单词的开头的字符匹配.例如,"hote"将匹配"Hotel",但"resta"将不匹配"Errestas".
  3. 我们想要计算匹配needles的数量,而不是needle出现次数.如果一个地方被命名为"酒店宾馆酒店",我们需要的结果1不是3.

使用上面的例子,我们期望得到以下关联数组:

$haystack = array(
    'Southern Hotel' => 1,
    'Grange Restaurant & Hotel' => 2,
    'Austral Hotel' => 1,
    'Barsmith Hotel' => 2,
    'Erresta'  => 0
);
Run Code Online (Sandbox Code Playgroud)

我一直在尝试实现一个函数来执行此操作,使用一个preg_match_all()看起来像的正则表达式/(\A|\s)(ba|hot|resta)/.虽然这确保我们只匹配单词的开头,但它没有考虑包含相同needle两次的字符串.

我发帖看看别人是否有解决方案?

Ion*_*tan 7

我发现你对问题的描述足够详细,我可以采用TDD方法来解决它.因此,因为我非常想成为一名TDD人,所以我编写了测试和函数来使测试通过.Namings可能并不完美,但它们很容易改变.函数的算法也可能不是最好的,但是现在有了测试,重构应该非常简单和轻松.

以下是测试:

class MultiMatcherTest extends PHPUnit_Framework_TestCase
{
    public function testTheComparisonIsCaseInsensitive()
    {
        $needles  = array('hot');
        $haystack = array('Southern Hotel');
        $result   = match($needles, $haystack);

        $this->assertEquals(array('Southern Hotel' => 1), $result);
    }

    public function testNeedleMatchesOnlyCharsAtBeginningOfWord()
    {
        $needles  = array('resta');
        $haystack = array('Errestas');
        $result   = match($needles, $haystack);

        $this->assertEquals(array('Errestas' => 0), $result);
    }

    public function testMatcherCountsNeedlesNotOccurences()
    {
        $needles  = array('hot');
        $haystack = array('Southern Hotel', 'Grange Restaurant & Hotel');
        $expected = array('Southern Hotel'            => 1,
                          'Grange Restaurant & Hotel' => 1);
        $result   = match($needles, $haystack);

        $this->assertEquals($expected, $result);
    }

    public function testAcceptance()
    {
        $needles  = array('ba','hot','resta');
        $haystack = array(
            'Southern Hotel',
            'Grange Restaurant & Hotel',
            'Austral Hotel',
            'Barsmith Hotel',
            'Errestas',
        );
        $expected = array(
            'Southern Hotel'            => 1,
            'Grange Restaurant & Hotel' => 2,
            'Austral Hotel'             => 1,
            'Barsmith Hotel'            => 2,
            'Errestas'                  => 0,
        );

        $result = match($needles, $haystack);

        $this->assertEquals($expected, $result);
    }
}
Run Code Online (Sandbox Code Playgroud)


这是功能:

function match($needles, $haystack)
{
    // The default result will containg 0 (zero) occurences for all $haystacks
    $result = array_combine($haystack, array_fill(0, count($haystack), 0));

    foreach ($needles as $needle) {

        foreach ($haystack as $subject) {
            $words = str_word_count($subject, 1); // split into words

            foreach ($words as $word) {
                if (stripos($word, $needle) === 0) {
                    $result[$subject]++;

                    break;
                }
            }
        }
    }

    return $result;
}
Run Code Online (Sandbox Code Playgroud)


测试该break声明是否必要

以下测试显示何时break需要.breakmatch函数内部使用和不使用语句运行此测试.

/**
 * This test demonstrates the purpose of the BREAK statement in the
 * implementation function. Without it, the needle will be matched twice.
 * "hot" will be matched for each "Hotel" word.
 */
public function testMatcherCountsNeedlesNotOccurences2()
{
    $needles  = array('hot');
    $haystack = array('Southern Hotel Hotel');
    $expected = array('Southern Hotel Hotel' => 1);
    $result   = match($needles, $haystack);

    $this->assertEquals($expected, $result);
}
Run Code Online (Sandbox Code Playgroud)