过去在面试时检查我的字谜代码

Question

过去在面试时检查我的字谜代码

以前作为一个面试问题,并且在基本语法上窒息得太厉害,我没能推进(一旦肾上腺素开始,编码就会消失.)

给定一个字符串列表,返回一组字符串列表,这些字符串是输入集的字谜.即"狗","上帝","foo"应该返回{"dog","god"}.之后,我自己创建了代码作为一个完整性检查,它现在已经存在了一段时间.我欢迎对它进行输入,看看我是否遗漏了任何东西,或者我是否可以更有效地完成任务.把它当作改善自己和学习其他技巧的机会:


void Anagram::doWork(list input, list> &output)
{
  typedef list < pair < string, string>> SortType;
  SortType sortedInput;

  // sort each string and pair it with the original
  for(list< string >::iterator i = input.begin(); i != input.end(); ++i)
  {
    string tempString(*i);
    std::sort(tempString.begin(), tempString.end());
    sortedInput.push_back(make_pair(*i, tempString));
  }

  // Now step through the new sorted list
  for(SortType::iterator i = sortedInput.begin(); i != sortedInput.end();)
  {
    set< string > newSet;

    // Assume (hope) we have a match and pre-add the first.
    newSet.insert(i->first);

    // Set the secondary iterator one past the outside to prevent
    // matching the original
    SortType::iterator j = i;
    ++j;

    while(j != sortedInput.end())
    {
      if(i->second == j->second)
      {
        // If the string matches, add it to the set and remove it
        // so that future searches need not worry about it
        newSet.insert(j->first);
        j = sortedInput.erase(j);
      }
      else
      {
        // else, next element
        ++j;
      }
    }

    // If size is bigger than our original push, we have a match 
    // - save it to the output
    if(newSet.size() > 1)
    {
       output.push_back(newSet);
    }

    // erase this element and update the iterator
    i = sortedInput.erase(i);
  }
}



在回顾评论并学习更多内容之后,这是第二次通过:


void doBetterWork(list input, list> &output)
{
  typedef std::multimap< string, string > SortedInputType;
  SortedInputType sortedInput;
  vector< string > sortedNames;

  for(vector< string >::iterator i = input.begin(); i != input.end(); ++i)
  {
    string tempString(*i);
    std::sort(tempString.begin(), tempString.end());
    sortedInput.insert(make_pair(tempString, *i));
    sortedNames.push_back(tempString);
  }

  for(list< string >::iterator i = sortedNames.begin(); i != sortedNames.end(); ++i)
  {
    pair< SortedInputType::iterator,SortedInputType::iterator > bounds;
    bounds = sortedInput.equal_range(*i);

    set< string > newSet;
    for(SortedInputType::iterator j = bounds.first; j != bounds.second; ++j)
    {
      newSet.insert(j->second);
    }

    if(newSet.size() > 1)
    {
      output.push_back(newSet);
    }

    sortedInput.erase(bounds.first, bounds.second);
  }
}

Answer 1

pol*_*nts 14

对字符串进行分组的最佳方法是将字符串映射到某种直方图表示.

 FUNCTION histogram
 [input] -> [output]

 "dog"   -> (1xd, 1xg, 1xo)
 "god"   -> (1xd, 1xg, 1xo)
 "foo"   -> (1xf, 2xo)

Run Code Online (Sandbox Code Playgroud)

基本上,通过线性扫描字符串,您可以生成直方图表示它包含的每个字母的数量.一个小的,有限的字母表使这更容易(例如A-Z,你只有一个26个数字的数组,每个字母一个).

现在,字谜只是具有相同直方图的单词.

然后,您可以使用多图数据结构将直方图映射到具有该直方图的单词列表.

MULTIMAP
[key]           => [set of values]

(1xd, 1xg, 1xo) => { "dog", "god" }
(1xf, 2xo)      => { "foo" }

Run Code Online (Sandbox Code Playgroud)

规范形式的技巧

您也可以使用字符串的"规范形式",而不是处理直方图.基本上,你为每个字符串定义它的规范形式,如果它们具有相同的规范形式,则两个字是字谜.

一种方便的规范形式是按字母顺序排列字符串中的字母.

FUNCTION canonize
[input]  -> [output]

"dog"    -> "dgo"
"god"    -> "dgo"
"abracadabra" -> "aaaaabbcdrr"

Run Code Online (Sandbox Code Playgroud)

请注意,这是统计图方式只需简单的一步:你基本上做计数排序的字母排序.

对于您的问题,这是实际编程语言中最实用的解决方案.

复杂

生成单词的直方图/规范形式实际上是O(1)(有限字母大小,有限最大字长).

有了很好的哈希实现,get而且put在multimap上O(1).

您甚至可以拥有多个多重映射,每个字长一个.

如果有N单词,则将所有单词放入多图中O(N); 然后输出每个anagram组只是将值转储到多图中.这也可以在O(N).

这肯定比检查每对单词是否为字谜(O(N^2)算法)更好.

归档时间：	15 年，7 月前
查看次数：	5059 次
最近记录：	13 年，10 月前