How to find duplicates in a std::vector<std::string> and return them in a std::list, sorted alphabetically and without repeats in the result list

Question

I have a class called Wordd, which has a member word_ of type std::vector<std::string>.

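I'm trying to find the duplicates in word_ and return them as an alphabetically sorted list that contains no repeats. So far my code compiles and links, but it times out, possibly because of some internal memory leak or similar.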

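#include <algorithm>
#include <list>
#include <string>
#include <vector>

// Wordd as described in the question (other members omitted)
class Wordd
{
public:
    std::list<std::string> FindDuplicates() const;
private:
    std::vector<std::string> word_;
};

class FindDuplicatesFunctor
{
public:
    std::list<std::string> list;      // duplicates found so far
    std::vector<std::string> word_;   // copy of the words being searched
    FindDuplicatesFunctor(std::vector<std::string> words): list(0), word_(words) {}
    void operator()(std::string const& str)
    {
        // std::count scans all of word_ for every word: O(n^2) overall
        if (std::count(word_.begin(), word_.end(), str) > 1 && std::count(list.begin(), list.end(), str) == 0)
        {
            list.push_back(str);
        }
        list.sort();   // re-sorts the whole result list on every call
    }
};

std::list<std::string> Wordd::FindDuplicates() const
{
    FindDuplicatesFunctor cf(word_);
    return std::for_each(word_.begin(), word_.end(), cf).list;
}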

Any ideas why it isn't doing its job?

Thanks in advance for your help!

Answer 1

sehe (score 5)

Edit in response to the comment:

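"The remove-duplicates function name is misleading. It is actually trying to return a list of the words that are duplicated in the sequence, but with only one copy of each duplicate in the resulting list." (user2624236, 10 hours ago)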

I hinted at std::sort + std::adjacent_find(..., std::equal_to<>). Here is an implementation:

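#include <algorithm>
#include <functional>
#include <iterator>
#include <list>

// input is taken by value on purpose: the function sorts its own copy
template <typename C, typename T = typename C::value_type> std::list<T> adjacent_search(C input)
{
    std::sort(begin(input), end(input));

    const auto eq  = std::equal_to<T>{};
    // std::not2 is deprecated in C++17 and removed in C++20; a lambda does the same job
    const auto neq = [](T const& a, T const& b) { return !(a == b); };

    std::list<T> dupes;

    // dupe_at:    first element of the next run of equal values
    // end_streak: last element of the run just recorded
    for (auto end_streak = begin(input), dupe_at = begin(input);
         (dupe_at = std::adjacent_find(end_streak, end(input), eq)) != end(input);
         end_streak = std::adjacent_find(dupe_at, end(input), neq))
    {
        dupes.push_back(*dupe_at); // one copy per run of duplicates
    }

    return dupes;
}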

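This implementation has some nice properties. After the sort, it is a single linear scan. Its worst-case behavior is also reasonable: for example, if the input contains 1000 copies of a single value, it does not perform 1001 useless searches.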
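The following sketch illustrates the same sort-then-scan idea as a plain, non-template function over std::vector<std::string>. It is not the answer's code. Instead of a second std::adjacent_find with a negated predicate, it skips each run of equal words with std::find_if. A run of 1000 copies of one word is therefore still passed over in a single sweep. The function name duplicates_sorted is hypothetical.

```cpp
#include <algorithm>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): sort, then report each run of
// equal neighbours once, skipping the rest of the run with std::find_if.
std::list<std::string> duplicates_sorted(std::vector<std::string> words) // copy: we sort it
{
    std::sort(words.begin(), words.end());

    std::list<std::string> dupes;
    auto it = words.begin();
    while (true)
    {
        // first position where two neighbours are equal: the start of a run of duplicates
        it = std::adjacent_find(it, words.end());
        if (it == words.end())
            break;

        dupes.push_back(*it); // record the value once

        // jump past the whole run, however long it is
        const std::string& value = *it;
        it = std::find_if(it, words.end(),
                          [&value](const std::string& w) { return w != value; });
    }
    return dupes;
}
```

With the input {"unsorted", "containing", "optional", "unsorted", "duplicate", "duplicate", "values"}, it returns the list {"duplicate", "unsorted"}.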

However, the following version (using a set) may be simpler:

#include <algorithm>
#include <iterator>
#include <list>
#include <set>

// simple, but O(n^2) in the worst case: std::count is a linear scan
template <typename C, typename T = typename C::value_type> std::list<T> simple(C const& input)
{
    std::set<T> dupes; // keeps results unique and sorted; dupes.find(x) is O(log n)
    for (auto it = begin(input); it != end(input); ++it)
    {
        // cheap set lookup first, so std::count runs at most once per distinct value
        if (dupes.find(*it) == dupes.end()
            && std::count(it, end(input), *it) > 1)
        {
            dupes.insert(dupes.end(), *it);
        }
    }

    return {begin(dupes), end(dupes)};
}

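This will almost certainly perform better on small collections, because it copies less (apart from the result). However, std::count performs an implicit linear search, so its worst-case behavior on large inputs can be quite bad (quadratic).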

I'd suggest returning the std::set<T> directly instead of copying it into a list.
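Here is a minimal sketch of returning the set directly. It is specialized to std::vector<std::string> for brevity. It also adds a second "seen" set that is not in the answer. A failed insertion into seen means the word occurred before, so the word goes into the result set. std::set keeps that result sorted and free of repeats, and the whole pass is O(n log n) with no std::count. The function name duplicate_set is hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): one pass with two sets.
// seen holds every word met so far; dupes holds words met at least twice.
std::set<std::string> duplicate_set(const std::vector<std::string>& words)
{
    std::set<std::string> seen, dupes;
    for (const auto& w : words)
        if (!seen.insert(w).second) // .second is false when w was already present
            dupes.insert(w);
    return dupes;
}
```

Returning the set also lets callers use find() or count() on the result in O(log n), which a std::list does not offer.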

Here is a test running Live on Coliru that shows both versions.
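Another option is to count occurrences in a std::map in one pass. This is a swapped-in technique, not part of the answer. It avoids the repeated linear std::count scans discussed above. Because std::map iterates its keys in sorted order, the result comes out alphabetical, with each duplicated word listed once. The function name duplicates_by_count is hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (swapped-in technique, not from the answer):
// count each word once, then keep the keys seen more than once.
std::list<std::string> duplicates_by_count(const std::vector<std::string>& words)
{
    std::map<std::string, std::size_t> counts;
    for (const auto& w : words)
        ++counts[w]; // operator[] value-initializes a missing count to 0

    std::list<std::string> dupes;
    for (const auto& entry : counts) // std::map visits keys in sorted order
        if (entry.second > 1)
            dupes.push_back(entry.first);
    return dupes;
}
```

This trades the extra memory for the map against a guaranteed O(n log n) pass. Use it when inputs may be large and have few repeats, which is where the std::count version degrades.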

Original answer

This is now obsolete, because it doesn't do what the OP asked for:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> input = { "unsorted", "containing", "optional", "unsorted", "duplicate", "duplicate", "values" };

    std::sort(begin(input), end(input));

    std::unique_copy(begin(input), end(input), std::ostream_iterator<std::string>(std::cout, " "));

    std::cout << "\n";
}

Output:

containing duplicate optional unsorted values

See it live: http://coliru.stacked-crooked.com/view?id=f8cc78dbcce62ad276691b6541629a70-542192d2d8aca3c820c7acc656fa0c68

Archived: 12 years, 3 months ago
Views: 441
Last activity: 12 years, 2 months ago