删除两个向量中的重复项

Iro*_*ing 2 c++ algorithm vector duplicates std-pair

我将尝试使用以下示例解释我的问题:

vector<pair<string, string>> a = { { "A","1" }, {"B","2" },{ "C","3" },{ "D","3" },{ "E","5" } };

vector<pair<string, string>> b = { { "A","1" },{ "B","3" },{ "D","3" },{ "E","4" },{ "Z","5" } };
Run Code Online (Sandbox Code Playgroud)

什么是最有效的方法来擦除重复并将输出到相同的向量?对的数量非常大,比如大约10万.

两个向量都按第一个元素排序.

vector<pair<string, string>> a = { { "B","2" },{ "C","3" },{ "E","5" } };

vector<pair<string, string>> b = { { "B","3" },{ "E","4" },{ "Z","5" } };
Run Code Online (Sandbox Code Playgroud)

问题是,我需要在删除重复项后比较这些向量.该对中的第一个元素是文件路径,第二个元素是它的校验和.因此,例如,如果我"B","2"在第一个容器中,并且"B","3"是第二个,我可以将此文件列为"已修改".std::set如果这会使这个问题更容易,我愿意使用.

gch*_*hen 7

使用运行索引将为您提供O(len(a)+ len(b))时间复杂度和O(1)额外空间(给定ab已经排序)

void removeDuplicate(vector<pair<string, string>>& a, vector<pair<string, string>>& b) {

    //Add these two lines if there can be duplicates in a or b themselves.
    //a.erase(std::unique(a.begin(), a.end()), a.end());
    //b.erase(std::unique(b.begin(), b.end()), b.end());

    size_t i = 0;
    size_t j = 0;

    size_t p1 = 0;
    size_t p2 = 0;

    while(i < a.size() && j < b.size()) {
        if(a[i] == b[j]) {
            i++;
            j++;
        } else if (a[i] > b[j]) {
            b[p2++] = b[j++]; 
        } else if (b[j] > a[i]) {
            a[p1++] = a[i++];
        }
    }

    while(i < a.size()) {
        a[p1++] = a[i++]; 
    }

    while(j < b.size()) {
        b[p2++] = b[j++];
    }

    a.erase(a.begin()+p1, a.end());
    b.erase(b.begin()+p2, b.end());
}
Run Code Online (Sandbox Code Playgroud)