Remove duplicate lines from a text file?

Goo*_*ber 8 c# duplicates

Given an input file of text lines, I want to identify and remove the duplicate lines. Please show a simple C# snippet that accomplishes this.

Dar*_*rov 33

For small files:

// Requires: using System.IO; using System.Linq;
string[] lines = File.ReadAllLines("filename.txt");
File.WriteAllLines("filename.txt", lines.Distinct().ToArray());
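
For files too large to hold comfortably as an array, the same idea can be sketched with lazy enumeration. This is a minimal sketch rather than part of the original answer: the class and method names are hypothetical, and it assumes .NET 4 or later, where `File.ReadLines` and the `IEnumerable<string>` overload of `File.WriteAllLines` are available. Because the input is read lazily, the output has to go to a different file. `Distinct()` still remembers every distinct line it has seen, so memory use grows with the number of unique lines.

```csharp
using System;
using System.IO;
using System.Linq;

static class DistinctLines
{
    // Hypothetical helper: copies inputPath to outputPath, dropping repeated lines.
    public static void WriteDistinctLines(string inputPath, string outputPath)
    {
        // File.ReadLines streams the file line by line instead of loading it all,
        // so inputPath and outputPath must not be the same file.
        var distinct = File.ReadLines(inputPath).Distinct();

        // This overload consumes the sequence as it writes; no ToArray() needed.
        File.WriteAllLines(outputPath, distinct);
    }
}
```

In current LINQ to Objects implementations `Distinct()` yields lines in first-occurrence order, but the documentation describes the result as unordered, so treat the ordering as an assumption.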


Jon*_*eet 20

This should do it (and it will cope with large files).

Note that it only removes consecutive duplicate lines, i.e.

a
b
b
c
b
d

will end up as

a
b
c
b
d

If you don't want duplicates anywhere, you'll need to keep a set of the lines you've already seen.

using System;
using System.IO;

class DeDuper
{
    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: DeDuper <input file> <output file>");
            return;
        }
        using (TextReader reader = File.OpenText(args[0]))
        using (TextWriter writer = File.CreateText(args[1]))
        {
            string currentLine;
            string lastLine = null;

            while ((currentLine = reader.ReadLine()) != null)
            {
                if (currentLine != lastLine)
                {
                    writer.WriteLine(currentLine);
                    lastLine = currentLine;
                }
            }
        }
    }
}

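Note that this assumes UTF-8 (File.OpenText and File.CreateText both use Encoding.UTF8), and that you want to work with files. It's easy to generalize as a method, though: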

static void CopyLinesRemovingConsecutiveDupes
    (TextReader reader, TextWriter writer)
{
    string currentLine;
    string lastLine = null;

    while ((currentLine = reader.ReadLine()) != null)
    {
        if (currentLine != lastLine)
        {
            writer.WriteLine(currentLine);
            lastLine = currentLine;
        }
    }
}

(Note that this doesn't close anything; the caller should do that.)
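
As an illustration of how a caller might own the encoding and the cleanup, here is a minimal sketch. The `DedupeFile` wrapper and the class name are hypothetical; the copy method repeats the logic above only so that the sketch compiles on its own.

```csharp
using System;
using System.IO;
using System.Text;

static class ConsecutiveDedupeExample
{
    // Same logic as the method above, repeated to keep the sketch self-contained.
    public static void CopyLinesRemovingConsecutiveDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        string lastLine = null;

        while ((currentLine = reader.ReadLine()) != null)
        {
            if (currentLine != lastLine)
            {
                writer.WriteLine(currentLine);
                lastLine = currentLine;
            }
        }
    }

    // Hypothetical wrapper: the caller picks the encoding, and the using
    // blocks close both streams once the copy has finished.
    public static void DedupeFile(string inputPath, string outputPath, Encoding encoding)
    {
        using (var reader = new StreamReader(inputPath, encoding))
        using (var writer = new StreamWriter(outputPath, false, encoding))
        {
            CopyLinesRemovingConsecutiveDupes(reader, writer);
        }
    }
}
```

Because the method only depends on TextReader and TextWriter, it also works with in-memory text: passing a StringReader and a StringWriter is a convenient way to try it without touching the disk.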

Here's a version which will remove all duplicates, rather than just consecutive ones:

// Requires: using System.Collections.Generic; (for HashSet<string>)
static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
{
    string currentLine;
    HashSet<string> previousLines = new HashSet<string>();

    while ((currentLine = reader.ReadLine()) != null)
    {
        // Add returns true if it was actually added,
        // false if it was already there
        if (previousLines.Add(currentLine))
        {
            writer.WriteLine(currentLine);
        }
    }
}
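
To show the difference from the consecutive-only version, here is a minimal sketch that runs the sample input through the set-based approach. The comparer parameter, the `Dedupe` helper, and the class name are hypothetical additions for illustration; the original method uses the default (ordinal, case-sensitive) string comparison.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class AllDupesExample
{
    // Variant of the method above with a caller-supplied comparer (an assumption,
    // not part of the original answer). Passing null uses the default comparer.
    public static void CopyLinesRemovingAllDupes(
        TextReader reader, TextWriter writer, IEqualityComparer<string> comparer)
    {
        string currentLine;
        var previousLines = new HashSet<string>(comparer);

        while ((currentLine = reader.ReadLine()) != null)
        {
            // Add returns false when an equal line has already been seen.
            if (previousLines.Add(currentLine))
            {
                writer.WriteLine(currentLine);
            }
        }
    }

    // Hypothetical helper: de-duplicates in-memory text, returning comma-joined lines.
    public static string Dedupe(string text, IEqualityComparer<string> comparer)
    {
        using (var reader = new StringReader(text))
        using (var writer = new StringWriter())
        {
            CopyLinesRemovingAllDupes(reader, writer, comparer);
            string[] lines = writer.ToString().Split(
                new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
            return string.Join(",", lines);
        }
    }
}
```

With the input a, b, b, c, b, d, B (one per line), `Dedupe(text, StringComparer.Ordinal)` gives `a,b,c,d,B`, while `StringComparer.OrdinalIgnoreCase` also drops the trailing `B` and gives `a,b,c,d`. The first occurrence of each line is kept and the original order is preserved, since lines are written as they are read.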