Dar*_*rov 33
For small files:
string[] lines = File.ReadAllLines("filename.txt");
File.WriteAllLines("filename.txt", lines.Distinct().ToArray());
Jon*_*eet 20
This should do it (and it will cope with large files).
Note that it only removes consecutive duplicate lines, i.e.
a
b
b
c
b
d
will end up as
a
b
c
b
d
If you don't want duplicates anywhere, you'll need to keep a set of the lines you've already seen.
using System;
using System.IO;

class DeDuper
{
    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: DeDuper <input file> <output file>");
            return;
        }
        using (TextReader reader = File.OpenText(args[0]))
        using (TextWriter writer = File.CreateText(args[1]))
        {
            string currentLine;
            string lastLine = null;
            while ((currentLine = reader.ReadLine()) != null)
            {
                if (currentLine != lastLine)
                {
                    writer.WriteLine(currentLine);
                    lastLine = currentLine;
                }
            }
        }
    }
}
Note that this assumes Encoding.UTF8, and that you want to use files. It's easy to generalize as a method, though:
static void CopyLinesRemovingConsecutiveDupes
    (TextReader reader, TextWriter writer)
{
    string currentLine;
    string lastLine = null;
    while ((currentLine = reader.ReadLine()) != null)
    {
        if (currentLine != lastLine)
        {
            writer.WriteLine(currentLine);
            lastLine = currentLine;
        }
    }
}
(Note that this doesn't close anything; the caller should do that.)
Here's a version which will remove all duplicates, rather than just consecutive ones:
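As a rough sketch of what "the caller should do that" might look like, the wrapper below opens both files with an explicit encoding and lets the using statements close them. DeDuperUsage and DeDupeFile are hypothetical names introduced only for this illustration, and the method body is repeated so the snippet compiles on its own.

```csharp
using System;
using System.IO;
using System.Text;

static class DeDuperUsage
{
    // Repeated from the answer above so this sketch is self-contained.
    static void CopyLinesRemovingConsecutiveDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        string lastLine = null;
        while ((currentLine = reader.ReadLine()) != null)
        {
            if (currentLine != lastLine)
            {
                writer.WriteLine(currentLine);
                lastLine = currentLine;
            }
        }
    }

    // Hypothetical helper (not part of the original answer): the caller
    // picks the encoding and owns the reader and writer, so it is also
    // the caller that disposes them.
    public static void DeDupeFile(string inputPath, string outputPath, Encoding encoding)
    {
        using (TextReader reader = new StreamReader(inputPath, encoding))
        using (TextWriter writer = new StreamWriter(outputPath, false, encoding))
        {
            CopyLinesRemovingConsecutiveDupes(reader, writer);
        }
    }
}
```

Calling DeDuperUsage.DeDupeFile("in.txt", "out.txt", Encoding.Unicode) would then handle a UTF-16 file; the input and output paths should differ, since the input is still being read while the output is written.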
// Requires: using System.Collections.Generic; (for HashSet<string>)
static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
{
    string currentLine;
    HashSet<string> previousLines = new HashSet<string>();
    while ((currentLine = reader.ReadLine()) != null)
    {
        // Add returns true if it was actually added,
        // false if it was already there
        if (previousLines.Add(currentLine))
        {
            writer.WriteLine(currentLine);
        }
    }
}
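Because the method only needs a TextReader and a TextWriter, it can be tried in memory without touching the file system. The sketch below is one way to do that, assuming a hypothetical wrapper class AllDupesDemo with a helper DeDupe (neither is in the original answer); the method body is repeated so the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class AllDupesDemo
{
    // Repeated from the answer above so this sketch is self-contained.
    static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        HashSet<string> previousLines = new HashSet<string>();
        while ((currentLine = reader.ReadLine()) != null)
        {
            // Add returns true only the first time a given line is seen.
            if (previousLines.Add(currentLine))
            {
                writer.WriteLine(currentLine);
            }
        }
    }

    // Hypothetical helper: runs the method over an in-memory string
    // and returns the de-duplicated text.
    public static string DeDupe(string text)
    {
        using (var reader = new StringReader(text))
        using (var writer = new StringWriter())
        {
            CopyLinesRemovingAllDupes(reader, writer);
            return writer.ToString();
        }
    }
}
```

With the sample input from earlier (a, b, b, c, b, d), AllDupesDemo.DeDupe returns the lines a, b, c, d: the second and third b are both dropped. The trade-off against the consecutive-only version is memory, since the set holds every distinct line seen so far.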