Ken*_*ama 7 c# wpf dictionary tolower
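I am reading documents and splitting them into words so that each word ends up in a dictionary, but how can I exclude some words (such as "a" or "an")?
This is my function: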
private void Splitter(string[] file)
{
    try
    {
        tempDict = file
            .SelectMany(i => File.ReadAllLines(i)
                .SelectMany(line => line.Split(new[] { ' ', ',', '.', '?', '!', }, StringSplitOptions.RemoveEmptyEntries))
                .AsParallel()
                .Distinct())
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }
    catch (Exception ex)
    {
        Ex(ex);
    }
}
Also, in this case, where is the right place to add a .ToLower() call so that all words from the files are lowercase? I was thinking of something like this, placed before the (temp = file..) statement:
file.ToList().ConvertAll(d => d.ToLower());
Do you want to filter out stop words?
HashSet<String> StopWords = new HashSet<String> {
    "a", "an", "the"
};
...
tempDict = file
    .SelectMany(i => File.ReadAllLines(i)
        .SelectMany(line => line.Split(new[] { ' ', ',', '.', '?', '!', }, StringSplitOptions.RemoveEmptyEntries))
        .AsParallel()
        .Select(word => word.ToLower())            // <- convert to lower case
        .Where(word => !StopWords.Contains(word))  // <- drop stop words
        .Distinct())                               // closes the outer SelectMany
    .GroupBy(word => word)
    .ToDictionary(g => g.Key, g => g.Count());
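However, this code is only a partial solution: proper names such as Berlin will be converted to lower case (berlin), acronyms such as KISS (Keep It Simple, Stupid) will become just kiss, and some numbers will come out wrong (for example, 3.14 is split at the period into 3 and 14).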
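One possible way to soften the casing problem is to compare words case-insensitively instead of lowercasing them, so that the dictionary keeps the first spelling it sees. The sketch below is only an illustration under some assumptions: the class name WordCounter, the method CountDocumentsPerWord and the in-memory input (one sequence of lines per document, in place of File.ReadAllLines) are hypothetical and not part of the original code, and AsParallel is left out for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Assumed stop-word list; the comparer makes Contains() ignore case,
    // so "The" and "the" are both filtered out.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a", "an", "the" };

    private static readonly char[] Separators = { ' ', ',', '.', '?', '!' };

    // Each element of "documents" holds the lines of one document.
    // Returns, for every non-stop word, the number of documents that contain it.
    public static Dictionary<string, int> CountDocumentsPerWord(
        IEnumerable<IEnumerable<string>> documents)
    {
        return documents
            .SelectMany(lines => lines
                .SelectMany(line => line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
                .Where(word => !StopWords.Contains(word))
                // one entry per word per document, ignoring case
                .Distinct(StringComparer.OrdinalIgnoreCase))
            // "Berlin" and "berlin" fall into the same group
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }
}
```

With the two documents { "The city of Berlin.", "KISS is an acronym!" } and { "berlin, a city?" }, looking up "Berlin" (or "berlin") gives 2, "KISS" gives 1, and "the" is absent. This still treats the acronym KISS and the word kiss as the same entry; it only avoids rewriting the text, and it does nothing for numbers split at the period.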