Suppose I have a string:
String test = "This is a test string and I have some stopwords in here";
I want to see how many of the words in the array below occur in my string.
Pseudocode:
array = a,and,the,them,they,I
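So the answer would be "3".
Just curious: what is the most efficient way to do this in Java?
I would probably store the words from the input in a HashSet, then iterate over the array and check whether the set .contains each word in the array.
Here is the code... The input is "Around the World in 80 Days".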
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;
public class Main
{
    public static void main(final String[] argv)
        throws FileNotFoundException
    {
        final File file;
        final String[] wordsToFind;

        file = new File(argv[0]);
        wordsToFind = getWordsToFind(file);
        a(file, wordsToFind);
        b(file, wordsToFind);
        c(file, wordsToFind);
        d(file, wordsToFind);
    }
    // collects the unique words in the file; as a side effect this also
    // reads the file into the disk cache before the timed runs
    private static String[] getWordsToFind(final File file)
        throws FileNotFoundException
    {
        final Scanner scanner;
        final Set<String> words;

        scanner = new Scanner(file);
        words = new HashSet<String>();

        while(scanner.hasNext())
        {
            final String word;

            word = scanner.next();
            words.add(word);
        }

        return (words.toArray(new String[words.size()]));
    }
    // bad way: read into a list and then iterate over the list until you find a match
    private static void a(final File file,
                          final String[] wordsToFind)
        throws FileNotFoundException
    {
        final long start;
        final long end;
        final long total;
        final Scanner scanner;
        final List<String> words;
        int matches;

        scanner = new Scanner(file);
        words = new ArrayList<String>();

        while(scanner.hasNext())
        {
            final String word;

            word = scanner.next();
            words.add(word);
        }

        start = System.nanoTime();

        {
            matches = 0;

            for(final String wordToFind : wordsToFind)
            {
                for(final String word : words)
                {
                    if(word.equals(wordToFind))
                    {
                        matches++;
                        break;
                    }
                }
            }

            System.out.println(matches);
        }

        end = System.nanoTime();
        total = end - start;
        System.out.println("a: " + total);
    }
    // slightly better way: read into a set (duplicates collapse, so there are probably
    // fewer things to read) and then iterate over the set until you find a match
    private static void b(final File file,
                          final String[] wordsToFind)
        throws FileNotFoundException
    {
        final long start;
        final long end;
        final long total;
        final Scanner scanner;
        final Set<String> words;
        int matches;

        scanner = new Scanner(file);
        words = new HashSet<String>();

        while(scanner.hasNext())
        {
            final String word;

            word = scanner.next();
            words.add(word);
        }

        start = System.nanoTime();

        {
            matches = 0;

            for(final String wordToFind : wordsToFind)
            {
                for(final String word : words)
                {
                    if(word.equals(wordToFind))
                    {
                        matches++;
                        break;
                    }
                }
            }

            System.out.println(matches);
        }

        end = System.nanoTime();
        total = end - start;
        System.out.println("b: " + total);
    }
    // my way
    private static void c(final File file,
                          final String[] wordsToFind)
        throws FileNotFoundException
    {
        final long start;
        final long end;
        final long total;
        final Scanner scanner;
        final Set<String> words;
        int matches;

        scanner = new Scanner(file);
        words = new HashSet<String>();

        while(scanner.hasNext())
        {
            final String word;

            word = scanner.next();
            words.add(word);
        }

        start = System.nanoTime();

        {
            matches = 0;

            for(final String wordToFind : wordsToFind)
            {
                if(words.contains(wordToFind))
                {
                    matches++;
                }
            }

            System.out.println(matches);
        }

        end = System.nanoTime();
        total = end - start;
        System.out.println("c: " + total);
    }
    // Nikita Rybak way
    private static void d(final File file,
                          final String[] wordsToFind)
        throws FileNotFoundException
    {
        final long start;
        final long end;
        final long total;
        final Scanner scanner;
        final Set<String> words;
        int matches;

        scanner = new Scanner(file);
        words = new HashSet<String>();

        while(scanner.hasNext())
        {
            final String word;

            word = scanner.next();
            words.add(word);
        }

        start = System.nanoTime();

        {
            words.retainAll(new HashSet<String>(Arrays.asList(wordsToFind)));
            matches = words.size();
            System.out.println(matches);
        }

        end = System.nanoTime();
        total = end - start;
        System.out.println("d: " + total);
    }
}
Results, in nanoseconds from System.nanoTime() (after a few runs; each run gives pretty much the same numbers):
12596
a: 2440699000
12596
b: 2531635000
12596
c: 4507000
12596
d: 5597000
If you modify it by adding "XXX" to each word in getWordsToFind (so that no words are found), you get:
0
a: 7415291000
0
b: 4688973000
0
c: 2849000
0
d: 7981000
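For anyone who wants to reproduce the no-match run, the modification could look roughly like the sketch below. It is an assumption, not the code that produced the numbers above: the class name NoMatchWords is made up, the method reads from a String instead of a File so that it runs on its own, and "XXX" is appended as a suffix.

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class NoMatchWords
{
    // Hypothetical variant of getWordsToFind: every unique word gets an
    // "XXX" suffix, so none of the returned words occurs in the input.
    static String[] getWordsToFind(final String text)
    {
        final Scanner scanner = new Scanner(text);
        final Set<String> words = new HashSet<String>();

        while(scanner.hasNext())
        {
            words.add(scanner.next() + "XXX");
        }

        scanner.close();

        return words.toArray(new String[words.size()]);
    }

    public static void main(final String[] argv)
    {
        for(final String word : getWordsToFind("to be or not to be"))
        {
            System.out.println(word);
        }
    }
}
```

With this change every lookup is a miss, which is the worst case for the nested loops in a and b because they never reach the break.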
And, for completeness, I tried searching for just the word "I", and the results are:
1
a: 235000
1
b: 351000
1
c: 75000
1
d: 10725000
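Applied to the string from the question instead of a file, the HashSet.contains approach (method c above) might look like the following sketch. The class name StopwordCount and the method countMatches are made up for illustration, and splitting on whitespace with case-sensitive matching is an assumption about what counts as a "word".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StopwordCount
{
    // Counts how many entries of wordsToFind occur as whole words in text.
    // Assumptions: words are separated by whitespace, matching is case-sensitive.
    static int countMatches(final String text, final String[] wordsToFind)
    {
        final Set<String> words;
        int matches;

        words = new HashSet<String>(Arrays.asList(text.split("\\s+")));
        matches = 0;

        for(final String wordToFind : wordsToFind)
        {
            // average O(1) lookup instead of scanning every word
            if(words.contains(wordToFind))
            {
                matches++;
            }
        }

        return matches;
    }

    public static void main(final String[] argv)
    {
        final String test = "This is a test string and I have some stopwords in here";
        final String[] stopwords = { "a", "and", "the", "them", "they", "I" };

        System.out.println(countMatches(test, stopwords)); // prints 3
    }
}
```

Building the set costs one pass over the input; after that each lookup is constant time on average, which fits the gap between a/b and c in the timings above.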
Something like this? Not sure about 'most efficient', but it's simple.
Set<String> s1 = new HashSet<String>(Arrays.asList("This is a test string and I have some stopwords in here".split("\\s")));
Set<String> s2 = new HashSet<String>(Arrays.asList("a", "and", "the", "them", "they", "I"));
s1.retainAll(s2);
System.out.println(s1.size());
It's just the intersection of the two sets of words.
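One thing to keep in mind is that retainAll modifies s1 in place, so the original word set is gone afterwards. If both sets are still needed, a non-mutating count is possible; the sketch below assumes Java 8 or later for streams, and the names IntersectionCount and countCommon are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IntersectionCount
{
    // Returns the size of the intersection without modifying either set.
    static long countCommon(final Set<String> words, final Set<String> stopwords)
    {
        return stopwords.stream().filter(words::contains).count();
    }

    public static void main(final String[] argv)
    {
        final Set<String> s1 = new HashSet<String>(Arrays.asList(
            "This is a test string and I have some stopwords in here".split("\\s")));
        final Set<String> s2 = new HashSet<String>(Arrays.asList(
            "a", "and", "the", "them", "they", "I"));

        System.out.println(countCommon(s1, s2)); // prints 3
        System.out.println(s1.size());           // prints 12, s1 is unchanged
    }
}
```

Streaming over the smaller set and probing the larger one keeps the number of lookups low.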