Lev*_*tts 9 java string algorithm hashmap
给定一个字符串列表:
ArrayList<String> strList = new ArrayList<String>();
strList.add("Mary had a little lamb named Willy");
strList.add("Mary had a little ham");
strList.add("Old McDonald had a farm named Willy");
strList.add("Willy had a little dog named ham");
strList.add("(abc)");
strList.add("(xyz)");
strList.add("Visit Target Store");
strList.add("Visit Walmart Store");
Run Code Online (Sandbox Code Playgroud)
HashMap<String, Integer>
prefixMap
这应该产生 a和形式的输出suffixMap
:
字首:
Mary had a -> 2
Mary had a little -> 2
( -> 2
Visit -> 2
Run Code Online (Sandbox Code Playgroud)
后缀:
named Willy -> 2
ham -> 2
) -> 2
Store -> 2
Run Code Online (Sandbox Code Playgroud)
到目前为止,我可以使用以下代码生成列表中所有项目中都存在的前缀:
public static final int INDEX_NOT_FOUND = -1;
public static String getAllCommonPrefixesInList(final String... strs) {
if (strs == null || strs.length == 0) {
return EMPTY_STRING;
}
final int smallestIndexOfDiff = getIndexOfDifference(strs);
if (smallestIndexOfDiff == INDEX_NOT_FOUND) {
// All Strings are identical
if (strs[0] == null) {
return EMPTY_STRING;
}
return strs[0];
} else if (smallestIndexOfDiff == 0) {
// No common initial characters found, return empty String
return EMPTY_STRING;
} else {
// Common initial character sequence found, return sequence
return strs[0].substring(0, smallestIndexOfDiff);
}
}
public static int getIndexOfDifference(final CharSequence... charSequence) {
if (charSequence == null || charSequence.length <= 1) {
return INDEX_NOT_FOUND;
}
boolean isAnyStringNull = false;
boolean areAllStringsNull = true;
final int arrayLen = charSequence.length;
int shortestStrLen = Integer.MAX_VALUE;
int longestStrLen = 0;
// Find the min and max string lengths - avoids having to check that we are not exceeding the length of the string each time through the bottom loop.
for (int i = 0; i < arrayLen; i++) {
if (charSequence[i] == null) {
isAnyStringNull = true;
shortestStrLen = 0;
} else {
areAllStringsNull = false;
shortestStrLen = Math.min(charSequence[i].length(), shortestStrLen);
longestStrLen = Math.max(charSequence[i].length(), longestStrLen);
}
}
// Deals with lists containing all nulls or all empty strings
if (areAllStringsNull || longestStrLen == 0 && !isAnyStringNull) {
return INDEX_NOT_FOUND;
}
// Handle lists containing some nulls or some empty strings
if (shortestStrLen == 0) {
return 0;
}
// Find the position with the first difference across all strings
int firstDiff = -1;
for (int stringPos = 0; stringPos < shortestStrLen; stringPos++) {
final char comparisonChar = charSequence[0].charAt(stringPos);
for (int arrayPos = 1; arrayPos < arrayLen; arrayPos++) {
if (charSequence[arrayPos].charAt(stringPos) != comparisonChar) {
firstDiff = stringPos;
break;
}
}
if (firstDiff != -1) {
break;
}
}
if (firstDiff == -1 && shortestStrLen != longestStrLen) {
// We compared all of the characters up to the length of the
// shortest string and didn't find a match, but the string lengths
// vary, so return the length of the shortest string.
return shortestStrLen;
}
return firstDiff;
}
Run Code Online (Sandbox Code Playgroud)
但是,我的目标是将至少出现的任何前缀/后缀2+
包含到结果映射中。
如何实现这一点Java
?
根据我对这个问题的理解,最适合解决它的数据结构是非循环脱节图。
一般情况下,一个图将由几个不相连的簇组成。每个簇都有一个树状结构,在边缘情况下它将形成一个链表。
基本上,解决这个问题的最简单的方法是根据每一行创建一堆链表,然后迭代它们。缺点是:节点重复(内存消耗更大)、时间复杂度更高(需要更多操作)并且更容易出错,因为需要更多的手动操作。
因此,我将坚持使用图表作为该问题的数据结构,并尝试使事情尽可能简单。
让我们考虑以下输入:
"Mary had a little lamb named Willy"
"Mary had a little ham"
"A B C"
Run Code Online (Sandbox Code Playgroud)
该图的图形表示如下所示;
前两行将构成一个由链表(头部)和树(尾部)组成的簇。第二个簇将由一个链表表示,它的顶点不与其他字符串形成的顶点连接。
这不是构造顶点的唯一方法,头部可以生成N 树,并且可以在中间的某个位置观察到链表。
主要要点是,为了解决这个问题,我们需要通过所有分支跟踪顶点链,直到顶点重叠。在图的这些部分中,两条或多条线中常见的每个前缀字符串和后缀字符串将由单个顶点(节点)表示。
为了维护映射到特定顶点的字符串数量,每个顶点应该有一个变量(int groupCount
在下面的代码中),该变量在创建顶点时分配一个默认值,并在每次映射新字符串时1
递增。到这个顶点。
每个顶点都包含一个映射,其中保存对其邻居的引用。当添加新的邻居顶点Vertex
时,基于给定字符串创建的新顶点或现有顶点的计数会增加。
为了符合此任务,图应维护对所有头顶点和尾顶点的引用。为了简单起见,在这个解决方案中,图实际上由两个图( suffix -graph和前缀图)。因此,named 中的实现类。MultiGraph
为了用顶点填充后缀图和前缀图,方法以正常或相反的顺序addCluster()
迭代给定行的字符串Iterator
,具体取决于正在填充的图。
填充图表后的下一步是生成频率为2
及以上的字符串映射。
为此,正在使用经典的深度优先搜索算法。
为了实现 DFS,需要一个用作堆栈ArrayDeque
的可变容器(正用于此目的)。从正面/反面地图中取出的第一个元素将被放置在堆栈的顶部,并且保存该元素名称的实例StringBuilder
将被放置在地图中。
然后,要恢复具有特定count 的字符串,将从堆栈顶部弹出顶点,并将其具有 count 的邻居依次放置在堆栈顶部。当前前缀的副本以及附加的分隔符和邻居的名称将被映射到neighbour-vertex。> 1
如果计数发生变化,则表明当前前缀表示至少两行之间的最长公共字符串。在这种情况下,前缀和计数将添加到结果映射中。
以下实现由两个类组成,它们是狭隘的和独立的。该类MultiGraph
专门充当数据结构,维护两个图。羽化代码,就像分割字符串的行一样,被提取到一个单独的类中GraphManager
。
图形
"Mary had a little lamb named Willy"
"Mary had a little ham"
"A B C"
Run Code Online (Sandbox Code Playgroud)
下面的类处理处理输入数据的任务:它分割行,负责丢弃可能出现的空字符串,并将输入打包到 a 中Deque
以方便地适应两个方向的迭代。
它还实例化图表并管理其工作。GraphManager
负责为图形提供分隔符,以便在创建结果映射时恢复字符串的初始形状。这样,您可以在空白处分割给定的行,通过空字符串逐个字符地处理行,或者通过标点符号来分割,而无需更改这两个类中的任何一行。
图形管理器
public class MultiGraph {
private final Map<String, Vertex> heads = new HashMap<>();
private final Map<String, Vertex> tails = new HashMap<>();
public void addCluster(Deque<String> names) {
addCluster(heads, names.iterator());
addCluster(tails, names.descendingIterator());
}
private void addCluster(Map<String, Vertex> clusters, Iterator<String> names) {
String rootName = names.next();
if (clusters.containsKey(rootName)) {
clusters.get(rootName).incrementGroupCount();
} else {
clusters.put(rootName, new Vertex(rootName));
}
Vertex current = clusters.get(rootName);
while (names.hasNext()) {
current = current.addNext(names.next());
}
}
public Map<String, Integer> generatePrefixMap(String delimiter) {
Map<String, Integer> countByPrefix = new HashMap<>();
for (Vertex next: heads.values()) {
if (next.getGroupCount() == 1) {
continue;
}
performDFS(heads, countByPrefix, delimiter, next);
}
return countByPrefix;
}
public Map<String, Integer> generateSuffixMap(String delimiter) {
Map<String, Integer> countBySuffix = new HashMap<>();
for (Vertex next: tails.values()) {
if (next.getGroupCount() == 1) {
continue;
}
performDFS(tails, countBySuffix, delimiter, next);
}
return countBySuffix;
}
// implementation of the Depth First Search algorithm
public void performDFS(Map<String, Vertex> clusters,
Map<String, Integer> countByPrefix,
String delimiter, Vertex next) {
StringBuilder prefix = null;
Vertex current = next;
int count = next.getGroupCount();
Deque<Vertex> stack = new ArrayDeque<>(); // create as stack
Map<Vertex, StringBuilder> prefixByVert = new HashMap<>();
stack.push(next); // place the first element on the stack
prefixByVert.put(current, new StringBuilder(current.getName()));
while (!stack.isEmpty()) {
current = stack.pop();
if (current.getGroupCount() < count) { // the number of strings mapped to the current Vertex has been changed
countByPrefix.put(prefix.toString(), count); // saving the result
count = current.getGroupCount();
}
prefix = prefixByVert.get(current);
for (Vertex neighbour: current.getNextVertByVal().values()) {
if (next.getGroupCount() == 1) {
continue;
}
stack.push(neighbour);
prefixByVert.put(neighbour, new StringBuilder(prefix)
.append(delimiter)
.append(neighbour.getName()));
}
}
if (prefix != null && count > 1) {
countByPrefix.putIfAbsent(prefix.toString(), count);
}
}
private static class Vertex {
private final String name;
private int groupCount = 1;
private final Map<String, Vertex> nextVertByVal = new HashMap<>();
public Vertex(String name) {
this.name = name;
}
public Vertex addNext(String value) {
if (nextVertByVal.containsKey(value)) {
nextVertByVal.get(value).incrementGroupCount();
} else {
nextVertByVal.put(value, new Vertex(value));
}
return nextVertByVal.get(value);
}
public void incrementGroupCount() {
this.groupCount++;
}
public String getName() {
return name;
}
public int getGroupCount() {
return groupCount;
}
public Map<String, Vertex> getNextVertByVal() {
return nextVertByVal;
}
}
}
Run Code Online (Sandbox Code Playgroud)
主要的()
public static void main(String[] args) {
List<String> lines = List.of(
"Mary had a little lamb named Willy", "Mary had a little ham",
"Old McDonald had a farm named Willy", "Willy had a little dog named ham",
"( abc )", "( xyz )", "Visit Target Store", "Visit Walmart Store");
GraphManager gm = GraphManager.getInstance(lines, " ");
System.out.println("Prefixes:");
for (Map.Entry<String, Integer> entry: gm.getPrefixMap().entrySet()) {
System.out.println(entry.getValue() + " " + entry.getKey());
}
System.out.println("\nSuffixes:");
for (Map.Entry<String, Integer> entry: gm.getSuffixMap().entrySet()) {
System.out.println(entry.getValue() + " " + entry.getKey());
}
}
Run Code Online (Sandbox Code Playgroud)
输出
Prefixes:
2 Mary had a little
2 Visit
2 (
Suffixes:
2 ham
2 )
2 Store
2 Willy named
Run Code Online (Sandbox Code Playgroud)