面试问题:查询 - 哪些句子包含短语的所有单词

zca*_*fg2 8 java string algorithm

我已经解决了问题,但无法提出通过所有测试用例的最有效问题.它在5个测试用例中超时.

确定句子包含短语
0的所有单词:克里斯和詹妮弗今天早上打架
1:克里斯去度假
2:詹妮弗在监狱里

查询短语是
0:克里斯詹妮弗
1:詹妮弗
2:监狱

目标是为每个查询找到匹配句子的索引,或者如果不存在匹配句子则为-1.单词顺序无关紧要.

输出:
0
0 2
2

即,第一个查询在句子0中具有匹配的单词,在句子0和1中具有第二个查询,依此类推.

约束

  • n:句子数量
  • m:赞美的数量
  • n,m <10 ^ 4
  • 任何句子或查询短语中的单词数量在[1-10]范围内
  • 每个单词最多有11个字符
  • 超过10个句子中没有单词出现
  • 每个单词仅由大写和小写字母组成
  • 每个单词必须完全匹配 - 即喜欢和喜欢不匹配.

输入格式:

3
克里斯和詹妮弗今天早上打架
克里斯去度假
jennifer在监狱
3
克里斯詹妮弗
詹妮弗
监狱

每个3代表句子或查询的数量.


以下是我试过的......

我的第一个解决方案:

  1. 根据每个句子制作HashMap
  2. 对于短语中的每个分词:
    2-1.检查语句hashmap
    2-2中是否存在所有单词.如果存储索引
    2-3.如果所有句子都没有匹配的句子,则存储-1.
  3. 打印结果

令p =句子中最大字数
让k =查询中最大字数
Big O是O(npk)

public static void textQueries(List<String> sentences, List<String> queries) {
    List<Map<String, Integer>> sentenceMaps = createMaps(sentences);
    String results = queryMatcher(sentenceMaps, queries);
    System.out.println(results);
}


private static String queryMatcher(List<Map<String, Integer>> sentenceMaps, List<String> queries) {
    Map<String, Integer> wordCounter = new LinkedHashMap<>();
    List<List<String>> results = new ArrayList<List<String>>();
    for (String query : queries) {
        List<String> result = new ArrayList<>();
        for (int j = 0; j < sentenceMaps.size(); j++) {
            if (isQueryFound(sentenceMaps.get(j), query, wordCounter)) {
                result.add(j + "");
            }
        }
        results.add(result);
    }
    return generateResultString(results);
}


/*
 * StringBuilder used to reduce delays of calling multiple System.out.println();
 */
private static String generateResultString(List<List<String>> results) {
    StringBuilder stringBuilder = new StringBuilder();
    for (List<String> matchingSentenceIndexes : results) {
        if (matchingSentenceIndexes.isEmpty()) {
            stringBuilder.append("-1\n");
        } else {
            resultStringHelper(matchingSentenceIndexes, stringBuilder);
        }
        //stringBuilder.append("\n");
    }
    return stringBuilder.toString();
}

/*
 * add " " for multiple indexes result
 */
private static void resultStringHelper(List<String> result, StringBuilder stringBuilder) {
    for (int i = 0; i < result.size(); i++) {
        stringBuilder.append(result.get(i));
        if (i < result.size() - 1) {
            stringBuilder.append(" ");
        } else if (i == result.size() - 1) {
            stringBuilder.append("\n");
        }
    }
}
private static boolean isQueryFound(Map<String, Integer> sentenceMap, String query, Map<String, Integer> wordCounter) {
    String[] queryTokens = query.split(" ");
    for (String queryToken : queryTokens) {
        if (isMoreThan10Sentences(wordCounter, queryToken)) return false;
        if (sentenceMap.containsKey(queryToken)) {
            wordCounter.put(queryToken, wordCounter.getOrDefault(queryToken, 0) + 1);
        } else {
            return false;
        }
    }
    return true;
}

private static boolean isMoreThan10Sentences(Map<String, Integer> wordCounter, String token) {
    return wordCounter.getOrDefault(token, -1) > 10;
}

private static Map<String, Integer> initMap(String[] tokens) {
    Map<String, Integer> map = new LinkedHashMap<>();
    for (String token : tokens) {
        map.put(token, 0);
    }
    return map;
}

private static List<Map<String, Integer>> createMaps(List<String> sentences) {
    List<Map<String, Integer>> maps = new ArrayList<Map<String,Integer>>();
    for (int i = 0; i < sentences.size(); i++) {
        String[] tokens = sentences.get(i).split(" ");
        maps.add(initMap(tokens));
    }
    return maps;
}
Run Code Online (Sandbox Code Playgroud)

最近5个测试用例超时.

对于小型测试用例,其在线编码服务器的基准测试如下:
地图创建时间:9.23954E-4
查询匹配时间:3.85751E-4

地图生成很昂贵.


我的第二次尝试:

类似的逻辑但应用了并发性,因为该平台最多支持2个线程.

多线程在这里完成:
1.句子 - >映射生成(并行映射生成)
2.查询匹配(并发匹配)

public static void textQueries(List<String> sentences, List<String> queries) {
    List<Map<String, Integer>> sentenceMaps = createMaps(sentences);
    startTime = System.nanoTime();
    String results = queryMatcher(sentenceMaps, queries);
    System.out.println(results);

private static String queryMatcher(List<Map<String, Integer>> sentenceMaps, List<String> queries) {
    List<Future<String>> futures = new ArrayList<Future<String>>();
    int threads = Runtime.getRuntime().availableProcessors();
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    String[] results = new String[threads];
    int length = queries.size() / threads;
    for (int i = 0; i < threads; i++) {
        int queryStart = length * i;
        int queryEnd = length * (i+1);
        if (i == threads -1 && queries.size() % threads != 0) queryEnd++;
        Callable<String> worker = new QueryMatcher(sentenceMaps, queries, queryStart, queryEnd);
        Future<String> submit = executor.submit(worker);
        futures.add(submit);
    }

    for (int i = 0; i < futures.size(); i++) {
        try {
            results[i] = futures.get(i).get();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }
    String returnString = concaString(results);
    executor.shutdown();
    return returnString;
}

private static String concaString(String[] results) {
    StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < results.length; i++) {
        stringBuilder.append(results[i]);
    }
    return stringBuilder.toString();
}

private static String generateResultString(List<List<String>> results) {
    StringBuilder stringBuilder = new StringBuilder();
    for (List<String> matchingSentenceIndexes : results) {
        if (matchingSentenceIndexes.isEmpty()) {
            stringBuilder.append("-1\n");
        } else {
            resultStringHelper(matchingSentenceIndexes, stringBuilder);
        }
        //stringBuilder.append("\n");
    }
    return stringBuilder.toString();
}

private static void resultStringHelper(List<String> result, StringBuilder stringBuilder) {
    for (int i = 0; i < result.size(); i++) {
        stringBuilder.append(result.get(i));
        if (i < result.size() - 1) {
            stringBuilder.append(" ");
        } else if (i == result.size() - 1) {
            stringBuilder.append("\n");
        }
    }
}
private static boolean isQueryFound(Map<String, Integer> sentenceMap, String query, Map<String, Integer> wordCounter) {
    String[] queryTokens = query.split(" ");
    for (String queryToken : queryTokens) {
        if (isMoreThan10Sentences(wordCounter, queryToken)) return false;
        if (sentenceMap.containsKey(queryToken)) {
            wordCounter.put(queryToken, wordCounter.getOrDefault(queryToken, 0) + 1);
        } else {
            return false;
        }
    }
    return true;
}

private static boolean isMoreThan10Sentences(Map<String, Integer> wordCounter, String token) {
    return wordCounter.getOrDefault(token, -1) > 10;
}

private static boolean isQueryFound(Map<String, Integer> sentenceMap, String query) {
    String[] queryTokens = query.split(" ");
    //Map<String, Integer> duplicateChecker = new LinkedHashMap<String, Integer>();

    for (String queryToken : queryTokens) {
        if (sentenceMap.containsKey(queryToken)) {
            //if (!duplicateChecker(duplicateChecker, sentenceMap, queryToken))
            //return false;
        } else {
            return false;
        }
    }
    return true;
}

/*
 * this method checks for the case when there are duplicate words in query
 * i.e. sentence containing 2 hello will return false of queries with 3 hello
 */
private static boolean duplicateChecker(Map<String, Integer> duplicateChecker, Map<String, Integer> sentenceMap, String queryToken) {
    if (duplicateChecker.containsKey(queryToken)) {
        if (duplicateChecker.get(queryToken) == 0) return false;
        duplicateChecker.put(queryToken, duplicateChecker.get(queryToken) - 1);
    } else {
        duplicateChecker.put(queryToken, sentenceMap.get(queryToken) - 1);
    }
    return true;
}

private static List<Map<String, Integer>> createMaps(List<String> sentences) {
    List<Map<String, Integer>> maps = new ArrayList<>();
    int threads = Runtime.getRuntime().availableProcessors();
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    List<Future<List<Map<String, Integer>>>> futures = new ArrayList<Future<List<Map<String, Integer>>>>();
    int length = (sentences.size()) / threads;

    for (int i = 0; i < threads; i++) {
        int start = i * length;
        int end = (i+1) * length;
        if (i == threads - 1 && sentences.size() % threads != 0) end++;
        List<String> splitSentence = new ArrayList(sentences.subList(start, end));

        Callable<List<Map<String, Integer>>> worker = new MapMaker(splitSentence);
        Future<List<Map<String, Integer>>> submit = executor.submit(worker);
        futures.add(submit);
    }

    for (int i = 0; i < futures.size(); i++) {
        try {
            for (Map<String, Integer> map : futures.get(i).get()) {
                maps.add(map);
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }
    executor.shutdown();
    return maps;
}

private synchronized static Map<String, Integer> initMap(String[] tokens) {
    Map<String, Integer> map = new LinkedHashMap<>();
    for (String token : tokens) {
        map.put(token, 0);
        //            map.put(token, map.getOrDefault(map.get(token), 1) + 1);
    }
    return map;
}


public static class MapMaker implements Callable<List<Map<String, Integer>>> {
    private List<String> sentences;

    @Override
    public List<Map<String, Integer>> call() throws Exception {
        List<Map<String, Integer>> maps = new ArrayList<Map<String,Integer>>();
        for (int i = 0; i < sentences.size(); i++) {
            String[] tokens = sentences.get(i).split(" ");
            maps.add(initMap(tokens));
        }
        return maps;
    }

    public MapMaker(List<String> sentences) {
        this.sentences = sentences;
    }
}

public static class QueryMatcher implements Callable<String> {
    private List<Map<String, Integer>> sentenceMaps;
    private List<String> queries;
    private int queryStart;
    private int queryEnd;

    @Override
    public String call() throws Exception {
        List<List<String>> results = new ArrayList<List<String>>();
        for (int i = queryStart; i < queryEnd; i++) {
            List<String> result = new ArrayList<>();
            String query = queries.get(i);
            for (int j = 0; j < sentenceMaps.size(); j++) {
                if (isQueryFound(sentenceMaps.get(j), query)) {
                    result.add(j + "");
                }
            }
            results.add(result);
        }
        return generateResultString(results);
    }

    public QueryMatcher(List<Map<String, Integer>> sentenceMaps, List<String> queries, int queryStart, int queryEnd) {
        this.sentenceMaps = sentenceMaps;
        this.queries = queries;
        this.queryStart = queryStart;
        this.queryEnd = queryEnd;
    }
}
Run Code Online (Sandbox Code Playgroud)

虽然我希望对于大型测试用例有一些加速,但它仍然给出了5个测试用例超时.

对于小型测试用例,由于创建池的额外开销,它增加了映射生成时间.

基准时间:
地图时间:0.007669489
查询匹配时间:3.22923E-4


3.我的第三个解决方案 - 用C++编写上述代码

我质疑是否可能是Java提供超时.
该平台实际上为C++提供了更短的计算时间,所以令我惊讶的是,它仍然给出了相同的5次超时.


4.我的第四种方法正则表达式,

我知道它会慢一点,但我仍然徒劳无功.Big O在这里实际上比较慢,因为我需要用单词对每个句子进行排序以避免n!正则表达式的排列...

public static void textQueries(List<String> sentences, List<String> queries) {
    stringSort(sentences);
    stringSort(queries);
    StringBuilder stringBuilder = new StringBuilder();

    boolean isExist = false;
    for (int index = 0; index < queries.size(); index++) {
        String query = queries.get(index);
        isExist = false;
        for (int i = 0; i < sentences.size(); i++) {
            if (Matcher(buildNaturalLanguage(query), sentences.get(i))) {
                stringBuilder.append(i + " ");
                isExist = true;
            }
        }
        if (!isExist) stringBuilder.append("-1");
        if (index != queries.size() - 1) stringBuilder.append("\n");
    }
    System.out.println(stringBuilder.toString());
}

private static void stringSort(List<String> strings) {
    for (int i = 0; i < strings.size(); ++i) {
        String string = strings.get(i);
        String[] stringParts = string.split(" ");
        StringBuilder stringBuilder = new StringBuilder();
        Arrays.sort(stringParts);
        for (int j = 0; j < stringParts.length; j++) {
            stringBuilder.append(stringParts[j] + " ");
        }
        strings.set(i, stringBuilder.toString());  // sure I made it back to string for code cleaness but you can return String[] for efficiency.. But only minor improvement.
    }
}

private static String buildNaturalLanguage(String query) {
    // System.out.println("query " + query);
    String[] stringParts = query.split(" ");
    String regular = "(([a-zA-Z])*(\\s))*";
    for (String word : stringParts) {
        regular += word + "(\\s(([a-zA-Z])*(\\s))*)";
    }
    return regular;
}

private static boolean Matcher(String regular, String sentence) {
    Pattern p = Pattern.compile(regular);
    Matcher m = p.matcher(sentence);
    return m.find();
}
Run Code Online (Sandbox Code Playgroud)

结果:不仅得到超时,它在某种程度上导致错误(错误答案)另外2个未公开的测试用例..我不知道为什么..

Ω(nm ^ 2 + plogp) ..假设正则表达式匹配为O(m)


我只能想到在运行主算法之前过滤一些查询或句子的可能性?(约束:每个单词10个最大匹配).

然而,该约束检查部分仍然用我的第一和第二解决方案实现.因此可能需要更智能的过滤.

事情是我认为BCR - 最好的可想象的速率是O(MNP),你仍然需要经历每个查询和句子,如果不使用正则表达式也分裂它们.

我完全迷失在这里,我怎么能比这更快地提高速度?

提前谢谢了.

Ris*_*hav 5

保持HashMapStrings 映射到的Set<Int>.我们的想法是跟踪给定单词出现的句子.我们使用集合代替数组,以便有效地支持计算两个集合的交集.

对于每个输入句子:

  • 将其标记为单词,并将当前句子的索引添加到与当前标记对应的Set中.

对于每个查询短语:

  • 将其标记为单词.
  • 查询与每个单词对应的索引集
  • 取所有这些集合的交集.

时间复杂度:假设每个句子中有10个单词,构建HashMap的成本为O(10N log N).每个查询的成本是O(10*log(N)).