How to get the lines before and after a match from a Java 8 stream, like grep?

Sea*_*yen 7 java java-8 java-stream

I have a text file with many lines of strings. If I wanted to find the lines before and after a match with grep, I would do this:

grep -A 10 -B 10 "ABC" myfile.txt

How can I implement the equivalent in Java 8 using streams?

Luk*_*der 5

If you are willing to use a third-party library and don't need parallelism, jOOλ offers SQL-style window functions, as follows:

Seq.seq(Files.readAllLines(Paths.get(new File("/path/to/Example.java").toURI())))
   .window(-1, 1)
   .filter(w -> w.value().contains("ABC"))
   .forEach(w -> {
       System.out.println("-1:" + w.lag().orElse(""));
       System.out.println(" 0:" + w.value());
       System.out.println("+1:" + w.lead().orElse(""));
       // ABC: Just checking
   });

Yielding:

-1:       .window(-1, 1)
 0:       .filter(w -> w.value().contains("ABC"))
+1:       .forEach(w -> {
-1:           System.out.println("+1:" + w.lead().orElse(""));
 0:           // ABC: Just checking
+1:       });

The lead() function accesses the next value in traversal order from the window, while the lag() function accesses the previous row.
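
For readers without jOOλ on the classpath, the lag()/lead() behaviour described above can be approximated with plain JDK code. This is only a rough sketch: the class name `LagLeadSketch` and the method `neighbours` are hypothetical names chosen for illustration, and the code mimics just the one-line-before, one-line-after case on an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class LagLeadSketch {

    // Hypothetical helper: for every line containing `needle`, emit the
    // previous line, the line itself and the next line, using an empty
    // string when there is no neighbour (like lag()/lead().orElse("")).
    public static List<String> neighbours(List<String> lines, String needle) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!lines.get(i).contains(needle)) {
                continue;
            }
            // "lag": the previous line, absent for the first line
            Optional<String> lag = i > 0
                    ? Optional.of(lines.get(i - 1))
                    : Optional.<String>empty();
            // "lead": the next line, absent for the last line
            Optional<String> lead = i < lines.size() - 1
                    ? Optional.of(lines.get(i + 1))
                    : Optional.<String>empty();
            out.add("-1:" + lag.orElse(""));
            out.add(" 0:" + lines.get(i));
            out.add("+1:" + lead.orElse(""));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("ABC first", "middle", "last ABC");
        neighbours(lines, "ABC").forEach(System.out::println);
    }
}
```

Matches on the first or last line simply get an empty neighbour, which is the same effect as calling orElse("") on the lag() and lead() results.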

Disclaimer: I work for the company behind jOOλ.


Tag*_*eev 2

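The Stream API does not support this scenario well, because the existing methods do not provide access to an element's neighbors in the stream. The closest solution I can think of, without creating custom iterators/spliterators or calling third-party libraries, is to read the input file into a List and then use a stream of indices: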

List<String> input = Files.readAllLines(Paths.get(fileName));
Predicate<String> pred = str -> str.contains("ABC");
int contextLength = 10;

IntStream.range(0, input.size()) // line numbers
    // filter them leaving only numbers of lines satisfying the predicate
    .filter(idx -> pred.test(input.get(idx))) 
    // add nearby numbers
    .flatMap(idx -> IntStream.rangeClosed(idx-contextLength, idx+contextLength))
    // remove numbers which are out of the input range
    .filter(idx -> idx >= 0 && idx < input.size())
    // sort numbers and remove duplicates
    .distinct().sorted()
    // map to the lines themselves
    .mapToObj(input::get)
    // output
    .forEachOrdered(System.out::println);
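
To try the index-stream steps without a file, they can be wrapped in a method and run on a small in-memory list. The following is a minimal sketch of that idea; the class name `ContextGrep`, the method `grepWithContext` and the sample data are assumptions made for illustration, not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ContextGrep {

    // Hypothetical helper: returns every line matching `pred` together with
    // up to `context` lines before and after it, in file order, no duplicates.
    public static List<String> grepWithContext(List<String> input,
                                               Predicate<String> pred,
                                               int context) {
        return IntStream.range(0, input.size())
                // keep only the indices of matching lines
                .filter(idx -> pred.test(input.get(idx)))
                // expand each match into its surrounding index range
                .flatMap(idx -> IntStream.rangeClosed(idx - context, idx + context))
                // drop indices that fall outside the input
                .filter(idx -> idx >= 0 && idx < input.size())
                // overlapping ranges produce duplicates; remove them and sort
                .distinct().sorted()
                .mapToObj(input::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "a", "b", "ABC 1", "c", "d", "e", "f", "ABC 2", "g");
        // one line of context on each side of every match
        System.out.println(grepWithContext(lines, s -> s.contains("ABC"), 1));
        // prints [b, ABC 1, c, f, ABC 2, g]
    }
}
```

Note that the whole input is held in memory, exactly as in the snippet above, so this suits files that fit comfortably in a List.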

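The grep output also includes the special delimiter "--" to mark omitted lines. If you want to go further and mimic that behavior as well, I can suggest trying my free StreamEx library, as its intervalMap method is helpful in this case: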

// Same as IntStream.range(...).filter(...) steps above
IntStreamEx.ofIndices(input, pred)
    // same as above
    .flatMap(idx -> IntStream.rangeClosed(idx-contextLength, idx+contextLength))
    // remove numbers which are out of the input range
    .atLeast(0).less(input.size())
    // sort numbers and remove duplicates
    .distinct().sorted()
    .boxed()
    // merge adjacent numbers into single interval and map them to subList
    .intervalMap((i, j) -> (j - i) == 1, (i, j) -> input.subList(i, j + 1))
    // flatten all subLists prepending them with "--"
    .flatMap(list -> StreamEx.of(list).prepend("--"))
    // skipping first "--"
    .skip(1)
    .forEachOrdered(System.out::println);
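
If adding a dependency is not an option, the "--" separators can also be produced with the JDK alone by collecting the sorted indices and checking for gaps in an ordinary loop. This is a sketch under that assumption, swapping intervalMap for a manual gap check; the class name `ContextGrepWithSeparators` and the method `grep` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.IntStream;

public class ContextGrepWithSeparators {

    // Hypothetical helper: like grep -A n -B n, including the "--" line
    // between groups of output lines that are not adjacent in the input.
    public static List<String> grep(List<String> input,
                                    Predicate<String> pred,
                                    int context) {
        int last = input.size() - 1;
        int[] indices = IntStream.range(0, input.size())
                .filter(idx -> pred.test(input.get(idx)))
                // clamp the context range to the bounds of the input
                .flatMap(idx -> IntStream.rangeClosed(
                        Math.max(0, idx - context), Math.min(last, idx + context)))
                .distinct().sorted()
                .toArray();

        List<String> out = new ArrayList<>();
        for (int k = 0; k < indices.length; k++) {
            // a jump of more than one index means lines were omitted
            if (k > 0 && indices[k] - indices[k - 1] > 1) {
                out.add("--");
            }
            out.add(input.get(indices[k]));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "a", "b", "ABC 1", "c", "d", "e", "f", "ABC 2", "g");
        grep(lines, s -> s.contains("ABC"), 1).forEach(System.out::println);
    }
}
```

With the sample data, the two groups of context lines are separated by a single "--" line, and no separator is printed before the first group.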