gawk or grep: single line and ungreedy

Ger*_*ica 6 java regex awk grep git-bash

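I want to recursively print, for the *.java files in all subdirectories, the header of each class that has more than two type parameters (i.e. the parameters between <R ... H> in the example below). One of the files looks like this (names have been shortened for brevity):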

multiple-lines.java

class ClazzA<R extends A,
    S extends B<T>, T extends C<T>,
    U extends D, W extends E,
    X extends F, Y extends G, Z extends H>
    extends OtherClazz<S> implements I<T> {

  public void method(Type<Q, R> x) { 
    // ... code ...
  }
}

Expected output:

ClazzA.java:10: class ClazzA<R extends A,
ClazzA.java:11:     S extends B<T>, T extends C<T>,
ClazzA.java:12:     U extends D, W extends E,
ClazzA.java:13:     X extends F, Y extends G, Z extends H>
ClazzA.java:14:     extends OtherClazz<S> implements I<T> {

But another file may look like this:

single-line.java

class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {

  public void method(Type<Q, R> x) {
    // ... code ...
  }
}

Expected output:

ClazzB.java:42: class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {

Files that should not be considered or printed:

X-no-parameter.java

class ClazzC /* no type parameter */ extends OtherClazz<S> implements I<T> {

  public void method(Type<A, B> x) {
    // ... code ...
  }
}

X-one-parameter.java

class ClazzD<R extends A>  // only one type parameter
    extends OtherClazz<S> implements I<T> {

  public void method(Type<X, Y> x) {
    // ... code ...
  }
}

X-two-parameters.java

class ClazzE<R extends A, S extends B<T>>  // only two type parameters
    extends OtherClazz<S> implements I<T> {

  public void method(Type<X, Y> x) {
    // ... code ...
  }
}

X-two-line-parameters.java

class ClazzF<R extends A,  // only two type parameters
    S extends B<T>>        // on two lines
    extends OtherClazz<S> implements I<T> {

  public void method(Type<X, Y> x) {
    // ... code ...
  }
}

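All whitespace in the files can be \s+. The extends [...] and implements [...] clauses immediately before { are optional. The extends [...] inside each type parameter is optional, too. See The Java® Language Specification, 8.1. Class Declarations for details.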

I'm using gawk in Git Bash:

$ gawk --version
GNU Awk 5.0.0, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)

with:

find . -type f -name '*.java' | xargs gawk -f ws-class-type-parameter.awk > ws-class-type-parameter.log

ws-class-type-parameter.awk

# /start/ , /end/ ... pattern

#/class ClazzA<.*,.*/      , /{/  {    # 5 lines, OK for ClazzA, but in real it prints classes with 2 or less type parameters, too
#/class ClazzA<.*,.*,/     , /{/  {    # no line with ClazzA, since there's no second ',' on its first line
#/class ClazzA<.*,.*,/s    , /{/  {    # 500.000+(!) lines
#/class ClazzA<.*,.*,/s    , /{/U {    # 500.000+(!) lines
#/class ClazzA<.*,.*,/sU   , /{/U {    # 500.000+(!) lines
 /(?s)class ClazzA<.*,.*,/ , /{/  {    # no line

    match( FILENAME, "/.*/.." )
    print substr( FILENAME, RLENGTH ) ":" FNR ": " $0
}

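This finds all *.java files ... fine, and executes gawk with each of them ... fine, but you can see the results as comments after each of my attempts. Please note: the literal ClazzA is used here just for testing and for the MCVE. In reality it could be \w+, but with 500,000+ lines in thousands of files while testing ...

It works if I try it at regex101.com. Well, sort of. I didn't find out how to define /start-regex/,/end-regex/ there, so I added another .* in between.

I took the flags from there, but I couldn't find any description of whether gawk supports the flag syntax /.../sU , /.../U, so I just tried it. A now-deleted comment told me that no flavor of awk supports this.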
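
The behavior noted in the script comments can be reproduced with a small sketch: awk tests each regex against one record (line) at a time, so .* never crosses a newline, and a start pattern that needs two commas cannot match a header whose commas are spread over several lines. The sample file, the class name Demo, and the variable names below are made up for illustration. The workaround shown, collecting lines up to the first { and testing them joined, is one possible approach and not something the tools require.

```shell
# Hypothetical sample: three type parameters spread over two lines.
sample=$(mktemp)
printf '%s\n' 'class Demo<A,' '    B, C>' '    extends Base {' '  void m() {}' '}' > "$sample"

# Range pattern: the start regex sees one line at a time and never finds two commas.
per_line=$(awk '/class Demo<.*,.*,/ , /[{]/ { print FNR ": " $0 }' "$sample")

# Workaround sketch: collect lines from "class" up to the first "{", then test them joined.
joined=$(awk '
    /class Demo</ { out = hdr = ""; collecting = 1 }
    collecting {
        out = out FNR ": " $0 ORS
        hdr = hdr " " $0
        if ( /[{]/ ) {
            if ( hdr ~ /class Demo<.*,.*,/ ) printf "%s", out
            collecting = 0
        }
    }
' "$sample")

printf 'range pattern matched: "%s"\n' "$per_line"
printf 'joined header matched:\n%s\n' "$joined"
rm -f "$sample"
```

The range pattern prints nothing for this sample, while the joined variant prints the three header lines with their line numbers.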

I also tried it with grep:

$ grep --version
grep (GNU grep) 3.1
...
$ grep -nrPf types.grep *.java

with types.grep:

(?s).*class\s+\w+\s*<.*,.*,.*>.*{

This results in output only for single-line.java.

(?s) is --perl-regexp, -P syntax, and grep --help claims to support this.
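
A likely reason that only the single-line header matches is that grep applies the pattern to one line at a time, so (?s) has no newline to cross. One possible workaround, sketched here on the assumption of GNU grep built with PCRE support, is -z, which makes grep treat the input as NUL-terminated records, so a text file becomes a single record. The sample file and the pattern are illustrative and are not the original types.grep. Note that with -z, -n no longer reports source line numbers.

```shell
# Assumes GNU grep with -P (PCRE) and -z; the sample file is hypothetical.
sample=$(mktemp)
printf '%s\n' 'class Demo<A,' '    B, C>' '    extends Base {' '  void m() {}' '}' > "$sample"

# Line-by-line matching: no single line holds the whole header, so nothing is printed.
line_mode=$(grep -P '(?s)class\s+\w+\s*<.*,.*,.*>.*\{' "$sample" || true)

# -z reads the file as one record, -o prints only the matched text,
# and [^{]* keeps the match from running past the first "{".
whole_file=$(grep -Pzo 'class\s+\w+\s*<[^{]*,[^{]*,[^{]*>[^{]*\{' "$sample" | tr '\0' '\n')

printf 'line mode: "%s"\n' "$line_mode"
printf 'whole-file mode:\n%s\n' "$whole_file"
rm -f "$sample"
```

Because the negated class [^{] already matches newlines, the (?s) flag is not needed in the second pattern.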

UPDATE

The solution in Ed Morton's answer works well, but it turned out that there are auto-generated files with methods like:

    /** more code before here */
    public void setId(String value) {
        this.id = value;
    }

    /**
     * Gets a map that contains attributes that aren't bound to any typed property on this class.
     * 
     * <p>
     * the map is keyed by the name of the attribute and 
     * the value is the string value of the attribute.
     * 
     * the map returned by this method is live, and you can add new attribute
     * by updating the map directly. Because of this design, there's no setter.
     * 
     * 
     * @return
     *     always non-null
     */
    public Map<QName, String> getOtherAttributes() {
        return otherAttributes;
    }

which gives output such as:

AbstractAddressType.java:81:      * Gets a map that contains attributes that aren't bound to any typed property on this class.
AbstractAddressType.java:82:      * 
AbstractAddressType.java:83:      * <p>
AbstractAddressType.java:84:      * the map is keyed by the name of the attribute and 
AbstractAddressType.java:85:      * the value is the string value of the attribute.
AbstractAddressType.java:86:      * 
AbstractAddressType.java:87:      * the map returned by this method is live, and you can add new attribute
AbstractAddressType.java:88:      * by updating the map directly. Because of this design, there's no setter.
AbstractAddressType.java:89:      * 
AbstractAddressType.java:90:      * 
AbstractAddressType.java:91:      * @return
AbstractAddressType.java:92:      *     always non-null
AbstractAddressType.java:93:      */
AbstractAddressType.java:94:     public Map<QName, String> getOtherAttributes() {

Other files, with class comments and annotations, show similar output.
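
The extra output appears to come from a start condition that fires on any line containing the word class, including the Javadoc sentence that ends in "on this class.". One possible tightening, offered as an assumption rather than as part of the answer, is to require optional modifiers followed by the class keyword and an identifier at the start of the line. The modifier list, the sample content, and the variable names are illustrative. Annotations on the same line as the declaration are not handled.

```shell
# Hypothetical sample: a Javadoc line mentioning "class", then a real declaration.
sample=$(mktemp)
printf '%s\n' \
    '    /**' \
    '     * Gets a map that is not bound to any typed property on this class.' \
    '     */' \
    '    public Map<QName, String> getOtherAttributes() {' \
    '    }' \
    'public final class Demo<A, B, C> extends Base {' \
    '}' > "$sample"

# Loose start pattern: every line containing "class" counts as a header start.
loose=$(awk '/class/ { printf "%s%s", sep, FNR; sep = " " }' "$sample")

# Stricter sketch: optional modifiers, then the keyword "class" and an identifier.
strict=$(awk '
    /^[ \t]*((public|protected|private|abstract|final|static|strictfp)[ \t]+)*class[ \t]+[A-Za-z_$][A-Za-z0-9_$]*/ {
        printf "%s%s", sep, FNR; sep = " "
    }
' "$sample")

printf 'loose start lines:  %s\n' "$loose"
printf 'strict start lines: %s\n' "$strict"
rm -f "$sample"
```

For this sample the loose pattern reports lines 2 and 6, while the stricter one reports only line 6.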

Ed *_*ton 10

Using any POSIX awk in any shell on every UNIX box:

$ cat tst.awk
/[[:space:]]*class[[:space:]]*/ {
    inDef = 1
    fname = FILENAME
    sub(".*/","",fname)
    def = out = ""
}
inDef {
    out = out fname ":" FNR ": " $0 ORS

    # Remove comments (not perfect but should work for 99.9% of cases)
    sub("//.*","")
    gsub("/[*]|[*]/","\n")
    gsub(/\n[^\n]*\n/,"")

    def = def $0 ORS
    if ( /{/ ) {
        # More than two type parameters means at least two commas in the header.
        if ( gsub(/,/,"&",def) > 1 ) {
            printf "%s", out
        }
        inDef = 0
    }
}

$ find tmp -type f -name '*.java' -exec awk -f tst.awk {} +
multiple-lines.java:1: class ClazzA<R extends A,
multiple-lines.java:2:     S extends B<T>, T extends C<T>,
multiple-lines.java:3:     U extends D, W extends E,
multiple-lines.java:4:     X extends F, Y extends G, Z extends H>
multiple-lines.java:5:     extends OtherClazz<S> implements I<T> {
single-line.java:1: class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {

The above was run using this input:

$ head tmp/*
==> tmp/X-no-parameter.java <==
class ClazzC /* no type parameter */ extends OtherClazz<S> implements I<T> {

  public void method(Type<A, B> x) {
    // ... code ...
  }
}

==> tmp/X-one-parameter.java <==
class ClazzD<R extends A>  // only one type parameter
    extends OtherClazz<S> implements I<T> {

  public void method(Type<X, Y> x) {
    // ... code ...
  }
}

==> tmp/X-two-line-parameters.java <==
class ClazzF<R extends A,  // only two type parameters
    S extends B<T>>        // on two lines
    extends OtherClazz<S> implements I<T> {

  public void method(Type<X, Y> x) {
    // ... code ...
  }
}

==> tmp/X-two-parameters.java <==
class ClazzE<R extends A, S extends B<T>>  // only two type parameters
    extends OtherClazz<S> implements I<T> {

  public void method(Type<X, Y> x) {
    // ... code ...
  }
}

==> tmp/multiple-lines.java <==
class ClazzA<R extends A,
    S extends B<T>, T extends C<T>,
    U extends D, W extends E,
    X extends F, Y extends G, Z extends H>
    extends OtherClazz<S> implements I<T> {

  public void method(Type<Q, R> x) {
    // ... code ...
  }
}

==> tmp/single-line.java <==
class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {

  public void method(Type<Q, R> x) {
    // ... code ...
  }
}

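The above is just a best effort, written without a parser for the language and with only the OP's posted sample input/output to go on for what needs to be handled.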
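
Because the comma count covers the whole header, commas in nested bounds such as B<T, U> or in an implements list are counted as well. A rough alternative, offered as a sketch and not as part of the script above, walks the header character by character and counts only the commas at angle-bracket depth 1 of the <...> group that directly follows the class name. The function name count_type_params and the sample headers are hypothetical. The header is expected on a single line with comments already stripped.

```shell
# count_type_params is a hypothetical helper: it prints the number of top-level
# type parameters of the class header passed as $1 (a single line).
count_type_params() {
    printf '%s\n' "$1" | awk '
    {
        # Drop everything up to and including "class Name"; no "<" next means no parameters.
        if ( !sub(/^.*class[ \t]+[A-Za-z_$][A-Za-z0-9_$]*[ \t]*/, "") || substr($0, 1, 1) != "<" ) {
            print 0
            next
        }
        depth = 0; commas = 0
        for ( i = 1; i <= length($0); i++ ) {
            c = substr($0, i, 1)
            if ( c == "<" ) depth++
            else if ( c == ">" ) { if ( --depth == 0 ) break }
            else if ( c == "," && depth == 1 ) commas++
        }
        print commas + 1
    }'
}

two=$(count_type_params 'class ClazzE<R extends A, S extends B<T, U>> extends OtherClazz<S> implements I<T>, J {')
three=$(count_type_params 'class ClazzG<R, S extends B<T>, T extends C<T>> implements I<T> {')
none=$(count_type_params 'class ClazzC extends OtherClazz<S> implements I<T> {')

printf 'two=%s three=%s none=%s\n' "$two" "$three" "$none"
```

With such a helper, "more than two type parameters" becomes a test for a result greater than 2, independent of how many commas appear elsewhere in the header.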