Amazon Athena cannot read S3 access log files, and an Athena SELECT query returns empty values for every column

use*_*574 3 amazon-athena

I successfully created a database table in Athena. See the statement below.

   CREATE EXTERNAL TABLE IF NOT EXISTS s3_access_logs_db.wafbucket_logs(
      BucketOwner STRING,
      Bucket STRING,
      RequestDateTime STRING,
      RemoteIP STRING,
      Requester STRING,
      RequestID STRING,
      Operation STRING,
      Key STRING,
      RequestURI_operation STRING,
      RequestURI_key STRING,
      RequestURI_httpProtoversion STRING,
      HTTPstatus STRING,
      ErrorCode STRING,
      BytesSent BIGINT,
      ObjectSize BIGINT,
      TotalTime STRING,
      TurnAroundTime STRING,
      Referrer STRING,
      UserAgent STRING,
      VersionId STRING,
      HostId STRING,
      SigV STRING,
      CipherSuite STRING,
      AuthType STRING,
      EndPoint STRING,
      TLSVersion STRING
  ) 
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
               'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) 
               \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)
               \\\" (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") ([^ ]*)
               (?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$' )
      LOCATION 's3://stb-aws-bucket-logging/logs/';

However, when I run a query against the table, it returns an empty result set. There are 20 rows, but every column in them is empty!

SELECT * FROM s3_access_logs_db.wafbucket_logs limit 20;

Has anyone run into this problem before?

Thanks, Tuan

use*_*574 5

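I found the error in my parser format: it contained line breaks! I copied the access log parser format from the AWS documentation, and I must have inadvertently introduced line breaks into it. Here is the correct parser format: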

'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$'

It works!!!

Your comments pushed me to recheck the parser format.

Thanks, Tuan
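
The symptom described here (rows come back, but every column is empty) fits how RegexSerDe treats a line that does not match `input.regex`: it emits the row with NULL in every column instead of raising an error. Below is a minimal sketch of that effect. It uses Python's `re` module as a stand-in for the Java regex engine, a shortened pattern covering only the first four capture groups, and a made-up sample log line; `SAMPLE_LINE`, `GOOD_REGEX`, `BAD_REGEX`, and `parse` are illustrative names, not part of Athena or Hive.

```python
import re

# Hypothetical, shortened line in the S3 server access log layout (illustration only).
SAMPLE_LINE = (
    "79a59df900b949e5 example-bucket [06/Feb/2019:00:00:38 +0000] "
    "192.0.2.3 - 3E57427F3EXAMPLE REST.GET.VERSIONING -"
)

# First four capture groups of the pattern, written on a single line.
GOOD_REGEX = r"([^ ]*) ([^ ]*) \[(.*?)\] ([^ ]*).*$"

# The same pattern with a line break and indentation pasted in after the
# second group, mimicking the accidental wrap in the CREATE TABLE statement.
BAD_REGEX = "([^ ]*) ([^ ]*) \n               " + r"\[(.*?)\] ([^ ]*).*$"


def parse(pattern, line):
    """Return the captured columns, or all None when the line does not match.

    This imitates the all-NULL row that RegexSerDe produces for a
    non-matching line; it is not the SerDe's actual implementation.
    """
    match = re.fullmatch(pattern, line)
    if match is None:
        return [None] * re.compile(pattern).groups
    return list(match.groups())


good = parse(GOOD_REGEX, SAMPLE_LINE)
bad = parse(BAD_REGEX, SAMPLE_LINE)

print(good)  # four populated columns
print(bad)   # four None values, one per capture group
```

The wrapped pattern demands a literal newline and a run of spaces that never occur in a log line, so no line matches and every column is empty. Keeping the whole `input.regex` value on one line avoids this.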