Posts by Bal*_*nki

AWS Glue: crawler misinterprets timestamps as strings; the Glue ETL job meant to convert the strings to timestamps turns them into NULL

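I have been using AWS Glue for some quick analytics, following the tutorial here.

Although I was able to create the crawler and discover the data in Athena, I ran into a problem with the data types the crawler assigned. Columns holding date and timestamp values are read as the string data type.

I then created a Glue ETL job that uses the data source created by the crawler as input and a target table in Amazon S3.

As part of the mapping transformation, I changed the data type of the date and timestamp columns from string to timestamp, but the ETL job converted the values in these columns to NULL. I had considered using a classifier with a GROK expression, but decided to do the conversion as part of the ETL job in Glue instead.

The timestamp format is 1/08/2010 6:15:00 PM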
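One plausible cause, stated here as an assumption rather than something the post confirms, is that a plain string-to-timestamp cast expects an ISO-style value such as 2010-01-08 18:15:00, so a value like 1/08/2010 6:15:00 PM fails to parse and comes back as NULL. The sketch below uses only the Python standard library to show the idea of parsing with an explicit format; the month-first reading of the sample value and the helper name `parse_timestamp` are assumptions, and a Glue job would need the equivalent step in its own transformation code.

```python
from datetime import datetime

# Assumed layout of the sample value: month/day/year, 12-hour clock with AM/PM.
# If the data is day-first, swap %m and %d.
SOURCE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_timestamp(raw):
    """Return a datetime for `raw`, or None when it does not match SOURCE_FORMAT."""
    try:
        return datetime.strptime(raw.strip(), SOURCE_FORMAT)
    except ValueError:
        # Mirrors the NULL result a failed conversion produces.
        return None


parsed = parse_timestamp("1/08/2010 6:15:00 PM")
print(parsed.isoformat(sep=" "))  # 2010-01-08 18:15:00
```

Returning None for unparseable input makes bad rows easy to count before writing the result, instead of silently losing the whole column.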

amazon-s3 amazon-web-services amazon-athena aws-glue

9
Votes
2
Answers
10k
Views

Convert all JSON keys to lowercase using jq

I am looking to load a JSON file containing array items into my database. The JSON file with the array items looks like this:

{
  "campaignId": "11067182",
  "campaignName": "11067182",
  "channelId": "%pxbid_universal_site_id=!;",
  "channelName": "%pxbid_universal_site_id=!;",
  "placementId": "%epid!",
  "placementName": "%epid!",
  "publisherId": "%esid!",
  "publisherName": "%esid!",
  "hitDate": "2017-03-23",
  "lowRiskImpressions": "61485",
  "lowRiskPct": "64.5295",
  "moderateRiskImpressions": "1887",
  "moderateRiskPct": "1.9804",
  "highRiskImpressions": "43",
  "highRiskPct": "0.0451",
  "veryHighRiskImpressions": "860",
  "veryHighRiskPct": "0.9026",
  "totalRated": "95274",
  "unrated": "8",
  "unratedPct": "0.0084",
  "visibleCount": "64283",
  "pctVisible": "67.4660",
  "invisibleCount": "30999",
  "totalImpressions": "95282"
}
{
  "campaignId": "11067182",
  "campaignName": "11067182",
  "channelId": "%pxbid_universal_site_id=!;",
  "channelName": "%pxbid_universal_site_id=!;",
  "placementId": "%epid!",
  "placementName": "%epid!",
  "publisherId": "%esid!",
  "publisherName": "%esid!",
  "hitDate": "2017-03-22",
  "lowRiskImpressions": "17929",
  "lowRiskPct": "52.9379",
  "moderateRiskImpressions": "1872",
  "moderateRiskPct": "5.5273",
  "highRiskImpressions": …
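The sample above is a stream of JSON objects placed one after another, not a single array. As a minimal sketch of the key-lowercasing step, written in Python instead of jq, the code below reads such a stream and lowercases every key recursively. The helper names `lowercase_keys` and `iter_json_objects` are hypothetical, and the sample records are shortened versions of the ones in the post.

```python
import json


def lowercase_keys(value):
    """Recursively lowercase every object key; leave values untouched."""
    if isinstance(value, dict):
        return {key.lower(): lowercase_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [lowercase_keys(item) for item in value]
    return value


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(text):
        # Skip whitespace between documents; raw_decode does not do this itself.
        if text[position].isspace():
            position += 1
            continue
        obj, position = decoder.raw_decode(text, position)
        yield obj


sample = '''
{"campaignId": "11067182", "hitDate": "2017-03-23", "lowRiskPct": "64.5295"}
{"campaignId": "11067182", "hitDate": "2017-03-22", "lowRiskPct": "52.9379"}
'''

for record in iter_json_objects(sample):
    print(json.dumps(lowercase_keys(record)))
```

In jq itself, a filter along the lines of `with_entries(.key |= ascii_downcase)` should do the same for the top-level keys of each object.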

json key lowercase data-conversion jq

5
Votes
1
Answers
4014
Views

jq: prefix a string value in a JSON object with another string

I want to generate a manifest file for a Redshift COPY using aws s3api list-objects and jq, as follows:

aws s3api list-objects --bucket annalects3 --prefix "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression" --output json --query '{"entries": Contents[].{"url":"Key"}}' | jq '.entries[].mandatory = true'

It produces output like this:

{
  "entries": [
    {
      "mandatory": true,
      "url": "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092507_20160926_002328_292527438.csv.gz"
    },
    {
      "mandatory": true,
      "url": "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092508_20160926_020131_292592736.csv.gz"
    },
    {
      "mandatory": true,
      "url": "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092509_20160926_030312_292502379.csv.gz"
    },
    {
      "mandatory": true,
      "url": "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092510_20160926_033656_292590227.csv.gz"
    }
  ]
}

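However, the manifest file needs each url value to be prefixed with the bucket name, and I have not managed to do that. The output needs to look like this: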

{   "entries": [
        {
          "mandatory": true,
          "url": "s3://mybucket/DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092507_20160926_002328_292527438.csv.gz"
        },
        {
          "mandatory": true,
          "url": "s3://mybucket/DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092508_20160926_020131_292592736.csv.gz"
        },
        {
          "mandatory": true,
          "url": …
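The transformation asked for here is a string concatenation applied to every url entry. A minimal Python sketch of that step follows; the function name `build_manifest` is hypothetical, the bucket name `mybucket` is taken from the desired output above, and the keys are the first two from the listing in the post.

```python
import json


def build_manifest(keys, bucket):
    """Build a manifest dict whose url values carry the s3://bucket/ prefix."""
    return {
        "entries": [
            {"mandatory": True, "url": f"s3://{bucket}/{key}"}
            for key in keys
        ]
    }


keys = [
    "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092507_20160926_002328_292527438.csv.gz",
    "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092508_20160926_020131_292592736.csv.gz",
]

manifest = build_manifest(keys, "mybucket")
print(json.dumps(manifest, indent=2))
```

Staying in jq, an update-assignment such as `.entries[].url |= "s3://mybucket/" + .` appended to the existing filter should have the same effect.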

json amazon-s3 amazon-redshift jq

2
Votes
1
Answers
2878
Views

jq construct with value strings spanning multiple lines

I am trying to use jq to form a JSON construct that should ideally look like this:

{
  "api_key": "XXXXXXXXXX-7AC9-D655F83B4825",
  "app_guid": "XXXXXXXXXXXXXX",
  "time_start": 1508677200,
  "time_end": 1508763600,
  "traffic": [
    "event"
  ],
  "traffic_including": [
    "unattributed_traffic"
  ],
  "time_zone": "Australia/NSW",
  "delivery_format": "csv",
  "columns_order": [
    "attribution_attribution_action",
    "attribution_campaign",
    "attribution_campaign_id",
    "attribution_creative",
    "attribution_date_adjusted",
    "attribution_date_utc",
    "attribution_matched_by",
    "attribution_matched_to",
    "attribution_network",
    "attribution_network_id",
    "attribution_seconds_since",
    "attribution_site_id",
    "attribution_site_id",
    "attribution_tier",
    "attribution_timestamp",
    "attribution_timestamp_adjusted",
    "attribution_tracker",
    "attribution_tracker_id",
    "attribution_tracker_name",
    "count",
    "custom_dimensions",
    "device_id_adid",
    "device_id_android_id",
    "device_id_custom",
    "device_id_idfa",
    "device_id_idfv",
    "device_id_kochava",
    "device_os",
    "device_type",
    "device_version",
    "dimension_count",
    "dimension_data",
    "dimension_sum",
    "event_name",
    "event_time_registered",
    "geo_city",
    "geo_country",
    "geo_lat",
    "geo_lon",
    "geo_region",
    "identity_link",
    "install_date_adjusted",
    "install_date_utc",
    "install_device_version",
    "install_devices_adid",
    "install_devices_android_id",
    "install_devices_custom",
    "install_devices_email_0",
    "install_devices_email_1",
    "install_devices_idfa",
    "install_devices_ids",
    "install_devices_ip", …
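One way to keep a long column list readable is to hold it as multi-line text, one name per line, and turn it into a JSON array when the request body is built. The Python sketch below illustrates that idea only; the post's actual jq command is not shown in the excerpt, the helper name `build_request` is hypothetical, and just a few of the fields and column names from the construct above are used.

```python
import json

# Column names kept one per line; only the first three from the post are used.
COLUMNS_TEXT = """
attribution_attribution_action
attribution_campaign
attribution_campaign_id
"""


def build_request(columns_text):
    """Build a request body with columns_order taken from multi-line text."""
    # Drop blank lines and surrounding whitespace so the layout of the text is free.
    columns = [line.strip() for line in columns_text.splitlines() if line.strip()]
    return {
        "time_zone": "Australia/NSW",
        "delivery_format": "csv",
        "columns_order": columns,
    }


request_body = build_request(COLUMNS_TEXT)
print(json.dumps(request_body, indent=2))
```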

json string-literals jq

1
Votes
1
Answers
9524
Views