Posts by Leo*_*ssi

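How to get distinct values in GROUP_CONCAT using Google BigQuery

I am trying to get distinct values when using GROUP_CONCAT in BigQuery.

I'll recreate the situation with a simpler, static example:

Edit: I have modified the example to better represent my real situation: the two columns built with GROUP_CONCAT both need to contain distinct values: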

SELECT 
  category, 
  GROUP_CONCAT(id) as ids, 
  GROUP_CONCAT(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "2" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, …
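To make the desired result concrete, here is a small sketch in plain Python (not BigQuery SQL) of what a "distinct GROUP_CONCAT" per category would produce. It uses only the six complete rows visible in the query above; the truncated seventh row is left out. The helper name `distinct_concat` and the comma separator are assumptions made for illustration.

```python
from collections import defaultdict

# The six complete rows shown in the question, as (category, id, product).
# The row that is cut off in the question is deliberately omitted.
rows = [
    ("a", "1", "car"),
    ("a", "2", "car"),
    ("a", "3", "car"),
    ("b", "4", "car"),
    ("b", "5", "car"),
    ("b", "2", "bike"),
]


def distinct_concat(rows, sep=","):
    """Hypothetical helper: per category, join the distinct ids and products."""
    ids = defaultdict(dict)
    products = defaultdict(dict)
    for category, id_, product in rows:
        # dict keys drop repeats and keep first-seen order
        ids[category][id_] = None
        products[category][product] = None
    return {c: (sep.join(ids[c]), sep.join(products[c])) for c in ids}


result = distinct_concat(rows)
for category, (id_list, product_list) in result.items():
    print(category, id_list, product_list)
```

Each category ends up with every id and every product listed once, which is the shape the query is expected to return.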

distinct group-concat google-bigquery

8 votes, 1 answer, 5915 views

How to map a Python dict to a BigQuery schema

I have a dictionary with some nested values, as follows:

my_dict = {
    "id": 1,
    "name": "test",
    "system": "x",
    "date": "2015-07-27",
    "profile": {
        "location": "My City",
        "preferences": [
            {
                "code": "5",
                "description": "MyPreference",
            }
        ]
    },
    "logins": [
        "2015-07-27 07:01:03",
        "2015-07-27 08:27:41"
    ]
}

And I have a BigQuery table schema, as follows:

schema = {
    "fields": [
        {'name':'id', 'type':'INTEGER', 'mode':'REQUIRED'},
        {'name':'name', 'type':'STRING', 'mode':'REQUIRED'},
        {'name':'date', 'type':'TIMESTAMP', 'mode':'REQUIRED'},
        {'name':'profile', 'type':'RECORD', 'fields':[
            {'name':'location', 'type':'STRING', 'mode':'NULLABLE'},
            {'name':'preferences', 'type':'RECORD', 'mode':'REPEATED', 'fields':[
                {'name':'code', 'type':'STRING', 'mode':'NULLABLE'},
                {'name':'description', 'type':'STRING', 'mode':'NULLABLE'}
            ]},
        ]},
        {'name':'logins', 'type':'TIMESTAMP', 'mode':'REPEATED'}
    ]
}

I want to walk through the whole original my_dict and, based on the structure of the schema, build a new …
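The question text is cut off, so the exact target is unknown. One possible reading is sketched below: walk the schema recursively and copy only the keys it declares, which drops extras such as "system". The helper name `filter_by_schema` is hypothetical, and the sketch assumes missing fields are simply skipped; it does not validate types or REQUIRED modes.

```python
my_dict = {
    "id": 1,
    "name": "test",
    "system": "x",
    "date": "2015-07-27",
    "profile": {
        "location": "My City",
        "preferences": [{"code": "5", "description": "MyPreference"}],
    },
    "logins": ["2015-07-27 07:01:03", "2015-07-27 08:27:41"],
}

schema = {
    "fields": [
        {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
        {"name": "name", "type": "STRING", "mode": "REQUIRED"},
        {"name": "date", "type": "TIMESTAMP", "mode": "REQUIRED"},
        {"name": "profile", "type": "RECORD", "fields": [
            {"name": "location", "type": "STRING", "mode": "NULLABLE"},
            {"name": "preferences", "type": "RECORD", "mode": "REPEATED", "fields": [
                {"name": "code", "type": "STRING", "mode": "NULLABLE"},
                {"name": "description", "type": "STRING", "mode": "NULLABLE"},
            ]},
        ]},
        {"name": "logins", "type": "TIMESTAMP", "mode": "REPEATED"},
    ]
}


def filter_by_schema(record, fields):
    """Hypothetical helper: keep only the keys that the schema declares."""
    result = {}
    for field in fields:
        name = field["name"]
        if name not in record:
            continue  # assumption: absent fields are skipped, not reported
        value = record[name]
        if field["type"] == "RECORD":
            sub_fields = field["fields"]
            if field.get("mode") == "REPEATED":
                # a repeated record is a list of nested dicts
                value = [filter_by_schema(item, sub_fields) for item in value]
            else:
                value = filter_by_schema(value, sub_fields)
        result[name] = value
    return result


row = filter_by_schema(my_dict, schema["fields"])
print(row)
```

Recursing on RECORD fields keeps the nesting of the schema intact, so repeated records such as preferences are filtered item by item.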

python dictionary google-bigquery

2 votes, 1 answer, 4238 views

How to normalize a multi-valued column in a data.table in R

I have a data.table as follows:

order   products   value
1000    A|B        10
2000    B|C        20
3000    A|C        30
4000    B|C|D      5
5000    C|D        15

I need to split the products column and transform/normalize the table so that it can be used like this:

order   prod.seq   prod.name   value
1000    1          A           10
1000    2          B           10
2000    1          B           20
2000    2          C           20
3000    1          A           30
3000    2          C           30
4000    1          B           5
4000    2          C           5
4000    3          D           5
5000    1          C           15
5000    2          D           15

I think I could do this with a custom for loop, but I would like to know whether there is a more advanced way using apply or ddply. Any suggestions?
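The transformation itself is language-neutral, so here is a minimal sketch of it in plain Python rather than R: split each products string on "|" and emit one row per product with a 1-based sequence number. The function name `normalize` and the tuple layout are assumptions for illustration; this is not a data.table or plyr solution.

```python
# Rows from the question, as (order, products, value).
orders = [
    (1000, "A|B", 10),
    (2000, "B|C", 20),
    (3000, "A|C", 30),
    (4000, "B|C|D", 5),
    (5000, "C|D", 15),
]


def normalize(orders, sep="|"):
    """Hypothetical helper: one output row per product in each order."""
    out = []
    for order, products, value in orders:
        # enumerate from 1 to mirror the prod.seq column in the question
        for seq, name in enumerate(products.split(sep), start=1):
            out.append((order, seq, name, value))
    return out


normalized = normalize(orders)
for row in normalized:
    print(*row, sep="\t")
```

The order and value columns are repeated for every product, matching the target table above.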

r apply plyr data.table

1 vote, 1 answer, 671 views