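I have a DataFrame with a column that contains long numbers. I am trying to convert every value in that numeric column to a string with comma-separated thousands.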
df
col_1 col_2
Rooney 34590927
Ronaldo 5467382
John 25647398
How can I iterate over the column and get the following result?
Expected result:
col_1 col_2
Rooney 34,590,927
Ronaldo 5,467,382
John 25,647,398
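A minimal sketch of one way to get the expected output, assuming col_2 holds plain integers. Python's format specification has a `,` option that inserts thousands separators, so no manual digit handling is needed. The `rows` list and the `with_thousands` helper are illustrative names, not part of the question.

```python
def with_thousands(value):
    """Format an integer with comma thousands separators, e.g. 5467382 -> '5,467,382'."""
    return f"{value:,}"


# Sample data copied from the question, as (col_1, col_2) pairs.
rows = [
    ("Rooney", 34590927),
    ("Ronaldo", 5467382),
    ("John", 25647398),
]

formatted = [(name, with_thousands(number)) for name, number in rows]

for name, number in formatted:
    print(name, number)
```

With pandas, the same helper could be applied to the whole column as `df["col_2"] = df["col_2"].map(with_thousands)`. The result is a column of strings, so it is best done as a final display step.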

I have a .csv file that contains a list of IP addresses for a few data centers. The list currently looks similar to the following table:
Data_Center_Name IP
DC_1 52.102.182.2
DC_1 52.102.182.4
DC_1 52.102.182.1
DC_1 52.102.182.5
DC_1 52.102.182.3
DC_1 27.101.178.17
DC_1 27.101.178.16
DC_1 27.101.178.15
DC_1 23.201.165.7
DC_2 55.200.162.10
DC_2 55.200.162.12
DC_2 55.200.162.13
DC_2 55.200.162.11
DC_3 30.101.102.4
I want to collapse the addresses into a single list of ranges per data center, for example:
DC_1 = [52.102.182.1-52.102.182.5,
27.101.178.15-27.101.178.17,
23.201.165.7]
DC_2 = [55.200.162.10-55.200.162.13]
DC_3 = [30.101.102.4]
Can anyone help me do this with Python?
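One possible approach, sketched with the standard library only: convert each address to an `ipaddress.IPv4Address`, sort, and merge runs where each address is exactly one above the previous. The sketch assumes a comma-delimited file with the header names shown in the table; the inline `SAMPLE` text stands in for the real file, and `collapse` and `ranges_by_dc` are hypothetical helper names. Ranges come out in ascending numeric order, which differs from the order in the example above.

```python
import csv
import io
import ipaddress
from collections import defaultdict

# Stand-in for the real .csv file (assumed comma-delimited with this header).
SAMPLE = """Data_Center_Name,IP
DC_1,52.102.182.2
DC_1,52.102.182.4
DC_1,52.102.182.1
DC_1,52.102.182.5
DC_1,52.102.182.3
DC_1,27.101.178.17
DC_1,27.101.178.16
DC_1,27.101.178.15
DC_1,23.201.165.7
DC_2,55.200.162.10
DC_2,55.200.162.12
DC_2,55.200.162.13
DC_2,55.200.162.11
DC_3,30.101.102.4
"""


def collapse(ips):
    """Merge IPv4 strings into 'start-end' ranges of consecutive addresses."""
    addrs = sorted({ipaddress.IPv4Address(ip) for ip in ips})
    if not addrs:
        return []
    ranges = []
    start = prev = addrs[0]
    for addr in addrs[1:]:
        if int(addr) == int(prev) + 1:
            prev = addr  # still inside the current run
            continue
        ranges.append((start, prev))
        start = prev = addr
    ranges.append((start, prev))
    return [str(a) if a == b else f"{a}-{b}" for a, b in ranges]


def ranges_by_dc(lines):
    """Group the IP column by data center name and collapse each group."""
    grouped = defaultdict(list)
    for row in csv.DictReader(lines):
        grouped[row["Data_Center_Name"]].append(row["IP"])
    return {dc: collapse(ips) for dc, ips in grouped.items()}


result = ranges_by_dc(io.StringIO(SAMPLE))
for dc, ranges in result.items():
    print(dc, "=", ranges)
```

For a real file, `ranges_by_dc(open(path, newline=""))` would take the place of the `io.StringIO` call, with `path` being wherever the .csv lives.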

REGEXP_EXTRACT(",\"AQk8tmAg94ZUZwqYKd6kHrswiVZR0wKNuTvSpr6COCLpki\"", r"(?<=,\")[a-zA-Z0-9]*")
#1 The string I am trying to extract from
,"AQk8tmAg94ZUZwqYKd6kHrswiVZR0wKNuTvSpr6COCLpki"
#2 Expected result of REGEXP_EXTRACT
AQk8tmAg94ZUZwqYKd6kHrswiVZR0wKNuTvSpr6COCLpki
Can someone help me escape the quotes and brackets correctly in the regular expression so that it extracts #2 from #1?
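A likely cause is the lookbehind rather than the escaping: BigQuery's regular expression functions use the RE2 library, which does not support lookaround such as `(?<=...)`. A capturing group usually serves instead, since REGEXP_EXTRACT returns the captured substring when the pattern contains one group. The sketch below checks such a pattern with Python's `re` module. That module is a different engine from RE2, so this only illustrates the pattern's shape, and the variable names are illustrative.

```python
import re

# The input string from #1, including the leading comma and the double quotes.
text = ',"AQk8tmAg94ZUZwqYKd6kHrswiVZR0wKNuTvSpr6COCLpki"'

# Match the literal comma and opening quote, then capture the alphanumeric run.
pattern = r',"([a-zA-Z0-9]*)'

match = re.search(pattern, text)
extracted = match.group(1) if match else None
print(extracted)
```

The same pattern avoids lookbehind entirely, so it stays within the syntax RE2 accepts.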

I have a Dataflow job that writes to BigQuery. It works for non-nested schemas but fails for nested schemas.
Here is my Dataflow pipeline:
pipeline_options = PipelineOptions()
p = beam.Pipeline(options=pipeline_options)
wordcount_options = pipeline_options.view_as(WordcountTemplatedOptions)
schema = 'url: STRING,' \
         'ua: STRING,' \
         'method: STRING,' \
         'man: RECORD,' \
         'man.ip: RECORD,' \
         'man.ip.cc: STRING,' \
         'man.ip.city: STRING,' \
         'man.ip.as: INTEGER,' \
         'man.ip.country: STRING,' \
         'man.res: RECORD,' \
         'man.res.ip_dom: STRING'
first = p | 'read' >> ReadFromText(wordcount_options.input)
second = (first
          | 'process' >> (beam.ParDo(processFunction()))
          | 'write' >> beam.io.WriteToBigQuery(
              'myBucket:tableFolder.test_table',
              schema=schema)
          )
I created the BigQuery table with the following schema:
[
{
"mode": "NULLABLE",
"name": "url",
"type": "STRING"
},
{
"mode": "NULLABLE", …

Here is what my input file looks like:
{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}
{"Id": 2, "Address": {"City":"Mumbai"}}
{"Id": 3, "Address": {"Street":"XYZ Road"}}
{"Id": 4}
{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}
In my Dataflow pipeline, how can I dynamically determine which fields are present in each row so that the row conforms to the BigQuery table schema? For example, in row 2, Street is missing. I want the Address.Street column entry in BigQuery to be "N/A" or null, and I do not want the pipeline to fail because of schema changes or missing data.
How can I handle this logic in the Dataflow job, using Python, before writing to BigQuery?
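One way to handle this, sketched in plain Python: keep a template that mirrors the table schema and copy each parsed row onto it, so missing fields fall back to a default and unknown fields are dropped. The `TEMPLATE` dictionary is an assumption based on the fields seen in the sample rows, since the actual table schema is not shown, and `conform` is a hypothetical helper name. Defaults are `None` here, which is written to BigQuery as null; `"N/A"` could be used instead for string columns.

```python
import json

# Assumed shape of the BigQuery table, inferred from the sample rows only.
TEMPLATE = {
    "Id": None,
    "PhoneNumber": None,
    "Address": {"Street": None, "City": None},
}

SAMPLE = """
{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}
{"Id": 2, "Address": {"City":"Mumbai"}}
{"Id": 3, "Address": {"Street":"XYZ Road"}}
{"Id": 4}
{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}
"""


def conform(row, template):
    """Return a copy of row shaped like template, filling gaps with defaults."""
    out = {}
    for key, default in template.items():
        value = row.get(key)
        if isinstance(default, dict):
            # Nested record: recurse, treating a missing record as empty.
            out[key] = conform(value if isinstance(value, dict) else {}, default)
        else:
            out[key] = default if value is None else value
    return out


rows = [conform(json.loads(line), TEMPLATE) for line in SAMPLE.strip().splitlines()]
for row in rows:
    print(row)
```

In a Beam pipeline, a function like this could run in a `beam.Map` step placed between reading the text lines and the BigQuery write, so every element reaching the sink has the same set of keys.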
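
I have a table named "Categories". I am trying to produce a final table that shows each category's percentage of the overall total.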
Category TOTAL
Category_x 5
Category_y 10
Category_z 20
Category_a 30
Category_b 40
Expected table:
Category TOTAL Overall_Percentage
Category_x 5 4.76
Category_y 10 9.523
Category_z 20 19.047
Category_a 30 28.57
Category_b 40 38.09
My code:
SELECT Category, TOTAL, 100*(TOTAL/SUM(TOTAL)) AS Overall_Percentage
FROM Categories
GROUP BY 1,2
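Because the query groups by both Category and TOTAL, SUM(TOTAL) is computed within each group, so every row is divided by its own value. The denominator has to be the grand total across all rows, which in SQL is commonly written as the window expression `SUM(TOTAL) OVER ()`. The arithmetic can be checked with a short Python sketch; the `totals` dictionary simply restates the table above, and rounding to two decimals is an illustrative choice.

```python
# Table contents from the question: category -> TOTAL.
totals = {
    "Category_x": 5,
    "Category_y": 10,
    "Category_z": 20,
    "Category_a": 30,
    "Category_b": 40,
}

# The denominator is the sum over ALL rows, not the per-group sum.
grand_total = sum(totals.values())

percentages = {
    name: round(100 * value / grand_total, 2)
    for name, value in totals.items()
}

for name, pct in percentages.items():
    print(name, totals[name], pct)
```

Here the grand total is 105, so Category_x works out to 100 * 5 / 105, or about 4.76, in line with the expected table.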