BigQuery complains "Missing close double quote (") character" when loading a CSV file that contains newlines

Jud*_*ing 3 google-bigquery

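The culprit is the record below. It should consist of 14 columns, one of which starts with "Hi I'm Nigerian..." and spans multiple lines because it contains newlines.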

17935,9a7105ee-30c8-4a6d-9374-10875b7d6288.jpg,"""top""=>""0"", ""left""=>""0"", ""width""=>""180"", ""height""=>""180""",,"",2015-07-26 19:33:57.292058,2015-07-26 20:25:30.068887,fe43876f-1b2c-464a-aa20-bf335ed3ff62,c68c8c70-bc2b-11e4-90a1-22000b21105f,{},2e790350-15fb-0133-2cb8-22000ba51078,"Hi I'm Nigerian so wish to study in sweden.
so I'm Undergraduate student I want study Engineering. 
Thanks.","",{}

Loading this CSV data into BigQuery with the command bq load --replace --source_format=CSV -F"," ... produces the errors below. Can anyone give me a solution for this BigQuery load command?

- File: 0 / Line:17192 / Field:12: Missing close double quote (")
character: field starts with: <Hi I'm N>
- File: 0 / Line:17193: Too few columns: expected 14 column(s) but
got 1 column(s). For additional help: http://goo.gl/RWuPQ
- File: 0 / Line:17194: Too few columns: expected 14 column(s) but
got 3 column(s). For additional help: http://goo.gl/RWuPQ

Abd*_*han 6

If you are loading the CSV file into a table from the BigQuery web console, make sure to select Advanced option -> Quoted new lines.



Mic*_*don 5

If you are loading CSV with embedded newlines, you need to specify allowQuotedNewlines:

https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.allowQuotedNewlines
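As a rough sketch of where that setting sits, the request body for a load job could look like the dictionary below. The bucket, project, dataset, and table names are placeholders invented for illustration; the field names follow the jobs resource linked above, and WRITE_TRUNCATE is assumed here only to mirror the --replace flag from the question.

```python
import json

# Hypothetical load-job body; every resource name below is a placeholder.
load_job_body = {
    "configuration": {
        "load": {
            "sourceUris": ["gs://example-bucket/data.csv"],  # placeholder URI
            "sourceFormat": "CSV",
            "destinationTable": {
                "projectId": "example-project",    # placeholder
                "datasetId": "example_dataset",    # placeholder
                "tableId": "example_table",        # placeholder
            },
            # Assumed equivalent of `bq load --replace`.
            "writeDisposition": "WRITE_TRUNCATE",
            # The setting this answer is about: accept newlines inside quoted fields.
            "allowQuotedNewlines": True,
        }
    }
}

print(json.dumps(load_job_body, indent=2))
```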

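BigQuery's default is to assume that CSV data does not contain newlines. Because the input file can then be split at arbitrary newlines, this gives higher parsing throughput when processing large data files. If the data contains newlines inside strings, each file has to be parsed linearly by a single machine.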
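The difference between the two parsing strategies can be illustrated locally with Python's standard csv module. This is only an analogy, not BigQuery's actual parser, and the record is a shortened, hypothetical 4-column version of the one in the question.

```python
import csv
import io

# Shortened, hypothetical 4-column record modelled on the question's data.
record = (
    '17935,photo.jpg,"Hi I\'m Nigerian so wish to study in sweden.\n'
    'so I\'m Undergraduate student I want study Engineering. \n'
    'Thanks.",{}\n'
)

# Strategy 1: treat every newline as a record boundary (safe to split anywhere,
# but a quoted field containing newlines is torn into fragments).
physical_lines = record.splitlines()
naive_counts = [len(line.split(",")) for line in physical_lines]
print(naive_counts)  # [3, 1, 2]

# Strategy 2: a quote-aware parser reads sequentially and keeps the quoted
# field, newlines included, as a single column.
rows = list(csv.reader(io.StringIO(record, newline="")))
print(len(rows), len(rows[0]))  # 1 4
```

The fragment counts from the first strategy correspond to the "Too few columns" errors in the question: each physical line is judged as if it were a complete record.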


小智 5

Make sure to include this line before loading the data into BigQuery: 'job_config.allow_quoted_newlines = True'

from google.cloud import bigquery

job_config = bigquery.LoadJobConfig()
job_config.allow_quoted_newlines = True  # accept newlines inside quoted CSV fields
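For context, a fuller sketch of how that job_config might be used with the google-cloud-bigquery client is shown below. The function name, the URI, and the table ID are hypothetical, the load is assumed to come from Cloud Storage, and WRITE_TRUNCATE is assumed only to mirror the --replace flag in the question. Running it requires the library to be installed and credentials to be configured.

```python
def load_csv_with_quoted_newlines(uri: str, table_id: str) -> int:
    """Load a CSV from Cloud Storage, allowing newlines inside quoted fields.

    `uri` (e.g. "gs://example-bucket/data.csv") and `table_id`
    (e.g. "example-project.example_dataset.example_table") are placeholders
    supplied by the caller. Returns the row count of the destination table.
    """
    # Imported lazily so the module can be read without the library installed.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        allow_quoted_newlines=True,  # the setting from the answer above
        # Assumed equivalent of `bq load --replace`.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # block until the load job finishes (raises on failure)

    return client.get_table(table_id).num_rows
```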