Posted by Pra*_*nda

Reading CSVs and writing to BigQuery with Apache Beam

I have a GCS bucket from which I'm trying to read roughly 200k files and then write them to BigQuery. The problem is that I'm having trouble creating a PCollection that works with the rest of my code. I've been following a tutorial for reference.

I have the following code:

from __future__ import absolute_import

import argparse
import logging
import os

from past.builtins import unicode

import apache_beam as beam
from apache_beam.io import ReadFromText, ReadAllFromText
from apache_beam.io import WriteToText
from apache_beam.metrics import Metrics
from apache_beam.metrics.metric import MetricsFilter
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
from google.cloud import storage

import regex as re

# storage_client = storage.Client()
# bucket = storage_client.get_bucket('mybucket')
#
# blobs = bucket.list_blobs()
# l=list(blobs)
# x=[y.name for y in l]
# c=x[1:]
# print(len(c)) …
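This is roughly the shape I'm trying to end up with (a minimal sketch, not working code): build the list of GCS paths with the storage client, turn it into a PCollection with beam.Create, read the lines with ReadAllFromText, and write parsed rows with WriteToBigQuery. The bucket name 'mybucket' comes from my commented-out code above; the table 'myproject:mydataset.mytable', the schema, and parse_csv_line are just placeholders for my real data.

import apache_beam as beam
from apache_beam.io import ReadAllFromText
from apache_beam.io.gcp.bigquery import WriteToBigQuery
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import storage


def parse_csv_line(line):
    # Placeholder parser: split one CSV line into a BigQuery row dict.
    name, value = line.split(',')
    return {'name': name, 'value': value}


def run():
    # List the ~200k object names up front with the GCS client.
    client = storage.Client()
    blobs = client.get_bucket('mybucket').list_blobs()
    paths = ['gs://mybucket/' + blob.name for blob in blobs]

    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | 'FilePaths' >> beam.Create(paths)          # PCollection of file paths
         | 'ReadLines' >> ReadAllFromText()           # one element per CSV line
         | 'ParseCSV' >> beam.Map(parse_csv_line)
         | 'WriteToBQ' >> WriteToBigQuery(
               'myproject:mydataset.mytable',          # placeholder table
               schema='name:STRING,value:STRING',      # placeholder schema
               create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))


if __name__ == '__main__':
    run()

My understanding is that ReadAllFromText reads a PCollection of file paths/patterns, whereas ReadFromText takes a single pattern at construction time, which is why I went down this route for ~200k files.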

python google-bigquery google-cloud-dataflow apache-beam

1 vote · 1 answer · 554 views