I have a GCS bucket from which I'm trying to read about 200k files and then write them to BigQuery. The problem is that I'm having trouble creating a PCollection that works well with the code. I'm following this tutorial as a reference.
I have the following code:
from __future__ import absolute_import
import argparse
import logging
import os
from past.builtins import unicode
import apache_beam as beam
from apache_beam.io import ReadFromText, ReadAllFromText
from apache_beam.io import WriteToText
from apache_beam.metrics import Metrics
from apache_beam.metrics.metric import MetricsFilter
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
from google.cloud import storage
import regex as re
# storage_client = storage.Client()
# bucket = storage_client.get_bucket('mybucket')
#
# blobs = bucket.list_blobs()
# l=list(blobs)
# x=[y.name for y in l]
# c=x[1:]
# print(len(c)) …
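
For clarity, here is a minimal sketch of the pipeline I think I need: list the blob names up front with the GCS client, turn them into a PCollection with beam.Create, read the files with ReadAllFromText, and write rows out with WriteToBigQuery. Note that 'mybucket', the table spec, and the line-to-row mapping are placeholders rather than my real values, and I'm assuming one BigQuery row per line of each file:

from __future__ import absolute_import
import apache_beam as beam
from apache_beam.io import ReadAllFromText, WriteToBigQuery
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import storage

def run():
    # Build the list of ~200k gs:// paths up front with the GCS client.
    storage_client = storage.Client()
    blobs = storage_client.get_bucket('mybucket').list_blobs()  # 'mybucket' is a placeholder
    paths = ['gs://mybucket/' + blob.name for blob in blobs]

    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | 'FilePaths' >> beam.Create(paths)       # PCollection of file paths
         | 'ReadFiles' >> ReadAllFromText()        # one output element per line of each file
         | 'ToRow' >> beam.Map(lambda line: {'line': line})  # placeholder line-to-row mapping
         | 'WriteToBQ' >> WriteToBigQuery(
               'myproject:mydataset.mytable',      # placeholder table spec
               schema='line:STRING',
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
               create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))

if __name__ == '__main__':
    run()

Is this the right way to fan out reads over that many files, or is there a better way to build the PCollection?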