Posts by fox*_*ndy

Warnings in VS2017, but no problems in VS2015

When I load my .NET Framework 4.6.2 solution in Visual Studio 2017, it gives the following warning:

Severity Code Description Project File Line Suppression State
Warning Your project is not referencing the ".NETFramework,Version=v4.6.2" framework. Add a reference to ".NETFramework,Version=v4.6.2" in the "frameworks" section of your project.json, and then re-run NuGet restore.


And another:

Warning IDE0006 Error encountered while loading the project. Some project features, such as full solution analysis for the failed project and projects that depend on it, have been disabled.   BigData     1   Active


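However, the exact same solution files and structure load without any problem in Visual Studio 2015.

Why is this, and how can I fix it?

By the way, from what I have read, project.json was merged back into .csproj in the latest update, so why does the warning still recommend something about project.json?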

visual-studio-2017

fox*_*ndy

lucky-day

21
votes

1
answer

2722
views

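BigQuery console does not show all tables

We now have 1144 tables in one dataset, but many of them are not listed in the left-hand panel of the BigQuery console. I wonder whether this is due to a built-in limit.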

google-bigquery

fox*_*ndy

lucky-day

9
votes

1
answer

1914
views

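Is there any feature in BigQuery that can migrate an entire dataset to another project without copying the data?

As our project grows, we have realized that we need to create new projects and reorganize our datasets. In one case, we need to isolate a dataset from the others by moving it into a new project. I know I can do this by copying the tables one by one through the API and then deleting the old tables. But with more than a thousand tables this takes a lot of time, because each copy is executed as a job. Is it possible to just change the dataset's reference (or path)?

Follow-up: I tried copying the tables using a batch request. I got 200 OK for every request, but the tables were not copied. I would like to know why, and how to get the real result. Here is my code: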

    public async Task CopyTableToProjectInBatchAsync(IList<TableList.TablesData> fromTables, string toProjectId)
    {
        var request = new BatchRequest(BigQueryService);
        foreach (var tableData in fromTables)
        {
            string fromDataset = tableData.TableReference.DatasetId;
            string fromTableId = tableData.TableReference.TableId;
            Logger.Info("copying table {0}...",tableData.Id);
            request.Queue<JobReference>(CreateTableCopyRequest(fromDataset, fromTableId, toProjectId),
            (content, error, i, message) =>
            {
                Logger.Info("#content:\n" + content);
                Logger.Info("#error:\n" + error);
                Logger.Info("#i:\n" + i);
                Logger.Info("#message:\n" + message);
            });
        }
        await request.ExecuteAsync();
    }

    private IClientServiceRequest CreateTableCopyRequest(string fromDatasetId, string fromTableId, string toProjectId,
        string toDatasetId=null, string toTableId=null)
    {
        if (toDatasetId == null)
            toDatasetId = fromDatasetId; …

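One likely reason for the 200 OK responses: inserting a copy job only confirms that BigQuery accepted the job. The job resource then moves through PENDING and RUNNING to DONE, and any failure is reported in status.errorResult. A minimal Python sketch of polling for the real outcome follows. The `get_job` callable is a hypothetical stand-in for whatever client call fetches the job resource (it is not part of the code above), and the polling interval and limit are arbitrary assumptions.

```python
import time


def wait_for_job(get_job, job_id, poll_seconds=1.0, max_polls=60, sleep=time.sleep):
    """Poll a job until its state is DONE, then return the resource or raise.

    get_job is a hypothetical callable: job_id -> dict shaped like a job resource.
    """
    for _ in range(max_polls):
        job = get_job(job_id)
        status = job.get("status", {})
        if status.get("state") == "DONE":
            error = status.get("errorResult")
            if error:
                # DONE with an errorResult means the job finished unsuccessfully.
                raise RuntimeError("job %s failed: %s" % (job_id, error.get("message")))
            return job
        sleep(poll_seconds)
    raise TimeoutError("job %s did not finish after %d polls" % (job_id, max_polls))


# Usage with a fake job source that reports RUNNING once and then DONE.
fake_states = iter([
    {"status": {"state": "RUNNING"}},
    {"status": {"state": "DONE"}},
])
finished = wait_for_job(lambda job_id: next(fake_states), "job_demo", sleep=lambda s: None)
print(finished["status"]["state"])
```

Checking each job this way separates "the request was accepted" from "the table was actually copied".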

google-bigquery

fox*_*ndy

2015 09-23

9
votes

4
answers

5073
views

BigQuery streaming: 'Failed to insert XX rows' due to timeout

Over the last few days, our streaming inserts have been failing with:

"Failed to insert XX rows. First error: {"errors":[{"reason":"timeout"}],"index":YY}"


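The data source and program scripts have not changed and have been streaming continuously for the past half month; no such failure was seen before.

Project ID: red-road-574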

google-bigquery

fox*_*ndy

lucky-day

8
votes

2
answers

753
views

What does "object of type '_UnwindowedValues' has no len()" mean?

I am using Dataflow 0.5.5 Python. I ran into the following error in very simple code:

print(len(row_list))


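row_list is a list. The exact same code, the same data, and the same pipeline run perfectly fine on DirectRunner, but throw the following exception on DataflowRunner. What does it mean, and how can I fix it?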

job name: `beamapp-root-0216042234-124125`

    (f14756f20f567f62): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 544, in do_work
    work_executor.execute()
  File "dataflow_worker/executor.py", line 973, in dataflow_worker.executor.MapTaskExecutor.execute (dataflow_worker/executor.c:30547)
    with op.scoped_metrics_container:
  File "dataflow_worker/executor.py", line 974, in dataflow_worker.executor.MapTaskExecutor.execute (dataflow_worker/executor.c:30495)
    op.start()
  File "dataflow_worker/executor.py", line 302, in dataflow_worker.executor.GroupedShuffleReadOperation.start (dataflow_worker/executor.c:12149)
    def start(self):
  File "dataflow_worker/executor.py", line 303, in dataflow_worker.executor.GroupedShuffleReadOperation.start (dataflow_worker/executor.c:12053)
    with self.scoped_start_state:
  File "dataflow_worker/executor.py", line 316, in dataflow_worker.executor.GroupedShuffleReadOperation.start (dataflow_worker/executor.c:11968)
    with self.shuffle_source.reader() as reader:
  File "dataflow_worker/executor.py", line 320, in dataflow_worker.executor.GroupedShuffleReadOperation.start (dataflow_worker/executor.c:11912)
    self.output(windowed_value)
  File "dataflow_worker/executor.py", line 152, …

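The message suggests that on DataflowRunner row_list arrives as an iterable that does not implement len(), rather than as a plain list. A minimal sketch of one workaround, assuming the values fit in memory: materialise them with list() before taking the length. `LazyValues` is a hypothetical stand-in for the runner's type, used here only so the snippet runs without Dataflow.

```python
class LazyValues(object):
    """Hypothetical stand-in for a runner-specific iterable that has no __len__."""

    def __init__(self, items):
        self._items = items

    def __iter__(self):
        return iter(self._items)


def count_rows(row_list):
    # Materialise first: this works for real lists and for iterables without len().
    return len(list(row_list))


print(count_rows(LazyValues(["a", "b", "c"])))
print(count_rows(["a", "b"]))
```

Code written this way behaves the same whether the runner hands over a list or a lazy iterable.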

google-cloud-dataflow apache-beam

fox*_*ndy

2019 01-10

8
votes

1
answer

1657
views

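How to get the data location of a BigQuery dataset in a script

We know that when creating a dataset in BigQuery with the bq mk command, we can use the --data_location flag to specify the region in which the table data under that dataset should reside.

We now want to set up a monitor so that whenever someone creates a dataset outside our designated location, we can alert the dataset owner. To do this, we need a script that can automatically scan all datasets and get their location information. We have looked at the API calls and the bq command-line tool commands and found no clue about how to show or query a dataset's data location. Is there a way to achieve this?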
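
As far as I know, the dataset resource returned by the BigQuery datasets.get API call carries a `location` field, so a monitor could compare that field with an allow-list. The sketch below assumes the dataset resources have already been fetched and parsed into dictionaries; the function name, the allow-list, and the sample data are all hypothetical.

```python
def find_misplaced(datasets, allowed_locations):
    """Return (dataset_id, location) pairs for datasets outside the allowed locations."""
    allowed = {loc.upper() for loc in allowed_locations}
    misplaced = []
    for ds in datasets:
        # Treat a missing field as its own location so it is reported, not hidden.
        location = ds.get("location", "UNKNOWN")
        if location.upper() not in allowed:
            misplaced.append((ds["datasetReference"]["datasetId"], location))
    return misplaced


# Hypothetical sample resources, trimmed to the fields used here.
sample = [
    {"datasetReference": {"datasetId": "events"}, "location": "US"},
    {"datasetReference": {"datasetId": "logs_eu"}, "location": "EU"},
]
print(find_misplaced(sample, ["US"]))
# prints: [('logs_eu', 'EU')]
```

The returned pairs could then be fed into whatever alerting channel notifies the dataset owners.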

google-bigquery

fox*_*ndy

lucky-day

7
votes

1
answer

4210
views

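Best practices for HTTP calls in Cloud Dataflow - Java

What is the best practice for making HTTP calls from a DoFn in a pipeline running on Google Cloud Dataflow? (Java)

What I mean is, in plain Java without Beam, I would need to think about things like asynchronous calls, or at least multithreading, and about managing thread pools and connection pools. With Dataflow, what happens if I just make a synchronous call on a single thread in each ProcessElement? What is the best practice for making HTTP calls in a DoFn?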

google-cloud-dataflow apache-beam

fox*_*ndy

lucky-day

6
votes

0
answers

3320
views

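BigQuery - incompatible types for UNION ALL?

Below is a simplified version of my query. In the Debug table, DebugReason is of type INTEGER and DebugData is of type STRING. The GPS table has neither of these fields, so I faked them with NULL values. The reason I need to do this is not relevant to this question; long story short, I need it in some later processing.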

 WITH RawDebug as 
(
  SELECT 
  STRUCT(DebugReason,DebugData) as Debug
  FROM `devicedata.Debug.T*`
  WHERE _TABLE_SUFFIX="20180624"

),
RawGPS AS (        
       SELECT
          STRUCT(null as DebugReason,null as DebugData) as Debug
        FROM
          `devicedata.Gps.T*` AS g
         WHERE _TABLE_SUFFIX="20180624"

)
select Debug
from RawDebug
UNION ALL
select Debug
from RawGPS


BigQuery says:

Error: Column 1 in UNION ALL has incompatible types: STRUCT<DebugReason INT64, DebugData STRING>, STRUCT<DebugReason INT64, DebugData INT64> at [18:1]


I can't tell what is wrong... and how do I fix it?
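
The error message shows the second struct typed as STRUCT<DebugReason INT64, DebugData INT64>, which suggests that the untyped NULL literals were both given the type INT64. One possible fix is to cast each placeholder NULL to the type of the matching column, for example CAST(NULL AS STRING) AS DebugData. The Python sketch below only generates such a STRUCT expression from a list of field names and types; the helper name is hypothetical.

```python
def typed_null_struct(fields):
    """Build a STRUCT expression whose members are NULLs cast to explicit types.

    fields is a list of (field_name, sql_type) pairs, in column order.
    """
    parts = ["CAST(NULL AS %s) AS %s" % (sql_type, name) for name, sql_type in fields]
    return "STRUCT(%s)" % ", ".join(parts)


print(typed_null_struct([("DebugReason", "INT64"), ("DebugData", "STRING")]))
# prints: STRUCT(CAST(NULL AS INT64) AS DebugReason, CAST(NULL AS STRING) AS DebugData)
```

The generated expression would take the place of the STRUCT(null as DebugReason,null as DebugData) in the RawGPS subquery.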

google-bigquery

fox*_*ndy

2023 05-03

5
votes

2
answers

20k
views

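BigQuery: Invalid: illegal schema update

I am trying to append data from a query to a BigQuery table.

Job ID job_i9DOuqwZw4ZR2d509kOMaEUVm1Y

Error: the job failed while writing to BigQuery. Invalid: illegal schema update. Cannot add fields (field: debug_data) as null

I copied and pasted the query executed in the job above, ran it in the web console, and selected the same destination table to append to, and it works.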

google-bigquery

fox*_*ndy

lucky-day

4
votes

1
answer

3520
views

google-cloud-dataflow vs apache-beam

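Every Google document for Dataflow says that it is now based on Apache Beam and directs me to the Beam website, which is really confusing. Also, if I look for the GitHub project, I see that the Google Dataflow project is empty and everything goes to the Apache Beam repo. Now say I need to create a pipeline. Based on what I read from Apache Beam, I would do: from apache_beam.options.pipeline_options. However, if I use google-cloud-dataflow, I get an error: no module named 'options', and it turns out I should use from apache_beam.utils.pipeline_options. So it looks like google-cloud-dataflow ships an older Beam version. Is it going to be deprecated?

Which one should I choose to develop my Dataflow pipelines?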
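
While both package layouts are in circulation, one stopgap is to try the newer module path first and fall back to the older one. A minimal sketch, assuming only the two module paths quoted above; `import_first` is a hypothetical helper, and the demonstration at the end uses a standard-library module so that the snippet runs without Beam installed.

```python
import importlib


def import_first(module_names):
    """Return the first module in module_names that can be imported."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append("%s: %s" % (name, exc))
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# With Beam installed, the call would look like this:
# pipeline_options = import_first([
#     "apache_beam.options.pipeline_options",
#     "apache_beam.utils.pipeline_options",
# ])

# Demonstration without Beam: the first name does not exist, so json is returned.
print(import_first(["no_such_module_for_demo", "json"]).__name__)
```

This keeps a script working across both layouts, although pinning a single SDK version is the cleaner long-term choice.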

google-cloud-dataflow apache-beam

fox*_*ndy

lucky-day

2
votes

1
answer

2610
views

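How to make asynchronous HTTP calls with Apache Beam (Java)?

The input PCollection contains HTTP requests and is a bounded dataset. I want to make asynchronous HTTP calls (Java) in a ParDo, parse the responses, and put the results into the output PCollection. My code is below, and it throws the following exception.

I don't understand the reason. I need some guidance...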

java.util.concurrent.CompletionException: java.lang.IllegalStateException: Can't add element ValueInGlobalWindow{value=streaming.mapserver.backfill.EnrichedPoint@2c59e, pane=PaneInfo.NO_FIRING} to committed bundle in PCollection Call Map Server With Rate Throttle/ParMultiDo(ProcessRequests).output [PCollection]


Code:

public class ProcessRequestsFn extends DoFn<PreparedRequest,EnrichedPoint> {
    private static AsyncHttpClient _HttpClientAsync;
    private static ExecutorService _ExecutorService;

static{

    AsyncHttpClientConfig cg = config()
            .setKeepAlive(true)
            .setDisableHttpsEndpointIdentificationAlgorithm(true)
            .setUseInsecureTrustManager(true)
            .addRequestFilter(new RateLimitedThrottleRequestFilter(100,1000))
            .build();

    _HttpClientAsync = asyncHttpClient(cg);

    _ExecutorService = Executors.newCachedThreadPool();

}


@DoFn.ProcessElement
public void processElement(ProcessContext c) {

    PreparedRequest request = c.element();

    if(request == null)
        return;

    _HttpClientAsync.prepareGet((request.getRequest()))
            .execute()
            .toCompletableFuture()
            .thenApply(response -> …

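The phrase "committed bundle" in the exception suggests that the asynchronous callback tried to emit output after the runner had already finished the bundle that the element belonged to. One common pattern is to keep the futures and wait for all of them before the bundle ends, emitting only from the runner's own thread. The sketch below illustrates that idea in plain Python with concurrent.futures; it is not the Beam API, and `BundleBuffer`, `fetch`, and the method names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class BundleBuffer(object):
    """Plain-Python illustration (not Beam): hold futures until the bundle ends."""

    def __init__(self, fetch, max_workers=4):
        self._fetch = fetch  # hypothetical callable: request -> response
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = []

    def process(self, request):
        # Start the call in the background, but do not emit from a callback.
        self._pending.append(self._pool.submit(self._fetch, request))

    def finish_bundle(self):
        # Block until every call has completed, then hand back the results
        # in submission order, still inside the bundle's lifetime.
        results = [future.result() for future in self._pending]
        self._pending = []
        return results


buffer = BundleBuffer(lambda request: request.upper())
for request in ["a", "b", "c"]:
    buffer.process(request)
print(buffer.finish_bundle())
# prints: ['A', 'B', 'C']
```

The calls still overlap in time, but nothing is emitted after the bundle has closed.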

asynchttpclient apache-beam

fox*_*ndy

lucky-day

2
votes

1
answer

4366
views

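Google Cloud Storage gsutil tool with Java

If we have about 30 GB of files (ranging from 50 MB to 4 GB) that need to be uploaded to Google Cloud Storage every day, then according to Google's documentation gsutil may be the only suitable option, right?

I want to call the gsutil command from Java, and the code below works. But if I remove the while loop, the program stops immediately after runtime.exec(command); the python process is started but does not upload, and it is soon terminated. I would like to know why.

I read from the stderr stream because I was inspired by a post about piping gsutil output to a file.

I determine whether gsutil has finished by reading until the last line of its status output, but is that reliable? Is there a better way to detect from Java whether the gsutil execution has ended?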

String command = "python c:/gsutil/gsutil.py cp C:/SFC_Data/gps.txt"
        + " gs://getest/gps.txt";
try {
    Process process = Runtime.getRuntime().exec(command);
    System.out.println("the output stream is " + process.getErrorStream());
    BufferedReader reader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
    String s;
    while ((s = reader.readLine()) != null) {
        System.out.println("The inout stream is " + s);
    }
} catch (IOException e) {
    e.printStackTrace();
}

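Rather than watching for the last line of the status output, a more dependable signal is usually the exit code of the child process, read after its output streams have been drained. In Java that is what Process.waitFor() returns. A minimal sketch of the same idea in Python, using a harmless child process in place of gsutil; `run_and_wait` is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_wait(argv):
    """Run a command, drain its stderr, and return (exit_code, stderr_text)."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() reads stderr until EOF and then waits for the process to exit,
    # so the child can never block on a full pipe.
    _, stderr_text = proc.communicate()
    return proc.returncode, stderr_text


# Stand-in child process: writes a status line to stderr and exits with code 0.
child = [sys.executable, "-c", "import sys; sys.stderr.write('upload finished'); sys.exit(0)"]
exit_code, stderr_text = run_and_wait(child)
print(exit_code, stderr_text)
```

A zero exit code can then be treated as success, with the captured stderr kept for logging.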

google-cloud-storage

fox*_*ndy

2017 05-23

1
vote

1
answer

2222
views