Posts by Gal*_*lka

boost.python c++ multithreading

I'm writing a python program that includes a c++ module (.so, using boost.python).
I'm starting several python threads that run a c++ function.

This is what the C++ code looks like:

#include <boost/python.hpp>
using namespace boost;
void f(){
    // long calculation

    // call python function

    // long calculation
}

BOOST_PYTHON_MODULE(test)
{
    python::def("f", &f);
}


And the python code:

import threading

from test import f
t1 = threading.Thread(target=f)
t1.setDaemon(True)
t1.start()
print "Still running!"


I encounter a problem: the "Still running!" …

c++ python multithreading gil boost-python

Gal*_*lka

2017-02-27

5
votes

1
answer

2473
views

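Apache Spark Structured Streaming word count example is extremely slow in local mode

I'm trying to run the Apache Spark Structured Streaming word count example in local mode, but I'm getting very high latency of 10-30 seconds. Here is the code I'm using (taken from https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html):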

import sys

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

host = sys.argv[1]
port = int(sys.argv[2])

spark = SparkSession \
    .builder \
    .appName("StructuredNetworkWordCount") \
    .getOrCreate()

spark.sparkContext.setLogLevel("ERROR")

lines = spark \
    .readStream \
    .format("socket") \
    .option("host", host) \
    .option("port", port) \
    .load()

words = lines.select(
   explode(
       split(lines.value, " ")
   ).alias("word")
)

# Generate running word count
wordCounts = words.groupBy("word").count()

query = wordCounts \
    .writeStream \
    .outputMode("update") \
    .format("console") \
    .start()

query.awaitTermination()


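The programming guide mentions that latency should be around 100 ms, and this does not seem like a complicated example. Also worth mentioning: when I run it without any processing (just streaming the data to the output), I see the results immediately.

The example is running on Ubuntu 18.04 with Apache Spark 2.4.4.

Is this normal, or am I doing something wrong here?

Thanks! Gal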
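
One setting that may be worth checking here, offered as a guess rather than a confirmed diagnosis: `groupBy("word").count()` introduces a shuffle, and `spark.sql.shuffle.partitions` defaults to 200, so the aggregation stage of each micro-batch would be split into 200 small tasks even on a single machine. A minimal sketch of lowering it for a local run follows. The value `4`, the `LOCAL_STREAMING_CONF` name, and the `apply_conf` and `build_session` helpers are illustrative assumptions and not part of the original example.

```python
# Hypothetical settings for a local run. The value "4" is an assumption
# for illustration, not a tuned or recommended number.
LOCAL_STREAMING_CONF = {
    "spark.sql.shuffle.partitions": "4",
}


def apply_conf(builder, conf):
    """Apply each key/value pair through the builder's config(key, value) method."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


def build_session():
    """Create the SparkSession for the word count example with the settings above."""
    # Imported lazily so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("StructuredNetworkWordCount")
    return apply_conf(builder, LOCAL_STREAMING_CONF).getOrCreate()
```

Replacing the `SparkSession` construction in the question with `spark = build_session()` would apply the setting; whether it explains the 10-30 second delay would still need to be measured.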

latency word-count apache-spark pyspark

Gal*_*lka

lucky-day

5
votes

0
answers

333
views