I'm writing a python program that includes a c++ module (.so, using boost.python).
I'm starting several python threads that run a c++ function.
This is how the C++ code looks like:
#include <boost/python.hpp>
using namespace boost;
void f(){
// long calculation
// call python function
// long calculation
}
BOOST_PYTHON_MODULE(test)
{
python::def("f", &f);
}
Run Code Online (Sandbox Code Playgroud)
And the python code:
from test import f
t1 = threading.Thread(target=f)
t1.setDaemon(True)
t1.start()
print "Still running!"
Run Code Online (Sandbox Code Playgroud)
I encounter a problem: the "Still running!" …
我正在尝试在本地模式下为结构化流运行 Apache Spark 字数统计示例,但我得到了 10-30 秒的非常高的延迟。这是我正在使用的代码(取自https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html):
host = sys.argv[1]
port = int(sys.argv[2])
spark = SparkSession \
.builder \
.appName("StructuredNetworkWordCount") \
.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
lines = spark \
.readStream \
.format("socket") \
.option("host", host) \
.option("port", port) \
.load()
words = lines.select(
explode(
split(lines.value, " ")
).alias("word")
)
# Generate running word count
wordCounts = words.groupBy("word").count()
query = wordCounts \
.writeStream \
.outputMode("update") \
.format("console") \
.start()
query.awaitTermination()
Run Code Online (Sandbox Code Playgroud)
在编程指南中提到延迟应该在 100 毫秒左右,这似乎不是一个复杂的例子。另一件事要提到的是,当我在没有任何处理的情况下运行它时(只是将数据流式传输到输出),我会立即看到结果。
该示例在 Ubuntu 18.04、Apache Spark 2.4.4 上运行。
这是正常的,还是我在这里做错了什么?
谢谢!加尔