For some context: I am trying to run the TPC-DS benchmark on Spark both with and without Spark's Catalyst optimizer. For complex queries on smaller datasets, we may spend more time optimizing the plan than actually executing it, so I want to measure the optimizer's performance impact on overall query execution.
Is there a way to disable some or all of the Spark Catalyst optimizer rules?
optimization query-optimization apache-spark apache-spark-sql spark-dataframe
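One mechanism worth knowing here (a sketch, not a claim about what the asker ended up using): since Spark 2.4 there is a configuration key, `spark.sql.optimizer.excludedRules`, that takes a comma-separated list of fully-qualified optimizer rule class names to skip. Note that Spark treats some rules as non-excludable for correctness, so it may silently keep those even if listed. A minimal Java sketch, assuming a local SparkSession and using `ConstantFolding` purely as an example rule:

```java
import org.apache.spark.sql.SparkSession;

public class ExcludeOptimizerRules {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("tpcds-no-catalyst-rules")
                .master("local[*]")
                .getOrCreate();

        // Skip the listed Catalyst rules during optimization.
        // Add more fully-qualified rule class names, comma-separated.
        spark.conf().set("spark.sql.optimizer.excludedRules",
                "org.apache.spark.sql.catalyst.optimizer.ConstantFolding");
    }
}
```

The same setting can be passed at submit time instead, e.g. `spark-submit --conf spark.sql.optimizer.excludedRules=...`, which is convenient for benchmarking runs since no code change is needed between the "with" and "without" configurations.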
I am very new to the world of threads and have been trying to solve this problem for a week.
The run method in my Thread class is, for some reason, never called, and I don't know why (but would very much like to).
ProcessBuilder processBuilder = new ProcessBuilder();
processBuilder.command("/bin/sh", "-c", "echo \"w30000001z,none,16488,,181075\nw30000001z,none,16488,,181082\n\" | /home/beehive/bin/exec/tableSize");
Process process = processBuilder.start();
// Consume stdout and stderr on separate threads *before* waiting on the
// process; calling waitFor() first can deadlock once the child's output
// buffers fill up. The extra BufferedReaders on the same streams were
// unused (the gobbler threads already wrap them) and have been dropped.
StreamGobbler errorStream = new StreamGobbler(process.getErrorStream(), "ERROR");
StreamGobbler outputStream = new StreamGobbler(process.getInputStream(), "OUTPUT");
errorStream.start();
outputStream.start();
errorStream.join();
outputStream.join();
process.waitFor();
tableSize is a Python executable that takes its input on stdin, processes it, and prints several lines of text. I need to collect this output and process it further.
A separate thread handles the output on the inputStream and errorStream. That thread class looks like this:
/* StreamGobbler.java */
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintWriter;
class StreamGobbler extends Thread
{ …
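The class body is cut off above. For reference, a typical stream-gobbler looks roughly like the following; this is a minimal sketch (not the asker's actual code), which reads the stream line by line on its own thread and echoes each line with a type prefix:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class StreamGobbler extends Thread {
    private final InputStream is;
    private final String type;
    private final StringBuilder captured = new StringBuilder();

    StreamGobbler(InputStream is, String type) {
        this.is = is;
        this.type = type;
    }

    @Override
    public void run() {
        // Drain the stream so the child process never blocks on a full buffer.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(is))) {
            String line;
            while ((line = reader.readLine()) != null) {
                captured.append(line).append('\n');
                System.out.println(type + "> " + line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Collected output, available after join() returns.
    String getOutput() {
        return captured.toString();
    }
}
```

One detail relevant to the question: run() is only invoked by the JVM when you call start(); calling run() directly just executes it on the current thread, and never calling start() at all means it never executes.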