How do I use ORC indexes with Spark?

For*_*sed 6 apache-spark orc

What is the option for enabling ORC indexes from Spark?

          df
            .write()
            .option("mode", "DROPMALFORMED")
            .option("compression", "snappy")
            .mode("overwrite")
            .format("orc")
            .option("index", "user_id")
            .save(...);

I am currently writing .option("index", uid); what should I put there so that ORC indexes the column "user_id"?

Mal*_*ssi 2

Have you tried .partitionBy("user_id")?

          df
            .write()
            .option("mode", "DROPMALFORMED")
            .option("compression", "snappy")
            .mode("overwrite")
            .format("orc")
            .partitionBy("user_id")
            .save(...);
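To make concrete what partitionBy("user_id") does: as far as I know it does not build an index inside the ORC files. Spark instead writes one subdirectory per distinct value, named user_id=<value>, so a filter on that column can skip whole directories. Here is a rough plain-Java sketch of that layout with no Spark involved. The class name, the base path /tmp/out and the sample IDs are all made up for illustration, and the sketch ignores details such as escaping of special characters in values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; it does not call Spark or write any files.
class PartitionLayoutSketch {

    // Hive-style partition directory name, e.g. "user_id=42".
    static String partitionDir(String column, String value) {
        return column + "=" + value;
    }

    // Group values by the directory they would land in under a (hypothetical) base path.
    static Map<String, List<String>> layout(String basePath, String column, List<String> values) {
        Map<String, List<String>> dirs = new LinkedHashMap<>();
        for (String v : values) {
            String dir = basePath + "/" + partitionDir(column, v);
            dirs.computeIfAbsent(dir, k -> new ArrayList<>()).add(v);
        }
        return dirs;
    }

    public static void main(String[] args) {
        // "/tmp/out" and the user_id values are assumptions for the example.
        Map<String, List<String>> dirs =
                layout("/tmp/out", "user_id", List.of("1", "2", "1", "3"));
        dirs.forEach((dir, rows) ->
                System.out.println(dir + " <- " + rows.size() + " row(s)"));
    }
}
```

With a high-cardinality column such as user_id this means one directory per user, so it is worth checking how many distinct values the column has before partitioning on it.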