我试图使用三个表的连接在SPARK SQL中编写查询.但查询输出为null.它适用于单桌.我的Join查询是正确的,因为我已经在oracle数据库中执行了它.我需要在这里进行哪些更正?Spark版本是2.0.0
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
lines = sc.textFile("/Users/Hadoop_IPFile/purchase")
lines2 = sc.textFile("/Users/Hadoop_IPFile/customer")
lines3 = sc.textFile("/Users/Hadoop_IPFile/book")
parts = lines.map(lambda l: l.split("\t"))
purchase = parts.map(lambda p: Row(year=p[0],cid=p[1],isbn=p[2],seller=p[3],price=int(p[4])))
schemapurchase = sqlContext.createDataFrame(purchase)
schemapurchase.registerTempTable("purchase")
parts2 = lines.map(lambda l: l.split("\t"))
customer = parts2.map(lambda p: Row(cid=p[0],name=p[1],age=p[2],city=p[3],sex=p[4]))
schemacustomer = sqlContext.createDataFrame(customer)
schemacustomer.registerTempTable("customer")
parts3 = lines.map(lambda l: l.split("\t"))
book = parts3.map(lambda p: Row(isbn=p[0],name=p[1]))
schemabook = sqlContext.createDataFrame(book)
schemabook.registerTempTable("book")
result_purchase = sqlContext.sql("""SELECT DISTINCT customer.name AS name FROM purchase JOIN book ON purchase.isbn = book.isbn JOIN customer ON customer.cid …Run Code Online (Sandbox Code Playgroud) RDD SAMPLE如何在spark中工作?它的不同参数的功能是什么,即样本(withReplacement,fraction,seed).
我在网上找不到与'withReplacement'和'seed'参数相关的任何内容.请举例说明.