Sub*_*ian 6 apache-spark pyspark
如果我具有如下创建的数据框:
df = spark.table("tblName")
Run Code Online (Sandbox Code Playgroud)
无论如何,我可以从df找回tblName吗?
Ali*_*lli -2
您可以从 df 创建表。但是,如果表是本地临时视图或全局临时视图,则应在创建同名表或使用创建或替换函数(spark.createOrReplaceGlobalTempView 或spark.createOrReplaceTempView)之前删除它(sqlContext.dropTempTable)。如果表是临时表,您可以创建同名表而不会出现错误
#Create data frame
>>> d = [('Alice', 1)]
>>> test_df = spark.createDataFrame(sc.parallelize(d), ['name','age'])
>>> test_df.show()
+-----+---+
| name|age|
+-----+---+
|Alice| 1|
+-----+---+
#create tables
>>> test_df.createTempView("tbl1")
>>> test_df.registerTempTable("tbl2")
>>> sqlContext.tables().show()
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| | tbl1| true|
| | tbl2| true|
+--------+---------+-----------+
#create data frame from tbl1
>>> df = spark.table("tbl1")
>>> df.show()
+-----+---+
| name|age|
+-----+---+
|Alice| 1|
+-----+---+
#create tbl1 again with using df data frame. It will get error
>>> df.createTempView("tbl1")
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "Temporary view 'tbl1' already exists;"
#drop and create again
>>> sqlContext.dropTempTable('tbl1')
>>> df.createTempView("tbl1")
>>> spark.sql('select * from tbl1').show()
+-----+---+
| name|age|
+-----+---+
|Alice| 1|
+-----+---+
#create data frame from tbl2 and replace name value
>>> df = spark.table("tbl2")
>>> df = df.replace('Alice', 'Bob')
>>> df.show()
+----+---+
|name|age|
+----+---+
| Bob| 1|
+----+---+
#create tbl2 again with using df data frame
>>> df.registerTempTable("tbl2")
>>> spark.sql('select * from tbl2').show()
+----+---+
|name|age|
+----+---+
| Bob| 1|
+----+---+
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
2817 次 |
| 最近记录: |