I am currently working with PySpark and DataFrames.
I created a DataFrame:
from pyspark.sql import SparkSession
import pandas as pd
spark = SparkSession.builder.appName("DataFrame").getOrCreate()
df = spark.createDataFrame([("Java", "20000"), ("Python", "100000"), ("Scala", "3000")])
df.printSchema()
# Output:
root
|-- _1: string (nullable = true)
|-- _2: string (nullable = true)
But when I call df.show(), it raises this error:
Py4JJavaError Traceback (most recent call last)
C:\Users\PRATIK~1\AppData\Local\Temp/ipykernel_20924/3726558592.py in <module>
----> 1 df.show()
C:\Spark\spark-3.2.1-bin-hadoop3.2\python\pyspark\sql\dataframe.py in show(self, n, truncate, vertical)
492
493 if isinstance(truncate, bool) and truncate:
--> 494 print(self._jdf.showString(n, 20, vertical))
495 else:
496 try:
C:\Spark\spark-3.2.1-bin-hadoop3.2\python\lib\py4j-0.10.9.3-src.zip\py4j\java_gateway.py in __call__(self, *args) …
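The traceback above is truncated, so the root cause cannot be read from it. One commonly reported reason for df.show() failing with a Py4JJavaError while df.printSchema() succeeds is that Spark cannot launch a Python worker process: printSchema() only needs the schema held by the driver, whereas show() runs a job. The sketch below is a guess under that assumption, not a confirmed fix. It points the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables at the interpreter running the notebook before the session is created. The helper name build_and_show is hypothetical.

```python
import os
import sys

# Assumption: the error comes from Spark not finding a usable Python
# interpreter for its workers. The truncated traceback does not confirm this.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable


def build_and_show():
    """Recreate the DataFrame from the question and display it."""
    # Imported here so the environment variables above are set first.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("DataFrame").getOrCreate()
    df = spark.createDataFrame(
        [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
    )
    df.show()
```

Calling build_and_show() is best done in a freshly restarted kernel, because getOrCreate() reuses an already running session, which may not pick up the new environment variables. If the error persists, the "Caused by:" lines at the bottom of the full traceback are the place to look next.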