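I am trying to save a DataFrame as a Parquet file in Python, but I am only able to save the schema, not the data itself.
I have reduced my problem to a very simple Python test case, which I copied below from an IPYNB.
Any suggestions about what might be going on?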
In [2]:
import math
import string
import datetime
import numpy as np
import matplotlib.pyplot
from pyspark.sql import *
import pylab
import random
import time
In [3]:
sqlContext = SQLContext(sc)
# Create a simple one-column DataFrame with a single row of data
df = sqlContext.createDataFrame(sc.parallelize(xrange(1)).flatMap(lambda x: [Row(col1="Test row")]))
df.show()
df.count()
Out[3]:
col1
Test row
1L
In [4]:
# Persist the dataframe as a parquet file
df.saveAsParquetFile("test.parquet")
In [5]:
ls
TrapezoidRule.ipynb metastore_db/
WeatherPrecipitation.ipynb derby.log test.parquet/
In [6]:
ls -l test.parquet
total …
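As a way to make the directory listing above easier to interpret, here is a minimal sketch that splits the contents of the output directory into data files and everything else. It assumes that Spark names its data files `part-*` and that the remaining entries (such as `_SUCCESS`, `_metadata`, or hidden checksum files) carry no rows. The helper name `summarize_parquet_dir` is hypothetical and is not part of Spark.

```python
import os


def summarize_parquet_dir(path):
    """Split the files in a Parquet output directory into data and non-data files.

    Returns two lists of (file name, size in bytes) tuples.
    """
    data_files = []
    other_files = []
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if not os.path.isfile(full_path):
            continue  # skip nested directories
        entry = (name, os.path.getsize(full_path))
        # Assumption: row data lives in files named part-*; anything else
        # is treated as metadata, a marker file, or a checksum.
        if name.startswith("part-"):
            data_files.append(entry)
        else:
            other_files.append(entry)
    return data_files, other_files


# Only inspect the directory from the question if it exists locally.
if os.path.isdir("test.parquet"):
    data_files, other_files = summarize_parquet_dir("test.parquet")
    print("data files: %s" % data_files)
    print("other files: %s" % other_files)
```

If the first list comes back empty, the directory holds no `part-*` files, which would match the symptom of the schema being saved without the data.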