I took a few rows from a CSV file like this:
pd.DataFrame(CV_data.take(5), columns=CV_data.columns)
and ran some functions on them. Now I want to save the result back to CSV, but I get the error: module 'pandas' has no attribute 'to_csv'
I tried to save it like this:
pd.to_csv(CV_data, sep='\t', encoding='utf-8')
Here is my full code. How can I save the resulting data to CSV or Excel?
# Disable warnings, set Matplotlib inline plotting and load Pandas package
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
import pandas as pd
pd.options.display.mpl_style = 'default'
CV_data = sqlContext.read.load('Downloads/data/churn-bigml-80.csv',
                               format='com.databricks.spark.csv',
                               header='true',
                               inferSchema='true')
final_test_data = sqlContext.read.load('Downloads/data/churn-bigml-20.csv',
                                       format='com.databricks.spark.csv',
                                       header='true',
                                       inferSchema='true')
CV_data.cache()
CV_data.printSchema()
pd.DataFrame(CV_data.take(5), columns=CV_data.columns)
from pyspark.sql.types import DoubleType
from pyspark.sql.functions import UserDefinedFunction
binary_map = {'Yes':1.0, 'No':0.0, True:1.0, False:0.0}
toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())
CV_data = CV_data.drop('State').drop('Area code') \
    .drop('Total …
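
The error message says that to_csv is not a function of the pandas module itself; in pandas it is a method on DataFrame objects. A minimal sketch of what the save step might look like follows. It is an illustration under assumptions, not a confirmed fix for the code above: the rows, the column names, and the output file name cv_sample.tsv are made up, and a small hand-built DataFrame stands in for the result of pd.DataFrame(CV_data.take(5), columns=CV_data.columns).

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the rows returned by CV_data.take(5);
# these values and column names are invented for illustration.
rows = [("No", 128, 1.0), ("Yes", 107, 0.0)]
columns = ["International plan", "Account length", "Churn"]

# Keep a reference to the pandas DataFrame so it can be saved later.
sample_df = pd.DataFrame(rows, columns=columns)

# Hypothetical output location (a temporary directory keeps the sketch side-effect free).
out_path = os.path.join(tempfile.mkdtemp(), "cv_sample.tsv")

# to_csv is called on the DataFrame object, not on the pd module.
# index=False leaves the row index out of the file.
sample_df.to_csv(out_path, sep="\t", encoding="utf-8", index=False)

# Read the file back to check what was written.
round_trip = pd.read_csv(out_path, sep="\t", encoding="utf-8")
print(round_trip)
```

Note that CV_data in the question is a Spark DataFrame, not a pandas one. If the full result is small enough to fit in driver memory, CV_data.toPandas() returns a pandas DataFrame on which to_csv can be called the same way; pandas DataFrames also have a to_excel method, which needs an Excel writer package such as openpyxl to be installed.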