Pandas scatter_matrix - plot categorical variables

Geo*_*oel 7 python matplotlib pandas kaggle

I am looking at the famous Titanic dataset from the Kaggle competition: http://www.kaggle.com/c/titanic-gettingStarted/data

I load and process the data as follows:

# import required libraries
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter/IPython magic: render figures inline in the notebook
%matplotlib inline

# load the data from the file
df = pd.read_csv('./data/train.csv')

# import the scatter_matrix functionality
# (older pandas releases exposed it as pandas.tools.plotting, which has since been removed)
from pandas.plotting import scatter_matrix

# define colors list, to be used to plot survived either red (=0) or green (=1)
colors = ['red', 'green']

# make a scatter plot
scatter_matrix(df, figsize=[20, 20], marker='x', c=df.Survived.apply(lambda x: colors[x]))

df.info()

[Figure: scatter_matrix output from matplotlib]

How can I add categorical columns such as Sex and Embarked to the plot?

kni*_*fni 7

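You need to convert the categorical variables to numbers in order to plot them.

Example (assuming the "Sex" column holds the gender data, with "M" for male and "F" for female):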

import numpy as np

# start with NaN everywhere, then fill in a code for each known value
df['Sex_int'] = np.nan
df.loc[df['Sex'] == 'M', 'Sex_int'] = 0
df.loc[df['Sex'] == 'F', 'Sex_int'] = 1

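Now all males are represented by 0 and all females by 1. Rows with an unknown gender (if any) keep NaN and will be ignored.

The rest of your code should work fine with the updated dataframe.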
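
The same idea can be sketched more compactly, and extended to Embarked, with `Series.map` and `pd.factorize`. This is only a minimal sketch under some assumptions: the small frame is made-up stand-in data rather than the real train.csv, and the spellings "male"/"female" and "S"/"C"/"Q" are assumed to match what the Kaggle file contains, so check them with `df['Sex'].unique()` first.

```python
import pandas as pd

# Tiny stand-in for the Titanic training data (made-up rows, for illustration only)
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Sex": ["male", "female", "female", None],
    "Embarked": ["S", "C", "Q", "S"],
})

# Explicit mapping: values not listed in the dict (and missing values) become NaN
df["Sex_int"] = df["Sex"].map({"male": 0, "female": 1})

# Automatic codes: pd.factorize numbers the categories in order of first
# appearance and marks missing values with -1
codes, labels = pd.factorize(df["Embarked"])
df["Embarked_int"] = codes

print(df[["Sex_int", "Embarked_int"]])
print(list(labels))  # which label each integer code stands for
```

As far as I know, scatter_matrix only draws the numeric columns of the frame it is given, so the new `Sex_int` and `Embarked_int` columns should show up in the plot while the original string columns are skipped. Keep in mind that the integer codes are arbitrary labels, so the spacing between them on an axis carries no meaning.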