python pandas: upper() does not work on string columns

YJZ*_*YJZ 2 python apply uppercase pandas kaggle

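Hi, I'm working with the Kaggle Titanic data. I used apply(lambda x: x.upper()) to process several columns, but it doesn't work.

I put the data on Google Drive; you can download it here.

I tested each column, and they are all of type object (I think that means str; please correct me if I'm wrong). But some columns report 'float' object has no attribute 'upper'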

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

train = pd.read_csv('train.csv', header=0)

train.loc[:, ['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']].dtypes
# Name        object
# Sex         object
# Ticket      object
# Cabin       object
# Embarked    object
# dtype: object

train.loc[:, ['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']].apply(lambda x: x.upper())
# does not work

# try each column
train.loc[:, 'Name'].apply(lambda x: x.upper()) # works
train.loc[:, 'Sex'].apply(lambda x: x.upper()) # works
train.loc[:, 'Ticket'].apply(lambda x: x.upper()) # works
train.loc[:, 'Cabin'].apply(lambda x: x.upper()) # AttributeError: 'float' object has no attribute 'upper'
train.loc[:, 'Embarked'].apply(lambda x: x.upper()) # AttributeError: 'float' object has no attribute 'upper'

Any help is appreciated. Thanks!

Ant*_*pov 5

This is because your columns Cabin and Embarked contain NaN values, which are of type float. You can check this by using apply with type:

In [355]: train.Cabin.apply(lambda x: type(x))[:10]
Out[355]:
0    <class 'float'>
1      <class 'str'>
2    <class 'float'>
3      <class 'str'>
4    <class 'float'>
5    <class 'float'>
6      <class 'str'>
7    <class 'float'>
8    <class 'float'>
9    <class 'float'>
Name: Cabin, dtype: object
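
The same check can be summarized as a count per Python type. This is a minimal sketch on a small made-up Series standing in for the Cabin column (the values and the name `type_counts` are assumptions for illustration, not part of the Titanic data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'Cabin' column: strings mixed with NaN.
# dtype=object is set explicitly so the column behaves like the one in the question.
cabin = pd.Series(["C85", np.nan, "c123", np.nan], name="Cabin", dtype=object)

# The column reports dtype object even though it holds two Python types.
print(cabin.dtype)

# Count how many values of each Python type the column holds.
type_counts = cabin.map(lambda x: type(x).__name__).value_counts()
print(type_counts)

# isna() flags exactly the missing (float NaN) entries.
print(cabin.isna().sum())
```

An object column only promises "arbitrary Python objects", so a dtype of object does not guarantee that every value is a str.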

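So you can use str.upper, which handles NaN by default. Or you can use fillna to replace your NaN values with the empty string '', which does have an upper method, and then apply your lambda function: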

In [363]: train.Cabin.fillna('').apply(lambda x: x.upper())[:5]
Out[363]:
0
1     C85
2
3    C123
4
Name: Cabin, dtype: object

In [365]: train.Cabin.str.upper()[:5]
Out[365]:
0     NaN
1     C85
2     NaN
3    C123
4     NaN
Name: Cabin, dtype: object
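
Since the question was about several columns at once, here is a hedged sketch of applying str.upper column by column. DataFrame.apply passes each column to the function as a Series, which is why calling x.upper() directly on it fails, while x.str.upper() works. The small DataFrame and the names `text_cols` and `upper_df` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic text columns, including missing values.
df = pd.DataFrame(
    {
        "Sex": ["male", "female", "male"],
        "Cabin": ["c85", np.nan, "e46"],
        "Embarked": ["s", "c", np.nan],
    },
    dtype=object,
)

text_cols = ["Sex", "Cabin", "Embarked"]

# Each column arrives in the lambda as a Series, so use the .str accessor.
upper_df = df.copy()
upper_df[text_cols] = df[text_cols].apply(lambda col: col.str.upper())

print(upper_df)
```

Missing values stay missing in the result, so nothing needs to be filled beforehand.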

Or, if you want to keep NaN as a string, you can fill with the string 'NaN':

In [369]: train.Cabin.fillna('NaN').apply(lambda x: x.upper())[:5]
Out[369]:
0     NAN
1     C85
2     NAN
3    C123
4     NAN
Name: Cabin, dtype: object
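
If you would rather keep apply and leave the missing values untouched, one further option (not from the answer above, just a sketch) is to guard the call with isinstance. The helper name `upper_if_str` and the sample values are hypothetical:

```python
import numpy as np
import pandas as pd

def upper_if_str(value):
    """Uppercase strings; return anything else (such as float NaN) unchanged."""
    return value.upper() if isinstance(value, str) else value

# Hypothetical stand-in for the 'Cabin' column.
cabin = pd.Series(["c85", np.nan, "e46"], name="Cabin", dtype=object)

result = cabin.apply(upper_if_str)
print(result)
```

This keeps real NaN values in the column, unlike fillna('NaN'), which turns them into ordinary strings that isna() no longer detects.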