YJZ*_*YJZ 2 python apply uppercase pandas kaggle
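Hi, I'm working with the Kaggle Titanic data. I'm using apply(lambda x: x.upper()) to process several columns, but it doesn't work.
I put the data on Google Drive; you can download it here.
I tested each column, and they are all of object dtype (I think that means str; please correct me if I'm wrong). But some columns report 'float' object has no attribute 'upper'.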
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
train = pd.read_csv('train.csv', header=0)
# .ix was removed in pandas 1.0; .loc is the label-based equivalent
train.loc[:, ['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']].dtypes
# Name object
# Sex object
# Ticket object
# Cabin object
# Embarked object
# dtype: object
train.loc[:, ['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']].apply(lambda x: x.upper())
# does not work
# try each column
train.loc[:, 'Name'].apply(lambda x: x.upper()) # works
train.loc[:, 'Sex'].apply(lambda x: x.upper()) # works
train.loc[:, 'Ticket'].apply(lambda x: x.upper()) # works
train.loc[:, 'Cabin'].apply(lambda x: x.upper()) # AttributeError: 'float' object has no attribute 'upper'
train.loc[:, 'Embarked'].apply(lambda x: x.upper()) # AttributeError: 'float' object has no attribute 'upper'
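Any help is appreciated. Thanks!
This is because your Cabin and Embarked columns contain NaN values, and NaN is a float, not a str. An object column can hold a mix of Python types, so the object dtype alone does not guarantee that every value is a string. You can check this by applying type to each value: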
In [355]: train.Cabin.apply(lambda x: type(x))[:10]
Out[355]:
0 <class 'float'>
1 <class 'str'>
2 <class 'float'>
3 <class 'str'>
4 <class 'float'>
5 <class 'float'>
6 <class 'str'>
7 <class 'float'>
8 <class 'float'>
9 <class 'float'>
Name: Cabin, dtype: object
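The same check can be reproduced without the Titanic file. The following is a minimal sketch, assuming a small made-up Series (`cabin`, with invented values) that stands in for the Cabin column; it is only meant to show that an object column can mix str and float values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Cabin column: strings mixed with missing values.
# dtype=object is set explicitly so the example behaves the same across pandas versions.
cabin = pd.Series(["C85", np.nan, "C123", np.nan], dtype=object, name="Cabin")

# The column-level dtype is object, which says nothing about individual values.
print(cabin.dtype)

# Name of the Python type of every element: str for text, float for NaN.
type_names = cabin.map(lambda v: type(v).__name__)
print(type_names.tolist())

# Number of missing entries, i.e. the values that would break x.upper().
n_missing = int(cabin.isna().sum())
print(n_missing)
```

Counting missing values with isna() is usually a quicker way to spot the offending columns than inspecting types row by row.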
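So you can use str.upper, which handles NaN by default. Alternatively, you can use fillna to replace the NaN values with the empty string '', which does have an upper method, and then use your lambda function:
In [363]: train.Cabin.fillna('').apply(lambda x: x.upper())[:5]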
Out[363]:
0
1 C85
2
3 C123
4
Name: Cabin, dtype: object
In [365]: train.Cabin.str.upper()[:5]
Out[365]:
0 NaN
1 C85
2 NaN
3 C123
4 NaN
Name: Cabin, dtype: object
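For the multi-column case in the question, one possible approach is sketched below. It relies on the fact that DataFrame.apply passes each column to the function as a Series, which is why calling x.upper() directly on it fails; the vectorized .str.upper() works on the whole column instead. The tiny `train` frame here is made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic frame with invented values.
train = pd.DataFrame(
    {
        "Sex": ["male", "female", "female"],
        "Cabin": [np.nan, "c85", np.nan],
        "Embarked": ["s", "c", np.nan],
    },
    dtype=object,
)
cols = ["Sex", "Cabin", "Embarked"]

# Each column arrives as a Series, so use the .str accessor on it.
# Missing values are left as NaN instead of raising AttributeError.
upper = train[cols].apply(lambda col: col.str.upper())
print(upper)
```

Assigning the result back with train[cols] = upper would update the original columns in place.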
Or, if you want to keep NaN as a string, you can fill the missing values with the string 'NaN':
In [369]: train.Cabin.fillna('NaN').apply(lambda x: x.upper())[:5]
Out[369]:
0 NAN
1 C85
2 NAN
3 C123
4 NAN
Name: Cabin, dtype: object
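One thing to keep in mind when choosing between these options: once NaN is replaced by a string, pandas no longer treats the entry as missing. A small sketch of the difference, again using a made-up `cabin` Series rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Cabin column.
cabin = pd.Series(["c85", np.nan, "c123"], dtype=object, name="Cabin")

# Option 1: .str.upper() keeps the missing value as NaN.
kept = cabin.str.upper()

# Option 2: fill with the text "NaN" first; it is upper-cased to "NAN"
# and is an ordinary string from then on.
as_text = cabin.fillna("NaN").str.upper()

print(int(kept.isna().sum()))     # 1
print(int(as_text.isna().sum()))  # 0
```

If later steps rely on isna(), dropna() or fillna(), keeping the real NaN is probably the safer choice.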