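I want to append a new column to `trainData`. Both data frames have 712 rows, but when I try to append the new column "Age" with `.assign`, the method throws the error below.
What is the correct way to append a column to a data frame?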
import pandas as pd
df = pd.read_csv("data/train.csv")
# Drop the columns that are not needed
df = df.drop(['Ticket','Cabin'], axis=1)
# Drop the rows that contain NaN values
df = df.dropna()
print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])
trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3","Sex_female","Sex_male","Embarked_C","Embarked_Q","Embarked_S"]]
print("My train data",trainData)
trainData = trainData.assign(df["Age"])
Below is the exception:
File "<ipython-input-79-3f3ce0263545>", line 1, in <module>
runfile('C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py', wdir='C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network')
File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 688, in runfile
execfile(filename, namespace)
File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py", line 30, in <module>
trainData = trainData.assign(df["Age"])
TypeError: assign() takes 1 positional argument but 2 were given
I think you need to specify the column name as a keyword argument:
trainData = trainData.assign(Age=df["Age"])
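To illustrate why the keyword matters, here is a minimal sketch on toy data. The frames `train` and `source` are hypothetical stand-ins for `trainData` and `df`, not the asker's real data. `assign` accepts only keyword arguments, and each keyword becomes the label of a new column.

```python
import pandas as pd

# Hypothetical toy frames standing in for trainData and df
train = pd.DataFrame({"Sex_male": [1, 0, 1]})
source = pd.DataFrame({"Age": [22.0, 38.0, 26.0]})

# Passing the Series positionally fails, because assign() takes keywords only
try:
    train.assign(source["Age"])
    positional_failed = False
except TypeError as err:
    positional_failed = True
    print("TypeError:", err)

# The keyword name becomes the column label; assign returns a new DataFrame
with_age = train.assign(Age=source["Age"])
print(with_age.columns.tolist())
```

Note that `assign` does not modify `train` in place, which is why the result has to be bound to a name (as in `trainData = trainData.assign(...)`).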
Thanks to piRSquared for the comment: if the indexes of the two data frames differ, pass the underlying values instead:
trainData = trainData.assign(Age=df["Age"].values)
But then the data are not aligned by index; the values are assigned by position.
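The difference between the two calls can be seen on a small example. This is only a sketch with made-up data (`left` and `ages` are hypothetical names): a Series is matched on index labels, while a plain array from `.values` is matched by position.

```python
import pandas as pd

# Hypothetical data: the Series index only partly overlaps the frame index
left = pd.DataFrame({"x": [10, 20, 30]}, index=[0, 1, 2])
ages = pd.Series([22.0, 38.0, 26.0], index=[1, 2, 3], name="Age")

# Series: aligned on index labels, so label 0 has no match and becomes NaN
aligned = left.assign(Age=ages)

# Array: assigned row by row in order, ignoring the index labels
positional = left.assign(Age=ages.values)

print(aligned)
print(positional)
```

With `.values` the array must have the same length as the frame, otherwise pandas raises a `ValueError`. That fits the question, where both frames have 712 rows.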
Sample:
import pandas as pd
import seaborn as sns
# sample df (similar to your data)
df = sns.load_dataset("titanic")
# capitalize column names
df.columns = df.columns.str.capitalize()
print (df.head())
Survived Pclass Sex Age Sibsp Parch Fare Embarked Class \
0 0 3 male 22.0 1 0 7.2500 S Third
1 1 1 female 38.0 1 0 71.2833 C First
2 1 3 female 26.0 0 0 7.9250 S Third
3 1 1 female 35.0 1 0 53.1000 S First
4 0 3 male 35.0 0 0 8.0500 S Third
Who Adult_male Deck Embark_town Alive Alone
0 man True NaN Southampton no False
1 woman False C Cherbourg yes False
2 woman False NaN Southampton yes True
3 woman False C Southampton yes False
4 man True NaN Southampton no True
df = df.dropna()
#print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])
trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3","Sex_female","Sex_male","Embarked_C","Embarked_Q","Embarked_S"]]
#print("My train data",trainData.head())
trainData = trainData.assign(Age=df["Age"])
print("My train data",trainData.head())
My train data Pclass_1 Pclass_2 Pclass_3 Sex_female Sex_male Embarked_C \
1 1 0 0 1 0 1
3 1 0 0 1 0 0
6 1 0 0 0 1 0
10 0 0 1 1 0 0
11 1 0 0 1 0 0
Embarked_Q Embarked_S Age
1 0 0 38.0
3 0 1 35.0
6 0 1 54.0
10 0 1 4.0
11 0 1 58.0
Another solution is `join`:
trainData = trainData.join(df["Age"])
print("My train data",trainData.head())
My train data Pclass_1 Pclass_2 Pclass_3 Sex_female Sex_male Embarked_C \
1 1 0 0 1 0 1
3 1 0 0 1 0 0
6 1 0 0 0 1 0
10 0 0 1 1 0 0
11 1 0 0 1 0 0
Embarked_Q Embarked_S Age
1 0 0 38.0
3 0 1 35.0
6 0 1 54.0
10 0 1 4.0
11 0 1 58.0
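As a rough check that `join` and `assign` agree here, the sketch below uses made-up data (`features` and `ages` are hypothetical names). `join` performs a left join on the index by default, and a Series passed to it must have a name, which becomes the column label.

```python
import pandas as pd

# Hypothetical frame whose index has gaps, as happens after dropna()
features = pd.DataFrame({"Sex_male": [0, 0, 1]}, index=[1, 3, 6])
ages = pd.Series([38.0, 35.0, 54.0, 4.0], index=[1, 3, 6, 10], name="Age")

# Left join on the index: only the labels present in features are kept
joined = features.join(ages)

# assign with a Series aligns on the same index labels
assigned = features.assign(Age=ages)

print(joined)
print(assigned)
```

Both results keep the three rows of `features`; the extra label 10 in `ages` is dropped.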
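After checking the data, it seems you can simply add the column `Age` to the subset, because `titanic_dummies` still contains it: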
trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3",
"Sex_female","Sex_male",
"Embarked_C","Embarked_Q","Embarked_S",
"Age"]]
print("My train data",trainData.head())
My train data Pclass_1 Pclass_2 Pclass_3 Sex_female Sex_male Embarked_C \
1 1 0 0 1 0 1
3 1 0 0 1 0 0
6 1 0 0 0 1 0
10 0 0 1 1 0 0
11 1 0 0 1 0 0
Embarked_Q Embarked_S Age
1 0 0 38.0
3 0 1 35.0
6 0 1 54.0
10 0 1 4.0
11 0 1 58.0
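This works because `pd.get_dummies` with the `columns` argument encodes only the listed columns and leaves the others in the result. A minimal sketch on a hypothetical two-column frame (`raw` is not the asker's data):

```python
import pandas as pd

# Hypothetical frame with one categorical and one numeric column
raw = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
})

# Only "Sex" is encoded; "Age" is passed through unchanged
dummies = pd.get_dummies(raw, columns=["Sex"])
print(dummies.columns.tolist())

# So "Age" can be selected together with the dummy columns
subset = dummies[["Sex_female", "Sex_male", "Age"]]
print(subset)
```

Selecting the column directly avoids any index alignment question, since all columns come from the same frame.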