Appending a column in pandas: TypeError: assign() takes 1 positional argument but 2 were given

Sye*_*ena 3 python pandas

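I want to append a new column to `trainData`. Both DataFrames have 712 rows, but when I try to append the new column "Age" with the `.assign` method, it throws the error shown below.

What is the correct way to append a column to a DataFrame?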

import pandas as pd

df = pd.read_csv("data/train.csv")
# Drop the unused columns
df = df.drop(['Ticket','Cabin'], axis=1)
# Drop rows with missing values
df = df.dropna()
print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])

trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3","Sex_female","Sex_male","Embarked_C","Embarked_Q","Embarked_S"]]
print("My train data",trainData)
trainData = trainData.assign(df["Age"])

Here is the exception:

  File "<ipython-input-79-3f3ce0263545>", line 1, in <module>
    runfile('C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py', wdir='C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network')

  File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 688, in runfile
    execfile(filename, namespace)

  File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py", line 30, in <module>
    trainData = trainData.assign(df["Age"])

TypeError: assign() takes 1 positional argument but 2 were given

jez*_*ael 7

I think you need to specify the column name:

trainData = trainData.assign(Age=df["Age"])
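A minimal sketch of why the keyword matters: `assign` accepts new columns only as keyword arguments, and the keyword becomes the column name. The frames `train` and `ages` below are hypothetical toy stand-ins for `trainData` and `df["Age"]`, not the actual Titanic data.

```python
import pandas as pd

# Hypothetical toy data standing in for trainData and df["Age"]
train = pd.DataFrame({"Pclass_1": [1, 0, 1]})
ages = pd.Series([38.0, 35.0, 54.0], name="Age")

# Passing the Series positionally reproduces the TypeError from the question
try:
    train.assign(ages)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing it as a keyword works; "Age" becomes the new column's name
with_age = train.assign(Age=ages)
print(error_message)
print(with_age)
```

Note that `assign` returns a new DataFrame rather than modifying `train` in place, which is why the result is reassigned in the answer above.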

Thanks to piRSquared for the comment: if the indexes differ, use:

trainData = trainData.assign(Age=df["Age"].values)

But then the data is not aligned by index.
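To illustrate the difference, here is a small sketch with made-up frames (`left` and `ages` are hypothetical names). Passing a Series matches rows by index label, so labels missing from the Series produce `NaN`; passing `.values` matches rows purely by position and requires the same length.

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap
left = pd.DataFrame({"flag": [1, 0, 1]}, index=[1, 3, 6])
ages = pd.Series([38.0, 35.0, 54.0], index=[0, 1, 2])

# Series: matched by index label; labels 3 and 6 are absent, so they get NaN
aligned = left.assign(Age=ages)

# NumPy array: matched by position, the index is ignored
positional = left.assign(Age=ages.values)

print(aligned)
print(positional)
```

In the question both frames come from the same `df` after `dropna()`, so their indexes should match and either form would likely give the same result.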

Sample:

import pandas as pd
import seaborn as sns
# sample df (similar to your data)
df = sns.load_dataset("titanic")
# capitalize column names
df.columns = df.columns.str.capitalize()
print (df.head())
   Survived  Pclass     Sex   Age  Sibsp  Parch     Fare Embarked  Class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     Who  Adult_male Deck  Embark_town Alive  Alone  
0    man        True  NaN  Southampton    no  False  
1  woman       False    C    Cherbourg   yes  False  
2  woman       False  NaN  Southampton   yes   True  
3  woman       False    C  Southampton   yes  False  
4    man        True  NaN  Southampton    no   True 
df = df.dropna() 
#print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])

trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3","Sex_female","Sex_male","Embarked_C","Embarked_Q","Embarked_S"]]
#print("My train data",trainData.head())

trainData = trainData.assign(Age=df["Age"])
print("My train data",trainData.head())

My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
1          1         0         0           1         0           1   
3          1         0         0           1         0           0   
6          1         0         0           0         1           0   
10         0         0         1           1         0           0   
11         1         0         0           1         0           0   

    Embarked_Q  Embarked_S   Age  
1            0           0  38.0  
3            0           1  35.0  
6            0           1  54.0  
10           0           1   4.0  
11           0           1  58.0  

Another solution is `join`:

trainData = trainData.join(df["Age"])
print("My train data",trainData.head())

My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
1          1         0         0           1         0           1   
3          1         0         0           1         0           0   
6          1         0         0           0         1           0   
10         0         0         1           1         0           0   
11         1         0         0           1         0           0   

    Embarked_Q  Embarked_S   Age  
1            0           0  38.0  
3            0           1  35.0  
6            0           1  54.0  
10           0           1   4.0  
11           0           1  58.0  
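As a rough sketch of what `join` does here: by default it is a left join on the index, so every row of the calling frame is kept and the values of the named Series are looked up by label. The frames below are hypothetical toy data; the Series needs a `name`, which becomes the new column.

```python
import pandas as pd

# Hypothetical frame that kept only some rows (as after dropna)
left = pd.DataFrame({"flag": [1, 0, 1]}, index=[1, 3, 6])

# Named Series covering labels 0..3; label 6 is missing on purpose
ages = pd.Series([22.0, 38.0, 26.0, 35.0], index=[0, 1, 2, 3], name="Age")

# Default left join on index: the rows of `left` are preserved
joined = left.join(ages)
print(joined)
```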

After checking the data, it seems you can simply add the column `Age` to the subset:

trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3",
                              "Sex_female","Sex_male",
                              "Embarked_C","Embarked_Q","Embarked_S",
                              "Age"]]

print("My train data",trainData.head())

My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C \
1          1         0         0           1         0           1   
3          1         0         0           1         0           0   
6          1         0         0           0         1           0   
10         0         0         1           1         0           0   
11         1         0         0           1         0           0   

    Embarked_Q  Embarked_S   Age  
1            0           0  38.0  
3            0           1  35.0  
6            0           1  54.0  
10           0           1   4.0  
11           0           1  58.0  
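This works because `get_dummies` with `columns=[...]` encodes only the listed columns and carries the others over unchanged, so `Age` is already present in `titanic_dummies`. A minimal sketch on a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical miniature of the Titanic data
df = pd.DataFrame({"Sex": ["male", "female", "female"],
                   "Age": [22.0, 38.0, 26.0]})

# Only "Sex" is encoded; "Age" passes through untouched
dummies = pd.get_dummies(df, columns=["Sex"])

# So "Age" can be selected together with the dummy columns
subset = dummies[["Sex_female", "Sex_male", "Age"]]
print(subset)
```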