Asked by Nik (score 5). Tags: python, string, series, dataframe, pandas
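I have a dataframe and want to fill 'column3' with the value from the 'name' column when the 'gender' column is empty, and with the value from the 'gender' column otherwise: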
import pandas as pd

vals = {
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7'],
    'gender': ['', '', '', 'f', 'f', 'c', 'c'],
    'age': [39, 12, 27, 13, 36, 29, 10]
}
df4 = pd.DataFrame(vals)
df4['column3'] = df4['name'] if len(df4['gender']) == 0 else df4['gender']
The result is that column3 contains only values taken from 'gender'. I also tried the following statements:
import numpy as np

df4['column3'] = np.where(df4['gender'].empty, df4['name'], df4['gender'])
df4['column3'] = df4['name'] if df4['gender'].empty else df4['gender']
I get the same result either way, so I suspect my code isn't detecting the empty strings in the dataframe. What am I missing?
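Your numpy.where construct is perfectly fine.
The problem is how you test the column for empty strings. The answer is simply to compare each element for equality with ''.
That is easy to do: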
df4['column3'] = np.where(df4['gender'] == '', df4['name'], df4['gender'])
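pd.Series.empty tests whether the Series has no items at all, i.e. no rows; it does not test whether its elements are empty strings. The same applies to len(df4['gender']) == 0 in the first attempt: it checks the length of the whole column (7 here), so the conditional expression always picks df4['gender'].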
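To make the difference concrete, here is a minimal sketch using a small hypothetical Series `s` (not from the original question). `Series.empty` and `len()` each return one value describing the whole Series, so a Python `if`/`else`, or `np.where` given that scalar, selects an entire column at once; `s == ''` instead returns one boolean per row, which is what `np.where` needs as its condition.

```python
import numpy as np
import pandas as pd

# Hypothetical example Series, standing in for df4['gender'].
s = pd.Series(['', 'f', 'c'])

# Scalar checks: they describe the Series as a whole, not its elements.
print(s.empty)        # False: the Series has 3 rows
print(len(s) == 0)    # False for the same reason

# Only a Series with no rows at all counts as "empty".
print(pd.Series([], dtype=object).empty)  # True

# Element-wise comparison: one boolean per row, usable as a mask.
mask = s == ''
print(mask.tolist())  # [True, False, False]

# A scalar False condition broadcasts to "always take the second argument".
print(np.where(s.empty, 'name', s).tolist())  # ['', 'f', 'c']
```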
Example
import pandas as pd
import numpy as np

vals = {
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7'],
    'gender': ['', '', '', 'f', 'f', 'c', 'c'],
    'age': [39, 12, 27, 13, 36, 29, 10]
}
df4 = pd.DataFrame(vals)
df4['column3'] = np.where(df4['gender'] == '', df4['name'], df4['gender'])
# age gender name column3
# 0 39 n1 n1
# 1 12 n2 n2
# 2 27 n3 n3
# 3 13 f n4 f
# 4 36 f n5 f
# 5 29 c n6 c
# 6 10 c n7 c
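If you prefer to stay within pandas, `Series.mask` can express the same rule: replace the 'gender' value wherever the condition holds, taking the replacement from the aligned 'name' Series. The second half of this sketch is an assumed extension the question does not ask for: it also treats missing values (`None`/`NaN`) and whitespace-only strings as empty, using a small hypothetical frame `df5`.

```python
import numpy as np
import pandas as pd

vals = {
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7'],
    'gender': ['', '', '', 'f', 'f', 'c', 'c'],
    'age': [39, 12, 27, 13, 36, 29, 10],
}
df4 = pd.DataFrame(vals)

# Same result as the np.where version: where gender == '', use name instead.
df4['column3'] = df4['gender'].mask(df4['gender'] == '', df4['name'])
print(df4['column3'].tolist())
# ['n1', 'n2', 'n3', 'f', 'f', 'c', 'c']

# Assumed variant (hypothetical data): also treat None/NaN and
# whitespace-only strings as empty.
df5 = pd.DataFrame({'name': ['a', 'b', 'c', 'd'],
                    'gender': [None, '  ', np.nan, 'f']})
# .str.strip() leaves missing values missing, and missing == '' is False,
# so isna() is needed to catch None/NaN explicitly.
is_blank = df5['gender'].isna() | df5['gender'].str.strip().eq('')
df5['column3'] = np.where(is_blank, df5['name'], df5['gender'])
print(df5['column3'].tolist())
# ['a', 'b', 'c', 'f']
```

Both approaches are vectorised; `np.where` returns a NumPy array, while `Series.mask` returns a Series that keeps the original index.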