4da*_*ong 6 python numpy python-3.x pandas
I have a voting dataset like this:
republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
But the values are all strings, so I want to convert them into an integer matrix and compute some statistics:
hou_dat = pd.read_csv("house.data", header=None)
for i in range(0, hou_dat.shape[0]):
    for j in range(0, hou_dat.shape[1]):
        if hou_dat[i, j] == "republican":
            hou_dat[i, j] = 2
        if hou_dat[i, j] == "democrat":
            hou_dat[i, j] = 3
        if hou_dat[i, j] == "y":
            hou_dat[i, j] = 1
        if hou_dat[i, j] == "n":
            hou_dat[i, j] = 0
        if hou_dat[i, j] == "?":
            hou_dat[i, j] = -1
hou_sta = hou_dat.apply(pd.value_counts)
print(hou_sta)
However, it raises an error. How can I fix it?
Exception has occurred: KeyError
(0, 0)
IIUC, you need map and stack:
map_dict = {'republican': 2,
            'democrat': 3,
            'y': 1,
            'n': 0,
            '?': -1}
df1 = df.stack().map(map_dict).unstack()
print(df1)
   0   1  2  3   4   5  6  7  8  9  10  11  12  13  14  15  16
0  2   0  1  0   1   1  1  0  0  0   1  -1   1   1   1   0   1
1  2   0  1  0   1   1  1  0  0  0   0   0   1   1   1   0  -1
2  3  -1  1  1  -1   1  1  0  0  0   0   1   0   1   1   0   0
3  3   0  1  1   0  -1  1  0  0  0   0   1   0   1   0   0   1
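
For context, the KeyError in the question most likely comes from hou_dat[i, j]: on a DataFrame, square brackets look up a column label, so pandas searches for a column named (0, 0). Positional element access would be hou_dat.iat[i, j], but a dictionary mapping avoids the loop altogether. The following is a minimal end-to-end sketch, not the answerer's exact code: it assumes the four sample rows from the question (read through io.StringIO as a stand-in for "house.data"), it swaps in a per-column Series.map for the stack/unstack round trip, and the name hou_int is hypothetical.

```python
import io

import pandas as pd

# Sample rows copied from the question; StringIO stands in for "house.data".
raw = """republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
"""
hou_dat = pd.read_csv(io.StringIO(raw), header=None)

map_dict = {"republican": 2, "democrat": 3, "y": 1, "n": 0, "?": -1}

# Translate every cell by mapping each column through the dictionary.
# (Assumption: every value in the data is a key of map_dict; anything
# else would become NaN.)
hou_int = hou_dat.apply(lambda col: col.map(map_dict))

# Count how often each code occurs in each column; a code that never
# occurs in a column is reported as 0 instead of NaN.
hou_sta = hou_int.apply(lambda col: col.value_counts()).fillna(0).astype(int)

print(hou_int)
print(hou_sta)
```

In hou_sta the row labels are the integer codes (-1, 0, 1, 2, 3) and the columns are the original column positions, so hou_sta.loc[1, 2] is the number of "y" votes in column 2.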