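I want to find the 5 highest and 5 lowest values based on the sums in the last column and the last row of a dataset with more than 20,000 rows and 200 columns. (This is a multi-label problem.) The original table has no row or column sums; I added those sum values myself. See the toy dataset here: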
import pandas as pd
data = {'index': ['0001 ','0002 ','0003 ','0004 ','0005 ','0006 ',
'0007','0008','0009','0010','0011'],
'factor1': [0,1,0,1,0,0,1,0,0,0,1],
'factor2': [1,0,0,1,0,0,0,1,1,1,1],
'factor3': [1,1,1,1,0,0,0,1,1,0,1],
'factor4': [0,1,1,1,0,0,1,1,0,0,1],
'factor5': [1,1,1,1,0,0,0,1,1,1,1],
'factor6': [1,0,0,0,0,0,0,1,1,1,1],
'factor7': [0,1,1,1,1,0,1,1,0,0,1],
'factor8': [1,1,1,1,1,1,0,1,1,1,1],
'factor9': [1,0,0,0,0,0,0,0,0,0,0],
}
df = pd.DataFrame(data,columns=['index','factor1','factor2','factor3','factor4','factor5','factor6','factor7','factor8','factor9'])
count_row = df.count(axis=1)
df
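A minimal sketch for this first question. It assumes the ID column can serve as the row index (the trailing spaces in the IDs are dropped here) and that every other column is a 0/1 label. Instead of appending a sum row and a sum column to the table, it computes both sums as separate Series and ranks them with `Series.nlargest` / `Series.nsmallest`. Note that `df.count(axis=1)` in the snippet above counts non-missing cells per row. It does not sum them.

```python
import pandas as pd

data = {
    "index": ["0001", "0002", "0003", "0004", "0005", "0006",
              "0007", "0008", "0009", "0010", "0011"],
    "factor1": [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    "factor2": [1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
    "factor3": [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1],
    "factor4": [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    "factor5": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1],
    "factor6": [1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "factor7": [0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
    "factor8": [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
    "factor9": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
# Use the ID column as the row index so only the 0/1 label columns remain
df = pd.DataFrame(data).set_index("index")

# Sums computed on the fly instead of being stored in the table
row_sums = df.sum(axis=1)  # labels per sample (the "last column")
col_sums = df.sum(axis=0)  # samples per label (the "last row")

# Rank samples (rows) and labels (columns) by their sums
top5_rows = row_sums.nlargest(5)
bottom5_rows = row_sums.nsmallest(5)
top5_cols = col_sums.nlargest(5)
bottom5_cols = col_sums.nsmallest(5)

print(top5_rows, bottom5_rows, top5_cols, bottom5_cols, sep="\n\n")
```

On ties, `nlargest` / `nsmallest` default to `keep='first'`. Passing `keep='all'` returns every tied entry instead, which may matter for 0/1 data where many sums coincide. `df.loc[top5_rows.index]` pulls the full rows for the top-ranked samples.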
This is the resulting table:
  index  factor1  factor2  factor3  factor4  factor5  factor6  factor7  factor8  factor9
0  0001        0        1        1        0        1        1        0        1        1
1  0002        1        0        1        1        1        0        1        1        0
…

I'm stuck on dates in the following format:
0 2001-12-25
1 2002-9-27
2 2001-2-24
3 2001-5-3
4 200510
5 20078
What I need is the dates in %Y-%m format.
What I tried:
def parse(date):
    if len(date) <= 5:
        return "{}-{}".format(date[:4], date[4:5], date[5:])
    else:
        pass
df['Date'] = parse(df['Date'])
However, I only managed to parse 20078 into 2007-8; formats like 2001-12-25 come out as None. How can I do this? Thank you!
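A minimal sketch for the date question. It assumes every value starts with a four-digit year and that compact values such as `200510` and `20078` are the year followed directly by the month (so `20078` means August 2007). It also avoids two issues in the attempt. First, any string longer than five characters falls through to `else: pass`, so the function returns `None` for it. Second, `parse(df['Date'])` hands the whole Series to the function instead of one string at a time. The helper name `to_year_month` is hypothetical.

```python
import pandas as pd

def to_year_month(value) -> str:
    """Normalize a date string to %Y-%m, assuming the year always comes first."""
    s = str(value).strip()  # str() also covers integer-typed cells such as 20078
    if "-" in s:
        # Dashed form like '2001-12-25' or '2002-9-27': year and month are the first two parts
        year, month = s.split("-")[:2]
    else:
        # Compact form like '200510' or '20078': assume a 4-digit year, the rest is the month
        year, month = s[:4], s[4:]
    # Zero-pad the month so every result has the same width
    return f"{int(year):04d}-{int(month):02d}"

df = pd.DataFrame({"Date": ["2001-12-25", "2002-9-27", "2001-2-24",
                            "2001-5-3", "200510", "20078"]})
# apply() calls the function once per element rather than on the whole Series
df["Date"] = df["Date"].apply(to_year_month)
print(df["Date"].tolist())
```

If you need real datetime values rather than strings, `pd.to_datetime(df['Date'], format='%Y-%m')` should parse the normalized strings. Compact values remain inherently ambiguous (a value like `200511` could not be told apart from year plus day), so the "year then month" assumption is worth confirming against the source data.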