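This is obviously simple, but as a pandas newbie I'm stuck.
I have a CSV file with 3 columns: State, bene_1_count and bene_2_count.
I want to calculate the ratio of 'bene_1_count' to 'bene_2_count' for each state.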
import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'bene_1_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)],
                   'bene_2_count': [np.random.randint(10000, 99999)
                                    for _ in range(12)]})
I'm trying the following, but it gives me the error 'No objects to concatenate':
df['ratio'] = df.groupby(['state']).agg(df['bene_1_count']/df['bene_2_count'])
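I can't figure out how to "reach" the state level of the groupby to get the ratio of the columns.
I want one ratio per state; my desired output looks like this: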
State ratio
CA
WA
CO
AZ
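A minimal sketch of one way to get that table, assuming "ratio" means the state's total bene_1_count divided by its total bene_2_count (not the mean of per-row ratios). The groupby sums both columns first, and the division happens on the summed columns rather than inside agg. The helper names and the sample numbers below are hypothetical. Since the attempt assigns to df['ratio'], a second helper uses transform('sum') to repeat the same per-state ratio on every row.

```python
import pandas as pd


def state_ratio(df):
    """One row per state: summed bene_1_count divided by summed bene_2_count."""
    sums = df.groupby('state')[['bene_1_count', 'bene_2_count']].sum()
    ratio = sums['bene_1_count'] / sums['bene_2_count']
    # The Series index is named 'state', so reset_index() yields columns state, ratio.
    return ratio.rename('ratio').reset_index()


def add_state_ratio_column(df):
    """Same per-state ratio, repeated on every row of a copy of df."""
    # transform('sum') returns group totals aligned to df's original index.
    totals = df.groupby('state')[['bene_1_count', 'bene_2_count']].transform('sum')
    out = df.copy()
    out['ratio'] = totals['bene_1_count'] / totals['bene_2_count']
    return out


# Small hypothetical sample with round numbers so the ratios are easy to check.
sample = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                       'bene_1_count': [10, 30, 20, 30],
                       'bene_2_count': [20, 20, 40, 40]})

print(state_ratio(sample))
print(add_state_ratio_column(sample))
```

With the question's df, state_ratio(df) would give the State/ratio table. If the mean of row-level ratios is what's wanted instead, compute the row ratio first and then take groupby('state').mean() on it.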
I wrote the following program, which fetches data from an SQLite table, and I want to build a pandas DataFrame from it.
import sqlite3 as lite
import pandas as pd
con=lite.connect('/Users/mac/Desktop/Python/Baye_stat/productiondisruption/PCI_meat.sqlite')
cur=con.cursor()
cur.execute("SELECT * from InmateLostHours")
losthours = cur.fetchall()
k=len(losthours)-1
jan=[]
feb=[]
march=[]
april=[]
may=[]
june=[]
july=[]
aug=[]
sept=[]
october=[]
nov=[]
dec=[]
for i in range(0,k):
    may.append((losthours[i][3])/(losthours[i][15]))
    june.append((losthours[i][4])/(losthours[i][15]))
    july.append((losthours[i][5])/(losthours[i][15]))
    aug.append((losthours[i][6])/(losthours[i][15]))
    sept.append((losthours[i][7])/(losthours[i][15]))
    october.append((losthours[i][8])/(losthours[i][15]))
    nov.append((losthours[i][9])/(losthours[i][15]))
    dec.append((losthours[i][10])/(losthours[i][15]))
    jan.append((losthours[i][11])/(losthours[i][15]))
    feb.append((losthours[i][12])/(losthours[i][15]))
    march.append((losthours[i][13])/(losthours[i][15]))
    april.append((losthours[i][14])/(losthours[i][15]))
institutionhours=pd.DataFrame({
    'May': [may],
    'June': [june],
    'July': [july],
    'August': [aug],
    'September': [sept],
    'October': [october],
    'November': [nov],
    'December': [dec],
    'January': [jan],
    'February': [feb],
    'March': [march],
    'April': [april]
})
I want a clean DataFrame of shape (16, 12), but that is not what I get.
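The likely cause is the extra brackets in 'May': [may], 'June': [june], and so on. Each list is wrapped in another list, so every column receives a single cell holding the whole list, and the frame ends up with one row. Passing the lists directly ('May': may, ...) gives one row per record. A more compact sketch skips the twelve lists and builds the DataFrame from the fetched tuples, dividing all month columns at once. Like the original loop, it assumes positions 3 to 14 hold May through April and position 15 holds the divisor. The function name, MONTHS, and the demo rows are hypothetical. Note that the original loop runs over range(0, len(losthours) - 1) and so skips the last fetched row. The sketch keeps every row, so pass losthours[:-1] if dropping the last row is intentional.

```python
import pandas as pd

# Month order taken from the question's loop: positions 3..14 map to May..April.
MONTHS = ['May', 'June', 'July', 'August', 'September', 'October',
          'November', 'December', 'January', 'February', 'March', 'April']


def monthly_ratios(rows):
    """Build a (len(rows), 12) DataFrame of month values divided by column 15.

    rows is a list of tuples such as cursor.fetchall() returns. Assumes, like
    the original loop, that positions 3..14 are the monthly values and
    position 15 is the divisor.
    """
    raw = pd.DataFrame(rows)
    # Divide each of the 12 month columns by the same row's value in column 15.
    ratios = raw.iloc[:, 3:15].div(raw.iloc[:, 15], axis=0)
    ratios.columns = MONTHS
    return ratios


# Two hypothetical 16-field rows standing in for losthours.
demo_rows = [tuple(range(16)),
             (0, 0, 0) + (5,) * 12 + (10,)]
print(monthly_ratios(demo_rows))
```

With the question's cursor this would be monthly_ratios(losthours) (or losthours[:-1]). Another option is pd.read_sql_query("SELECT * from InmateLostHours", con), which loads the table straight into a DataFrame before applying the same iloc slicing.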