Pandas groupby to to_csv

kal*_*own 7 python csv pandas pandas-groupby

I want to write the result of a Pandas groupby to a CSV file. I tried various StackOverflow solutions, but none of them worked.

Python 3.6.1,Pandas 0.20.1

The groupby result looks like this:

         id  month   year  count
week
0      9066     82  32142    895
1      7679     84  30112    749
2      8368    126  42187    872
3     11038    102  34165    976
4      8815    117  34122    767
5     10979    163  50225   1252
6      8726    142  38159    996
7      5568     63  26143    582

I want a CSV that looks like this:

week  count
0   895
1   749
2   872
3   976
4   767
5   1252
6   996
7   582

Current code:

week_grouped = df.groupby('week')
week_grouped.sum() #At this point you have the groupby result
week_grouped.to_csv('week_grouped.csv') #Can't do this - .to_csv is not a df function. 

SO solutions I have read:

Output groupby to csv file pandas

week_grouped.drop_duplicates().to_csv('week_grouped.csv')

Result: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method

Python pandas - writing groupby output to file

week_grouped.reset_index().to_csv('week_grouped.csv')

Result: AttributeError: "Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method"

Rev*_*vaz 9

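A groupby yields key-value pairs, where the key is the group's identifier and the value is the group itself, i.e. the subset of the original df whose rows match the key.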

In your example, week_grouped = df.groupby('week') is a collection of groups (a pandas.core.groupby.DataFrameGroupBy object), which you can explore in detail as follows:

for k, gr in week_grouped:
    # do your stuff instead of print
    print(k)
    print(type(gr)) # This will output <class 'pandas.core.frame.DataFrame'>
    print(gr)
    # You can save each 'gr' in a csv as follows
    gr.to_csv('{}.csv'.format(k))

Alternatively, you can compute an aggregation function on the grouped object:

result = week_grouped.sum()
# This will be already one row per key and its aggregation result
result.to_csv('result.csv') 

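In your example, you need to assign the result of the function to a variable, because sum() returns a new DataFrame rather than modifying the groupby object in place.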

some_variable = week_grouped.sum() 
some_variable.to_csv('week_grouped.csv') # This will work

In effect, result.csv and week_grouped.csv are identical.
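To illustrate why the assignment matters, here is a minimal sketch. The sample data is made up for illustration and is not the asker's dataset. It shows that sum() hands back a new DataFrame while the groupby object stays what it was.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset
df = pd.DataFrame({
    "week": [0, 0, 1, 1, 1],
    "count": [1, 2, 3, 4, 5],
})

week_grouped = df.groupby("week")
result = week_grouped.sum()  # a new DataFrame, one row per week

# The groupby object is unchanged; only `result` is a DataFrame
print(type(week_grouped).__name__)  # DataFrameGroupBy
print(type(result).__name__)        # DataFrame

# Calling to_csv() without a path returns the CSV text instead of writing a file
print(result.to_csv())
```

Passing a file name such as 'week_grouped.csv' to to_csv writes the same text to disk.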


Ale*_*ias 7

Try this:

week_grouped = df.groupby('week')
week_grouped.sum().reset_index().to_csv('week_grouped.csv')

That writes the entire dataframe to the file. If you only want those two columns:

week_grouped = df.groupby('week')
week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')
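One detail worth noting: after reset_index(), to_csv still writes the new 0..n row index as an extra unnamed first column unless index=False is passed. The sketch below is one possible way to get exactly the week and count columns. It uses made-up sample data and an in-memory buffer in place of a real file, both of which are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the asker's df
df = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "week": [0, 0, 1, 1],
    "count": [5, 7, 1, 2],
})

# Select only 'count' before summing so unrelated columns are never aggregated
counts = df.groupby("week")["count"].sum().reset_index()

# The buffer stands in for a file path such as 'week_grouped.csv'
buffer = io.StringIO()
counts.to_csv(buffer, index=False)  # index=False omits the row-number column
csv_text = buffer.getvalue()
print(csv_text)
```

Selecting the column before aggregating also avoids summing columns such as id, month, and year, whose totals are not meaningful here.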

Here is a line-by-line explanation of your original code:

# This creates a "groupby" object (not a dataframe object) 
# and you store it in the week_grouped variable.
week_grouped = df.groupby('week')

# This instructs pandas to sum up all the numeric type columns in each 
# group. This returns a dataframe where each row is the sum of the 
# group's numeric columns. You're not storing this dataframe in your 
# example.
week_grouped.sum() 

# Here you're calling the to_csv method on a groupby object... but
# that object type doesn't have that method. Dataframes have that method. 
# So we should store the previous line's result (a dataframe) into a variable 
# and then call its to_csv method.
week_grouped.to_csv('week_grouped.csv')

# Like this:
summed_weeks = week_grouped.sum()
summed_weeks.to_csv('...')

# Or with less typing simply
week_grouped.sum().to_csv('...')


Pet*_*ler 4

Try changing the second line to week_grouped = week_grouped.sum() and re-running all three lines.

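If you run week_grouped.sum() in its own Jupyter notebook cell, you will see that the statement returns its result to the cell's output instead of assigning it back to week_grouped. Some pandas methods have an inplace=True parameter (e.g. df.sort_values(by=col_name, inplace=True)), but sum does not.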
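The difference between the two styles can be sketched as follows, again with made-up sample data: a method called with inplace=True mutates the object and returns None, whereas sum() returns a new object that has to be assigned.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({"week": [1, 0, 1], "count": [3, 1, 2]})

# inplace=True mutates df and returns None
returned = df.sort_values(by="week", inplace=True)
print(returned)             # None
print(df["week"].tolist())  # [0, 1, 1]

# sum() on a groupby has no inplace option; keep the DataFrame it returns
week_totals = df.groupby("week").sum()
print(week_totals["count"].tolist())  # [1, 5]
```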

Edit: Does each week number appear only once in your CSV? If so, here is a simpler solution that doesn't use groupby:

import pandas as pd

df = pd.read_csv('input.csv')
# 'week' and 'count' match the columns in the desired output
df[['week', 'count']].to_csv('output.csv')