Compute the mean of selected rows for selected columns in a pandas DataFrame

imp*_*ble 10 python pandas

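I have a pandas df with, say, 100 rows and 10 columns (the actual data is huge). I also have a row_index list that contains the rows to be considered when taking the mean. I want to calculate the mean over columns 2, 5, 6, 7 and 8. Can we do this with some function of the DataFrame object?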

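What I know is to write a for loop, get the row values for each element in row_index, and keep accumulating the mean. Is there a direct function to which we can pass row_list, column_list and axis, for example df.meanAdvance(row_list, column_list, axis=0)?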

I have seen DataFrame.mean(), but I could not figure out how to use it for this.

  a b c d q 
0 1 2 3 0 5
1 1 2 3 4 5
2 1 1 1 6 1
3 1 0 0 0 0

I want the mean of rows 0, 2, 3 for each of the columns a, b, d:

  a b d
0 1 1 2

To select the rows of your DataFrame you can use iloc; you can then select the columns you want using square brackets.

For example:

import pandas as pd
df = pd.DataFrame(data=[[1,2,3]]*5, index=range(3, 8), columns=['a','b','c'])

gives the following DataFrame:

   a  b  c
3  1  2  3
4  1  2  3
5  1  2  3
6  1  2  3
7  1  2  3

To select only the third and fifth rows, you can do:

df.iloc[[2,4]]

which returns:

   a  b  c
5  1  2  3
7  1  2  3
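
Note that iloc selects by position, not by index label, which is why positions 2 and 4 come back labelled 5 and 7. A minimal sketch of the difference, reusing the df from this answer (the names by_position and by_label are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2, 3]] * 5, index=range(3, 8), columns=['a', 'b', 'c'])

# iloc: positions 2 and 4 (0-based), whatever the index labels are
by_position = df.iloc[[2, 4]]

# loc: the same two rows, addressed by their index labels
by_label = df.loc[[5, 7]]

# Both selections contain the same rows
print(by_position.equals(by_label))
```

If your row list holds index labels rather than positions, loc is probably the indexer you want.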

If you then want only columns b and c, use the following command:

df[['b', 'c']].iloc[[2,4]]

which yields:

   b  c
5  2  3
7  2  3

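You can then use the df.mean function to get the mean of this subset of the DataFrame. Specify axis=0 for the mean of each column (computed down the rows), or axis=1 for the mean of each row (computed across the columns).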

Thus:

df[['b', 'c']].iloc[[2,4]].mean(axis=0)

returns:

b    2.0
c    3.0
dtype: float64

As we would expect from the input DataFrame.

For your code you can then do:

 df[column_list].iloc[row_index_list].mean(axis=0)
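
pandas has no df.meanAdvance, but the one-liner above is easy to wrap if you want the call shape from the question. A minimal sketch, where mean_of_selection is a hypothetical helper name (not part of pandas) and row_positions is assumed to hold 0-based row positions:

```python
import pandas as pd

def mean_of_selection(df, row_positions, columns, axis=0):
    """Mean over the given row positions (0-based) and column names.

    Hypothetical helper, not part of pandas.
    """
    return df[columns].iloc[row_positions].mean(axis=axis)

# The example data from the question
df = pd.DataFrame(
    [[1, 2, 3, 0, 5], [1, 2, 3, 4, 5], [1, 1, 1, 6, 1], [1, 0, 0, 0, 0]],
    columns=['a', 'b', 'c', 'd', 'q'],
)

# Mean of rows 0, 2, 3 for columns a, b, d
result = mean_of_selection(df, row_positions=[0, 2, 3], columns=['a', 'b', 'd'])
print(result)
```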

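Edit after comment: a new question from the comments: I have to store these means in another df/matrix. I have lists L1, L2, L3, L4 ... LX that tell me the row indices over which I need the mean for columns C[1, 2, 3]. For example, L1 = [0, 2, 3] means I need the mean of rows 0, 2, 3, stored in the first row of a new df/matrix. Then for L2 = [1, 4] I will again calculate the mean and store it in the second row of the new df/matrix. Similarly up to LX; I want the new df to have X rows and len(C) columns. The columns stay the same for L1..LX. Could you help me with this?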

Answer:

If I understand correctly, the following code should do the trick (same df as above; for the columns I took 'a' and 'b'):

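First you loop over all the row lists, collecting each mean as a pd.Series; then you concatenate the resulting list of Series along axis=1 and take the transpose to get it in the right format.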

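dfs = list()
for l in L:  # L is the list of row-position lists [L1, L2, ..., LX]
    dfs.append(df[['a', 'b']].iloc[l].mean(axis=0))

# Each Series becomes a column; transpose so each list gets its own row
mean_matrix = pd.concat(dfs, axis=1).T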
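
For reference, here is the same loop as a self-contained sketch. The contents of L are assumed sample values taken from the comment (L1 = [0, 2, 3], L2 = [1, 4]), and means is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2, 3]] * 5, index=range(3, 8), columns=['a', 'b', 'c'])

# Assumed input: one list of row positions per output row (L1, L2, ...)
L = [[0, 2, 3], [1, 4]]

# One Series of column means per row list
means = [df[['a', 'b']].iloc[rows].mean(axis=0) for rows in L]

# Each Series becomes a column; transpose so each list gets its own row
mean_matrix = pd.concat(means, axis=1).T
print(mean_matrix.shape)
```

The result has len(L) rows and one column per selected column, which is the X by len(C) layout asked for in the comment.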

You can select specific columns from a DataFrame by passing a list of indices to .iloc, for example:

df.iloc[:, [2,5,6,7,8]]

will return a DataFrame containing those numbered columns (note: this uses 0-based indexing, so 2 refers to the 3rd column).

To take the mean down each of those columns (one value per column), you can use:

# Mean along 0 (vertical) axis: return mean for specified columns, calculated across all rows
df.iloc[:, [2,5,6,7,8]].mean(axis=0)

To take the mean across those columns (one value per row), you can use:

# Mean along 1 (horizontal) axis: return mean for each row, calculated across specified columns
df.iloc[:, [2,5,6,7,8]].mean(axis=1)
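
To make the two axis values concrete, here is a small sketch on the 4x5 example from the question. It uses column positions [0, 1, 3] (a, b, d) because that frame has only five columns; subset, per_column and per_row are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 0, 5], [1, 2, 3, 4, 5], [1, 1, 1, 6, 1], [1, 0, 0, 0, 0]],
    columns=['a', 'b', 'c', 'd', 'q'],
)

subset = df.iloc[:, [0, 1, 3]]  # columns a, b, d by position, all rows

per_column = subset.mean(axis=0)  # one value per column: index is a, b, d
per_row = subset.mean(axis=1)     # one value per row: index is 0, 1, 2, 3

print(per_column.shape, per_row.shape)
```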

You can also supply specific indices for both axes to return a subset of the table:

df.iloc[[1,2,3,4], [2,5,6,7,8]]

For your specific example, you would do:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.array([[1,2,3,0,5],[1,2,3,4,5],[1,1,1,6,1],[1,0,0,0,0]]),
    columns=["a","b","c","d","q"],
    index=[0,1,2,3]
)

# I want the mean of rows 0, 2, 3 for each of columns a, b, d
#   a b d
# 0 1 1 2

df.iloc[ [0,2,3], [0,1,3] ].mean(axis=0)

Which outputs:

a    1.0
b    1.0
d    2.0
dtype: float64

Alternatively, to access by column name, first select those columns:

df[ ['a','b','d'] ].iloc[ [0,2,3] ].mean(axis=0)
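
If you would rather use a single indexer call while still naming the columns, one option is to convert the row positions to index labels and use loc. A sketch for rows 0, 2, 3 of the question's example, assuming the index labels are unique (rows is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 0, 5], [1, 2, 3, 4, 5], [1, 1, 1, 6, 1], [1, 0, 0, 0, 0]],
    columns=['a', 'b', 'c', 'd', 'q'],
)

rows = [0, 2, 3]  # row positions (0-based)

# df.index[rows] turns positions into index labels, so loc can take
# label-based rows and column names in one call
result = df.loc[df.index[rows], ['a', 'b', 'd']].mean(axis=0)
print(result)
```

With duplicate index labels, loc would return every row carrying a matching label, so the chained df[cols].iloc[rows] form is the safer choice there.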

To answer the second part of the question (from the comments), you can use pd.concat. It is faster to accumulate the results in a list and then pass them to pd.concat in one go, e.g.

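dfs = []
for ix in idxs:  # idxs: the list of row-position lists [L1, L2, ..., LX]
    dfm = df.iloc[ ix, [0,1,3] ].mean(axis=0)  # mean of columns a, b, d over rows ix
    dfs.append(dfm)

dfm_summary = pd.concat(dfs, axis=1).T # Stack horizontally, then transpose: one row per list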
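
If you also want each row of the summary labelled by the list it came from, one possible variation is to build the result from a dict of Series. This is a sketch under assumptions: the dict row_lists, its keys 'L1' and 'L2', and the row positions in it are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 0, 5], [1, 2, 3, 4, 5], [1, 1, 1, 6, 1], [1, 0, 0, 0, 0]],
    columns=['a', 'b', 'c', 'd', 'q'],
)

# Assumed input: named lists of row positions (names are illustrative)
row_lists = {'L1': [0, 2, 3], 'L2': [1, 2]}
cols = ['a', 'b', 'd']

# Dict of Series -> DataFrame with one column per list;
# transpose to get one row per list and one column per selected column
summary = pd.DataFrame(
    {name: df[cols].iloc[rows].mean(axis=0) for name, rows in row_lists.items()}
).T
print(summary)
```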
