Pandas: Split a DataFrame using row indices

Question


Pra*_*ala 3 python dataframe pandas pandas-groupby

I want to split a DataFrame into chunks with uneven numbers of rows, using row indices.

The following code:

groups = df.groupby((np.arange(len(df.index))/l[1]).astype(int))


only works when every chunk has the same number of rows.
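The expression above labels each row by integer-dividing its position by a fixed chunk size, which is why it only produces equal-sized chunks. A minimal sketch of the same `groupby` idea for uneven chunks, assuming the values in `l` are sorted 0-based row positions (not index labels): `np.searchsorted` with `side='right'` maps each position to the number of boundaries at or below it, and that count serves as the group label. The 8-row sample frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 8 rows with values 1..8 in every column
df = pd.DataFrame({'a': np.arange(1, 9),
                   'b': np.arange(1, 9),
                   'c': np.arange(1, 9)})

l = [2, 5, 7]  # assumed: sorted split positions (0-based row positions)

# For each row position p, count how many boundaries are <= p;
# that count is the chunk number the row belongs to.
labels = np.searchsorted(l, np.arange(len(df)), side='right')

groups = df.groupby(labels)
dfs = [group for _, group in groups]

for chunk in dfs:
    print(chunk, end='\n\n')
```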

df

a b c  
1 1 1  
2 2 2  
3 3 3  
4 4 4  
5 5 5  
6 6 6  
7 7 7  

l = [2, 5, 7]

df1  
1 1 1  
2 2 2  

df2
3 3 3
4 4 4
5 5 5

df3
6 6 6
7 7 7

df4
8 8 8


Answer 1

Sco*_*ton 9

You can use a list comprehension, after first making a small modification to your list `l`.

print(df)

   a  b  c
0  1  1  1
1  2  2  2
2  3  3  3
3  4  4  4
4  5  5  5
5  6  6  6
6  7  7  7
7  8  8  8


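l = [2, 5, 7]
# Pad the boundaries with a start (0) and an end so every chunk has a start and a stop
l_mod = [0] + l + [max(l) + 1]

# iloc slices by position and excludes the stop, so consecutive chunks do not overlap
list_of_dfs = [df.iloc[l_mod[n]:l_mod[n + 1]] for n in range(len(l_mod) - 1)]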


Output:

list_of_dfs[0]

   a  b  c
0  1  1  1
1  2  2  2

list_of_dfs[1]

   a  b  c
2  3  3  3
3  4  4  4
4  5  5  5

list_of_dfs[2]

   a  b  c
5  6  6  6
6  7  7  7

list_of_dfs[3]

   a  b  c
7  8  8  8
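The comprehension above slices between each pair of consecutive boundaries. The same pairing can also be written with `zip`, which avoids the index arithmetic; this is a sketch with hypothetical names `starts` and `stops`, using `len(df)` as the final stop so any rows after the last boundary are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data matching the answer: 8 rows, values 1..8
df = pd.DataFrame({'a': np.arange(1, 9),
                   'b': np.arange(1, 9),
                   'c': np.arange(1, 9)})

l = [2, 5, 7]

starts = [0] + l          # each chunk begins at 0 or at a split position
stops = l + [len(df)]     # each chunk ends at the next split position, or at the end

# iloc slicing is positional and end-exclusive, so chunks do not overlap
list_of_dfs = [df.iloc[start:stop] for start, stop in zip(starts, stops)]

for chunk in list_of_dfs:
    print(chunk, end='\n\n')
```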


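Correct me if I'm wrong, but I think the modified list should be `l_mod = [0] + l + [len(df)]`. In this case `max(l)+1` and `len(df)` happen to coincide, but in the general case you could lose rows. A second point: it might be worth passing it through `set` to make sure there are no duplicate indices (e.g. `[0]` appearing twice). Nice solution, by the way; you have my upvote :) (2 upvotes)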
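A small sketch of the comment's point, assuming a hypothetical 10-row frame (longer than `max(l) + 1`) and a hypothetical helper `split_rows`. One caveat beyond the comment: a plain `set` does not preserve order, so the sketch wraps it in `sorted`.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row frame: longer than max(l) + 1, so the difference shows
df = pd.DataFrame({'a': np.arange(1, 11)})
l = [2, 5, 7]

def split_rows(frame, bounds):
    """Hypothetical helper: slice frame positionally between consecutive bounds."""
    return [frame.iloc[bounds[n]:bounds[n + 1]] for n in range(len(bounds) - 1)]

# Original padding: the last chunk stops at max(l) + 1 = 8, so rows 8 and 9 are dropped
original = split_rows(df, [0] + l + [max(l) + 1])

# Suggested padding: stop at len(df). sorted(set(...)) guards against repeated
# boundaries (e.g. l already containing 0), which would otherwise yield empty chunks.
fixed = split_rows(df, sorted(set([0] + l + [len(df)])))

print(sum(len(c) for c in original))  # 8
print(sum(len(c) for c in fixed))     # 10
```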

Answer 2

Moh*_*ani 5

I think this is what you need:

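import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(1, 8),
                   'b': np.arange(1, 8),
                   'c': np.arange(1, 8)})
print(df)
    a   b   c
0   1   1   1
1   2   2   2
2   3   3   3
3   4   4   4
4   5   5   5
5   6   6   6
6   7   7   7

last_check = 0
dfs = []
for ind in [2, 5, 7]:
    # .loc slices by label and includes the end label, hence ind-1;
    # with the default RangeIndex, labels equal row positions
    dfs.append(df.loc[last_check:ind-1])
    last_check = ind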


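Although a list comprehension is generally more efficient than a for loop, you need `last_check` here because there is no regular pattern in the list of indices.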
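As written, the loop stops at the last index in the list, so any rows after it (such as the `8 8 8` row the question expects in `df4`) are not collected. A sketch of the same `last_check` loop that also appends the remainder, assuming positional `iloc` slicing (end-exclusive, so no `ind-1` is needed) and a hypothetical 8-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with 8 rows, so there is a remainder after position 7
df = pd.DataFrame({'a': np.arange(1, 9),
                   'b': np.arange(1, 9),
                   'c': np.arange(1, 9)})

last_check = 0
dfs = []
for ind in [2, 5, 7]:
    # iloc is positional and excludes the end position
    dfs.append(df.iloc[last_check:ind])
    last_check = ind

# Collect whatever is left after the last split position
if last_check < len(df):
    dfs.append(df.iloc[last_check:])

for chunk in dfs:
    print(chunk, end='\n\n')
```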

dfs[0]

    a   b   c
0   1   1   1
1   2   2   2

dfs[2]

    a   b   c
5   6   6   6
6   7   7   7
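The `ind-1` in the loop is needed because label-based `.loc` slicing includes the end label, whereas positional `.iloc` slicing excludes the end position. With the default `RangeIndex`, labels and positions coincide, so both pick the same rows; with a different index (the labels below are hypothetical) they no longer do.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})

# .loc includes the end label; .iloc excludes the end position
print(df.loc[0:1, 'a'].tolist())   # [1, 2]
print(df.iloc[0:2]['a'].tolist())  # [1, 2]

# Hypothetical non-default index: labels no longer match positions
df2 = pd.DataFrame({'a': [1, 2, 3, 4, 5]}, index=[10, 20, 30, 40, 50])
print(df2.iloc[0:2]['a'].tolist())   # [1, 2]
print(df2.loc[10:20, 'a'].tolist())  # [1, 2]
print(df2.loc[0:1, 'a'].tolist())    # []: no labels fall between 0 and 1
```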


Asked: 6 years, 9 months ago
Viewed: 7393 times
Last activity: 5 years, 11 months ago