How to reindex a pandas DataFrame after concatenation

Question

Suppose I concatenate two DataFrames like this:

import numpy as np
import pandas as pd

array1 = np.random.randn(3,3)
array2 = np.random.randn(3,3)

df1 = pd.DataFrame(array1, columns=list('ABC'))
df2 = pd.DataFrame(array2, columns=list('ABC'))

df = pd.concat([df1, df2])

The resulting DataFrame df looks like this:

          A         B         C
0  1.297362  0.745510 -0.206756
1 -0.056807 -1.875149 -0.210556
2  0.310837 -1.068873  2.054006
0  1.163739 -0.678165  2.626052
1 -0.557625 -1.448195 -1.391434
2  0.222607 -0.334348  0.672643

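Note that the index labels are carried over from the original DataFrames. I would like to reindex df so that the index simply runs from 0 to 5. How can I do this?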

(I tried df = df.reindex(index=range(df.shape[0])), but this raises ValueError: cannot reindex from a duplicate axis. This is because the original axis contains duplicates: two 0s, two 1s, and so on.)
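
To make the cause of that error concrete, here is a minimal sketch that rebuilds the concatenated frame and inspects its index. The variable names has_unique_index, duplicated_labels and reindex_failed are illustrative only, and the exact wording of the ValueError message may differ between pandas versions, so the sketch only checks the exception type.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'))
df2 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'))
df = pd.concat([df1, df2])

# The concatenated index repeats the labels 0, 1, 2.
has_unique_index = df.index.is_unique
duplicated_labels = df.index[df.index.duplicated()].tolist()

# reindex selects rows by label, so duplicate labels make the lookup ambiguous.
try:
    df.reindex(index=range(df.shape[0]))
    reindex_failed = False
except ValueError:
    reindex_failed = True

print(has_unique_index, duplicated_labels, reindex_failed)
```

reindex aligns existing rows to new labels rather than renumbering them, which is why it cannot be used here.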

Answer 1

EdC*_*ica 9

You want to pass ignore_index=True to concat:

In [68]:
array1 = np.random.randn(3,3)
array2 = np.random.randn(3,3)

df1 = pd.DataFrame(array1, columns=list('ABC'))
df2 = pd.DataFrame(array2, columns=list('ABC'))

df = pd.concat([df1, df2], ignore_index=True)
df

Out[68]:
          A         B         C
0 -0.091094  0.460133 -0.548937
1 -0.839469 -1.354138 -0.823666
2  0.088581 -1.142542 -1.746608
3  0.067320  1.014533 -1.294371
4  2.094135  0.622129  1.203257
5  0.415768 -0.467081 -0.740371

This ignores the existing indices, so in effect it assigns a new index starting from 0 to the concatenated result.
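
If the frames have already been concatenated without ignore_index, one alternative (not mentioned in the answer itself) is to renumber afterwards with reset_index(drop=True). The sketch below assumes the same setup as the question; the names combined, stacked and renumbered are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'))
df2 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'))

# Option from the answer: discard the old labels while concatenating.
combined = pd.concat([df1, df2], ignore_index=True)

# Alternative: concatenate first, then replace the index with 0..n-1.
# drop=True prevents the old index from being kept as a new column.
stacked = pd.concat([df1, df2])
renumbered = stacked.reset_index(drop=True)

print(list(stacked.index))
print(list(renumbered.index))
```

Both approaches should give the same frame; ignore_index=True just avoids the intermediate step.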
