Is there a reset_index for columns, or a way to move the column headers to an inner index level, leaving their index positions as the outer level?

Question

Example DataFrame:

df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list('ABCD'))

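Is there a way to reset the index of the columns? Or to easily insert a row holding the column index positions? I want the index positions to be the outermost level and the column headers to be the innermost level.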

Answer 1

ili*_*eev 7

Drop column names

df.columns = pd.RangeIndex(df.columns.size)
df

Output:

    0   1   2   3
#---------------#
0   0   1   3   3
1   2   2   0   2
2   2   1   3   1
3   2   1   0   0

Drop column names, one-liner
This may have performance issues and side effects; see the discussion below.

df.T.reset_index(drop=True).T

Output:

    0   1   2   3
#---------------#
0   0   1   3   3
1   2   2   0   2
2   2   1   3   1
3   2   1   0   0

Keep column names as the first row, one-liner
Same issues; see the discussion below.

df.T.reset_index().T

Output:

        0   1   2   3
#-------------------#
index   A   B   C   D
   0    0   1   3   3
   1    2   2   0   2
   2    2   1   3   1
   3    2   1   0   0

Keep column names as a row, the efficient way

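import numpy as np
import pandas as pd

# Create a heterogeneous DataFrame: three int columns plus one string column
df = pd.DataFrame(np.random.randint(0, 4, size=(4, 3)), columns=list('789')).join(
    pd.DataFrame(list('bcde'), columns=['A']))
df.index.name = '4'

# Save the column names as a new last row, then replace them with positions.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
names_row = pd.DataFrame([list(df.columns)], columns=df.columns,
                         index=pd.Index([df.index.name], name=df.index.name))
df = pd.concat([df, names_row])
df.columns = pd.RangeIndex(df.columns.size)
print(df)
df.info()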

Output. Note that you will need extra effort to keep the data from being upcast (every column below is object):

   0  1  2  3
#-----------#
4            
0  2  3  2  b
1  1  0  2  c
2  3  1  3  d
3  3  3  2  e
4  7  8  9  A

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, 0 to 4
Data columns (total 4 columns):
0    5 non-null object
1    5 non-null object
2    5 non-null object
3    5 non-null object
dtypes: object(4)

Add a secondary column index level, one-liner
This may have performance issues and side effects; see the discussion below.

df.T.set_index(pd.RangeIndex(df.columns.size),append=True).T

Output:

    A   B   C   D
    0   1   2   3
#---------------#
0   0   1   3   3
1   2   2   0   2
2   2   1   3   1
3   2   1   0   0

Critique of the one-liner approach

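Performance issues:
The cost of the double .T may be unacceptable for huge datasets, but in simple cases a one-liner that returns a copy of the DataFrame can be useful. See the test results: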

In [294]: for i in range (3,7):
     ...:     df = pd.DataFrame(np.random.randint(0,9,size=(10**i, 10**3)))
     ...:     print ('shape:',df.shape)
     ...:     %timeit df.T.reset_index(drop=True)
     ...: 
shape: (1000, 1000)
100 loops, best of 3: 3.2 ms per loop
shape: (10000, 1000)
10 loops, best of 3: 29.3 ms per loop
shape: (100000, 1000)
1 loop, best of 3: 546 ms per loop
shape: (1000000, 1000)
1 loop, best of 3: 9.9 s per loop

In [295]: %timeit df.columns = pd.RangeIndex(df.columns.size)
The slowest run took 28.60 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.74 µs per loop
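
If you want a one-liner that returns a copy without paying for the transposes, one option is DataFrame.set_axis. This is only a sketch and is not part of the timings above; the small frame and the name renumbered are made up for illustration.

```python
import pandas as pd

# Made-up heterogeneous frame: three integer columns and one string column.
df = pd.DataFrame({"7": [0, 1, 2, 3], "8": [1, 0, 1, 3],
                   "9": [2, 2, 3, 2], "A": list("bcde")})

# set_axis returns a new DataFrame whose columns carry positional labels.
# No transpose is involved, so the column dtypes are left alone.
renumbered = df.set_axis(pd.RangeIndex(df.columns.size), axis=1)

print(renumbered.dtypes)   # integer columns stay integer
print(list(df.columns))    # the original frame keeps its names
```

Because only the labels are swapped, this should sidestep the upcasting side effect as well as the transpose cost.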

Side effects (upcasting):
A heterogeneous DataFrame will be upcast:

In [352]: df = pd.DataFrame(np.random.randint(0,4,size=(4, 3)), columns=list('789')).join(
     ...:          pd.DataFrame(list('bcde'),columns=['A']))

In [353]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
7    4 non-null int64
8    4 non-null int64
9    4 non-null int64
A    4 non-null object
dtypes: int64(3), object(1)
memory usage: 208.0+ bytes

.T.T upcasts:

In [354]: df.T.T.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
7    4 non-null object
8    4 non-null object
9    4 non-null object
A    4 non-null object
dtypes: object(4)
memory usage: 208.0+ bytes

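A DataFrame stores heterogeneous data by keeping columns of different dtypes side by side. When you transpose, pandas has to upcast to a dtype that fits every row. This causes an unwanted side effect: if your original df has one string column and other numeric columns, the transpose of its transpose will have object dtype for every column. You will not be able to do numeric operations until you manually convert them back to numeric.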
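
If a frame has already been through .T.T, DataFrame.infer_objects() is one way to try to get the numeric dtypes back. A minimal sketch, using a made-up two-column frame (the names roundtrip and restored are only illustrative):

```python
import pandas as pd

# Made-up frame with one integer column and one string column.
df = pd.DataFrame({"7": [0, 1, 2, 3], "A": list("bcde")})

# Transposing twice leaves every column with object dtype.
roundtrip = df.T.T

# infer_objects attempts a soft conversion of object columns to better dtypes.
restored = roundtrip.infer_objects()

print(roundtrip.dtypes)
print(restored.dtypes)
```

This only repairs the dtypes after the fact; the data is still copied and converted twice, so it does not address the performance concern.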

Answer 2

jez*_*ael 4

I think you can use numpy.arange or range:

import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list('ABCD'))

df.columns = np.arange(len(df.columns))
# alternatively:
# df.columns = range(len(df.columns))
print(df)
   0  1  2  3
0  9  4  0  1
1  9  0  1  8
2  9  0  8  6
3  4  3  0  4
4  6  8  1  8
5  4  1  3  6
6  5  3  9  6
7  9  1  9  4
8  2  6  7  8
9  8  9  2  0

But the original column values are lost.

If you need a MultiIndex without names (starting again from the original df with columns A to D):

df.columns = [np.arange(len(df.columns)), df.columns]
print (df)
   0  1  2  3
   A  B  C  D
0  9  4  0  1
1  9  0  1  8
2  9  0  8  6
3  4  3  0  4
4  6  8  1  8
5  4  1  3  6
6  5  3  9  6
7  9  1  9  4
8  2  6  7  8
9  8  9  2  0

To name the levels, use MultiIndex.from_arrays:

names = ['a','b']
df.columns = pd.MultiIndex.from_arrays([np.arange(len(df.columns)), df.columns], names=names)
print (df)
a  0  1  2  3
b  A  B  C  D
0  9  4  0  1
1  9  0  1  8
2  9  0  8  6
3  4  3  0  4
4  6  8  1  8
5  4  1  3  6
6  5  3  9  6
7  9  1  9  4
8  2  6  7  8
9  8  9  2  0
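
As a rough sketch of how the two-level columns might then be used (the variable names by_position, by_name and flat are only illustrative), you can select by position on the outer level, select by the original name on the inner level, or drop the positions again:

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list("ABCD"))
df.columns = pd.MultiIndex.from_arrays(
    [np.arange(len(df.columns)), df.columns], names=["a", "b"]
)

# Outer level: pick the column at position 0 (a one-column frame labelled 'A').
by_position = df[0]

# Inner level: pick the column by its original name.
by_name = df.xs("B", axis=1, level="b")

# Drop the positional level to get back plain A..D headers.
flat = df.droplevel("a", axis=1)

print(by_position.columns.tolist(), by_name.columns.tolist(), flat.columns.tolist())
```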

Archived:	8 years, 6 months ago
Views:	7407
Last activity:	6 years, 11 months ago