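I have several columns with the same name in a df and need to rename them. The usual rename renames all of them. Is there any way to rename the blah(s) below to blah1, blah4, blah5?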
In [6]:
df=pd.DataFrame(np.arange(2*5).reshape(2,5))
df.columns=['blah','blah2','blah3','blah','blah']
df
Out[6]:
   blah  blah2  blah3  blah  blah
0     0      1      2     3     4
1     5      6      7     8     9
In [7]:
df.rename(columns = {'blah':'blah1'})
Out[7]:
   blah1  blah2  blah3  blah1  blah1
0      0      1      2      3      4
1      5      6      7      8      9
Lam*_*aha 16
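I was looking for a solution within Pandas rather than a general Python solution. When a column label is duplicated, the columns' get_loc() function returns a boolean mask array whose True values mark the positions where that label occurs. I then use the mask to assign new values at those positions. In my case I know ahead of time how many duplicates I will get and what I am going to assign to them, but it looks like df.columns.get_duplicates() returns a list of all duplicated labels, and you can use that list together with get_loc() if you need a more generic dup-weeding operation: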
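cols = pd.Series(df.columns)
# Note: Index.get_duplicates() was deprecated in pandas 0.23 and is not available in newer releases.
for dup in df.columns.get_duplicates():
    # get_loc(dup) is a boolean mask here; its sum is the number of occurrences of dup
    cols[df.columns.get_loc(dup)] = ([dup + '.' + str(d_idx)
                                      if d_idx != 0
                                      else dup
                                      for d_idx in range(df.columns.get_loc(dup).sum())]
                                     )
df.columns = cols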
   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9
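For the case described above, where the replacement names are known in advance (blah1, blah4, blah5 in the question), the mask idea can be sketched as follows. This is only an illustration under assumptions: it builds the mask with an elementwise comparison on a NumPy copy of the column labels instead of get_loc(), and the variable names `names` and `mask` are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']

# Work on an independent object-dtype copy of the labels.
names = np.array(df.columns, dtype=object)

# True wherever the label is the duplicated one.
mask = names == 'blah'

# Assign the new names, in order, to the masked positions.
# The list length must equal the number of True values in the mask.
names[mask] = ['blah1', 'blah4', 'blah5']

df.columns = names
print(list(df.columns))
# ['blah1', 'blah2', 'blah3', 'blah4', 'blah5']
```

The data is untouched; only the labels change, so the three former blah columns keep their original positions.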
Max*_*axU 14
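Starting with Pandas 0.19.0, pd.read_csv() has improved support for duplicate column names.
So we can try to use the internal method (it is private API, so it may change between versions):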
In [137]: pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
Out[137]: ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']
This is the "magic" function:
def _maybe_dedup_names(self, names):
    # see gh-7160 and gh-9424: this helps to provide
    # immediate alleviation of the duplicate names
    # issue and appears to be satisfactory to users,
    # but ultimately, not needing to butcher the names
    # would be nice!
    if self.mangle_dupe_cols:
        names = list(names)  # so we can index
        counts = {}

        for i, col in enumerate(names):
            cur_count = counts.get(col, 0)

            if cur_count > 0:
                names[i] = '%s.%d' % (col, cur_count)

            counts[col] = cur_count + 1

    return names
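Because the method above is internal to the parser, the same counting logic can be sketched as a standalone function that does not depend on pandas internals. The name `dedup_names` is hypothetical, chosen only for this example; the body mirrors the loop shown above without the `mangle_dupe_cols` switch.

```python
def dedup_names(names):
    """Return a new list where the k-th repeat of a name gets a '.k' suffix."""
    names = list(names)  # copy, so the caller's sequence is left untouched
    counts = {}

    for i, col in enumerate(names):
        cur_count = counts.get(col, 0)

        # The first occurrence keeps its name; later ones become name.1, name.2, ...
        if cur_count > 0:
            names[i] = '%s.%d' % (col, cur_count)

        counts[col] = cur_count + 1

    return names


print(dedup_names(['blah', 'blah2', 'blah3', 'blah', 'blah']))
# ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']
```

It could then be applied with `df.columns = dedup_names(df.columns)`. Note that this simple counting does not check whether a generated name such as 'blah.1' already exists among the original columns.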
You can use this:
def df_column_uniquify(df):
    df_columns = df.columns
    new_columns = []
    for item in df_columns:
        counter = 0
        newitem = item
        # keep incrementing the suffix until the name is not already taken
        while newitem in new_columns:
            counter += 1
            newitem = "{}_{}".format(item, counter)
        new_columns.append(newitem)
    df.columns = new_columns
    return df
Then:
import numpy as np
import pandas as pd
df=pd.DataFrame(np.arange(2*5).reshape(2,5))
df.columns=['blah','blah2','blah3','blah','blah']
So df is:
   blah  blah2  blah3  blah  blah
0     0      1      2     3     4
1     5      6      7     8     9
Then:
df = df_column_uniquify(df)
So df is now:
   blah  blah2  blah3  blah_1  blah_2
0     0      1      2       3       4
1     5      6      7       8       9
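One property of the while-loop approach is that it keeps searching until the candidate name is free, so it also copes with columns that already carry a suffixed name. A minimal sketch of that behaviour on a plain list follows; `uniquify_names` and its `sep` parameter are hypothetical names for this example, and a set is swapped in for the list membership test.

```python
def uniquify_names(names, sep="_"):
    """Return a list in which every name is unique (illustrative helper)."""
    seen = set()
    result = []
    for name in names:
        candidate = name
        counter = 0
        # Bump the counter until the candidate clashes with no earlier column.
        while candidate in seen:
            counter += 1
            candidate = "{}{}{}".format(name, sep, counter)
        seen.add(candidate)
        result.append(candidate)
    return result


# 'blah_1' is already taken, so the second 'blah' becomes 'blah_2'.
print(uniquify_names(['blah', 'blah_1', 'blah']))
# ['blah', 'blah_1', 'blah_2']
```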