I have a Series s with a duplicated index:
>>> s
STK_ID  RPT_Date
600809  20061231    demo_str
        20070331    demo_str
        20070630    demo_str
        20070930    demo_str
        20071231    demo_str
        20060331    demo_str
        20060630    demo_str
        20060930    demo_str
        20061231    demo_str
        20070331    demo_str
        20070630    demo_str
Name: STK_Name, Length: 11
I just want to keep the unique rows and only one copy of each duplicated row, using:
s[s.index.unique()]
Pandas 0.10.1.dev-f7f7e13 gives the following error message:
>>> s[s.index.unique()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 515, in __getitem__
    return self._get_with(key)
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 558, in _get_with
    return self.reindex(key)
  File "d:\Python27\lib\site-packages\pandas\core\series.py", line 2361, in reindex
    level=level, limit=limit)
  File "d:\Python27\lib\site-packages\pandas\core\index.py", line 2063, in reindex
    limit=limit)
  File "d:\Python27\lib\site-packages\pandas\core\index.py", line 2021, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects
>>>
So what is an efficient way to drop the extra duplicated rows from the Series, keeping the unique rows and only one copy of each duplicated row? (A one-liner would be best.)
Zel*_*ny7 24
You can group by the index and apply a function that returns one value per index group. Here, I take the first value:
In [1]: s = Series(range(10), index=[1,2,2,2,5,6,7,7,7,8])
In [2]: s
Out[2]:
1 0
2 1
2 2
2 3
5 4
6 5
7 6
7 7
7 8
8 9
In [3]: s.groupby(s.index).first()
Out[3]:
1 0
2 1
5 4
6 5
7 6
8 9
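As a self-contained variant of the session above, here is a minimal sketch, assuming a recent pandas where groupby(level=0) groups on the index itself. The with_nan Series is a made-up example, added only to show one caveat: first() returns the first non-null value of each group, which may not be the value from the first row.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(10), index=[1, 2, 2, 2, 5, 6, 7, 7, 7, 8])

# level=0 groups on the index itself, equivalent to groupby(s.index).
deduped = s.groupby(level=0).first()
print(deduped.tolist())  # [0, 1, 4, 5, 6, 9]

# Caveat: first() returns the first non-null value in each group,
# which is not necessarily the value from the first row.
with_nan = pd.Series([np.nan, 1.0, 2.0], index=["a", "a", "b"])
nan_result = with_nan.groupby(level=0).first()
print(nan_result.tolist())  # [1.0, 2.0]
```

If the value of the first row must be kept even when it is missing, a positional filter on the index is probably the safer choice.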
UPDATE
Addressing BigBug's comment about a crash when passing a MultiIndex to Series.groupby():
In [1]: s
Out[1]:
STK_ID  RPT_Date
600809  20061231    demo
        20070331    demo
        20070630    demo
        20070331    demo
In [2]: s.reset_index().groupby(s.index.names).first()
Out[2]:
                       0
STK_ID  RPT_Date
600809  20061231    demo
        20070331    demo
        20070630    demo
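In more recent pandas versions, grouping on the index levels directly should work without the reset_index() round trip. The following is a sketch under that assumption; the values "a" to "d" are placeholders chosen so that it is visible which copy survives.

```python
import pandas as pd

# A small Series shaped like the one above (values are placeholders).
index = pd.MultiIndex.from_tuples(
    [
        (600809, 20061231),
        (600809, 20070331),
        (600809, 20070630),
        (600809, 20070331),  # duplicate of the second row
    ],
    names=["STK_ID", "RPT_Date"],
)
s = pd.Series(["a", "b", "c", "d"], index=index, name="STK_Name")

# Group on every index level by name and keep the first value per group.
deduped = s.groupby(level=list(s.index.names)).first()
print(deduped)
```

The result stays a Series with the original MultiIndex, whereas the reset_index() approach returns a DataFrame. Note that groupby returns the group keys in sorted order by default.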
Ant*_*pov 11
You can use duplicated on the index (it keeps the first value by default) to subset the data. Using @Zelazny7's example:
s = pd.Series(range(10), index=[1,2,2,2,5,6,7,7,7,8])
In [130]: s[~s.index.duplicated()]
Out[130]:
1 0
2 1
5 4
6 5
7 6
8 9
dtype: int64
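One way is to use drop and index.get_duplicates. Note that, as the output below shows, this drops every row whose index label is duplicated, including the first copy, so only the rows with unique labels remain. (Index.get_duplicates was deprecated in pandas 0.23 and removed in 1.0.)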
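Index.duplicated also takes a keep argument, so the same mask can keep the last occurrence instead of the first. A minimal sketch on the same example data:

```python
import pandas as pd

s = pd.Series(range(10), index=[1, 2, 2, 2, 5, 6, 7, 7, 7, 8])

# keep="first" (the default) marks every occurrence after the first as a duplicate.
keep_first = s[~s.index.duplicated(keep="first")]

# keep="last" marks every occurrence before the last as a duplicate.
keep_last = s[~s.index.duplicated(keep="last")]

print(keep_first.tolist())  # [0, 1, 4, 5, 6, 9]
print(keep_last.tolist())   # [0, 3, 4, 5, 8, 9]
```

Unlike groupby, the boolean mask leaves the remaining rows in their original order.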
In [43]: df
Out[43]:
                          String
STK_ID  RPT_Date
600809  20061231    demo_string
        20070331    demo_string
        20070630    demo_string
        20070930    demo_string
        20071231    demo_string
        20060331    demo_string
        20060630    demo_string
        20060930    demo_string
        20061231    demo_string
        20070331    demo_string
        20070630    demo_string
In [44]: df.drop(df.index.get_duplicates())
Out[44]:
                          String
STK_ID  RPT_Date
600809  20070930    demo_string
        20071231    demo_string
        20060331    demo_string
        20060630    demo_string
        20060930    demo_string
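Since get_duplicates is no longer available in current pandas, the following sketch shows one possible replacement built on Index.duplicated. The small df and its "a" to "e" values are placeholders, not the data above, and the boolean mask stands in for the drop call.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        (600809, 20061231),
        (600809, 20070331),
        (600809, 20070930),
        (600809, 20061231),  # repeats row 0
        (600809, 20070331),  # repeats row 1
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame({"String": ["a", "b", "c", "d", "e"]}, index=index)

# Labels that occur more than once, roughly what get_duplicates() returned.
dup_labels = df.index[df.index.duplicated()].unique()

# keep=False flags every copy of a repeated label, so all copies are removed.
dropped_all = df[~df.index.duplicated(keep=False)]

# The default keeps the first copy, which is what the question asks for.
kept_first = df[~df.index.duplicated()]

print(dropped_all["String"].tolist())  # ['c']
print(kept_first["String"].tolist())   # ['a', 'b', 'c']
```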