Rex*_*Rex 9 python numpy dataframe pandas
给定已编制索引的现有Dataframe.
>>> df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
>>> df
a b c d e
0 -0.131666 -0.315019 0.306728 -0.642224 -0.294562
1 0.769310 -1.277065 0.735549 -0.900214 -1.826320
2 -1.561325 -0.155571 0.544697 0.275880 -0.451564
3 0.612561 -0.540457 2.390871 -2.699741 0.534807
4 -1.504476 -2.113726 0.785208 -1.037256 -0.292959
5 0.467429 1.327839 -1.666649 1.144189 0.322896
6 -0.306556 1.668364 0.036508 0.596452 0.066755
7 -1.689779 1.469891 -0.068087 -1.113231 0.382235
8 0.028250 -2.145618 0.555973 -0.473131 -0.638056
9 0.633408 -0.791857 0.933033 1.485575 -0.021429
>>> df.set_index("a")
b c d e
a
-0.131666 -0.315019 0.306728 -0.642224 -0.294562
0.769310 -1.277065 0.735549 -0.900214 -1.826320
-1.561325 -0.155571 0.544697 0.275880 -0.451564
0.612561 -0.540457 2.390871 -2.699741 0.534807
-1.504476 -2.113726 0.785208 -1.037256 -0.292959
0.467429 1.327839 -1.666649 1.144189 0.322896
-0.306556 1.668364 0.036508 0.596452 0.066755
-1.689779 1.469891 -0.068087 -1.113231 0.382235
0.028250 -2.145618 0.555973 -0.473131 -0.638056
0.633408 -0.791857 0.933033 1.485575 -0.021429
Run Code Online (Sandbox Code Playgroud)
如何将第3行移动到第一行?
这说,预期结果:
b c d e
a
-1.561325 -0.155571 0.544697 0.275880 -0.451564
-0.131666 -0.315019 0.306728 -0.642224 -0.294562
0.769310 -1.277065 0.735549 -0.900214 -1.826320
0.612561 -0.540457 2.390871 -2.699741 0.534807
-1.504476 -2.113726 0.785208 -1.037256 -0.292959
0.467429 1.327839 -1.666649 1.144189 0.322896
-0.306556 1.668364 0.036508 0.596452 0.066755
-1.689779 1.469891 -0.068087 -1.113231 0.382235
0.028250 -2.145618 0.555973 -0.473131 -0.638056
0.633408 -0.791857 0.933033 1.485575 -0.021429
Run Code Online (Sandbox Code Playgroud)
现在原来的第一行应该成为第二行.
Ale*_*der 13
要将第三行移动到第一行,您可以创建一个将目标行移动到第一个元素的索引。我使用条件列表理解来按列表加入。
然后,只需使用iloc来选择所需的索引行。
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3),columns=['a', 'b', 'c'])
>>> df
a b c
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
2 0.950088 -0.151357 -0.103219
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
target_row = 2
# Move target row to first element of list.
idx = [target_row] + [i for i in range(len(df)) if i != target_row]
>>> df.iloc[idx]
a b c
2 0.950088 -0.151357 -0.103219
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
Run Code Online (Sandbox Code Playgroud)
如果需要,您还可以重置索引。
>>> df.iloc[idx].reset_index(drop=True)
a b c
0 0.950088 -0.151357 -0.103219
1 1.764052 0.400157 0.978738
2 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
Run Code Online (Sandbox Code Playgroud)
或者,您可以使用idx以下命令重新索引列表:
>>> df.reindex(idx)
a b c
2 0.950088 -0.151357 -0.103219
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
Run Code Online (Sandbox Code Playgroud)
小智 5
重新索引可能是在1个明显步骤中将行放入任何新顺序的最佳解决方案,除非它可能需要生成一个可能非常大的新DataFrame.
例如
import pandas as pd
t = pd.read_csv('table.txt',sep='\s+')
t
Out[81]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
2 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
t.index
Out[82]: Int64Index([0, 1, 2, 3], dtype='int64')
t2 = t.reindex([2,0,1,3]) # cannot do this in place
t2
Out[93]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
2 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
0 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
Run Code Online (Sandbox Code Playgroud)
现在可以将索引设置回范围(4)而无需重新索引:
t2.index=range(4)
Out[102]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
1 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
2 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
Run Code Online (Sandbox Code Playgroud)
它也可以通过'元组切换'和行选择作为基本机制来完成,而无需创建新的DataFrame.例如:
import pandas as pd
t = pd.read_csv('table.txt',sep='\s+')
t.ix[1], t.ix[2] = t.ix[2], t.ix[1]
t.ix[0], t.ix[1] = t.ix[1], t.ix[0]
t
Out[96]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
1 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
2 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
Run Code Online (Sandbox Code Playgroud)
另一个in place方法为所需的排序设置DataFrame索引,以便例如第3行获得索引0等,然后DataFrame就地排序.它封装在以下函数中,该函数假设行的索引为正整数m的某个范围(m),并且DataFrame只是索引(没有MultiIndex),如问题中提供的示例所示.
def putfirst(n,df):
if not isinstance(n, int):
print 'error: 1st arg must be an int'
return
if n < 1:
print 'error: 1st arg must be an int > 0'
return
if n == 1:
print 'nothing to do when first arg == 1'
return
if n > len(df):
print 'error: n exceeds the number of rows in the DataFrame'
return
df.index = range(1,n) + [0] + range(n,df.index[-1]+1)
df.sort(inplace=True)
Run Code Online (Sandbox Code Playgroud)
putfirst的参数是n,它是要重新定位到第一行位置的行的序号位置,因此如果要重新定位第3行,则n = 3; 和df是包含要重定位的行的DataFrame.
这是一个演示:
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
df.set_index("a") # ineffective without assignment or inplace=True
Out[182]:
b c d e
a
1.394072 -1.076742 -0.192466 -0.871188 0.420852
-1.211411 -0.258867 -0.581647 -1.260421 0.464575
-1.070241 0.804223 -0.156736 2.010390 -0.887104
-0.977936 -0.267217 0.483338 -0.400333 0.449880
0.399594 -0.151575 -2.557934 0.160807 0.076525
-0.297204 -1.294274 -0.885180 -0.187497 -0.493560
-0.115413 -0.350745 0.044697 -0.897756 0.890874
-1.151185 -2.612303 1.141250 -0.867136 0.383583
-0.437030 0.347489 -1.230179 0.571078 0.060061
-0.225524 1.349726 1.350300 -0.386653 0.865990
df
Out[183]:
a b c d e
0 1.394072 -1.076742 -0.192466 -0.871188 0.420852
1 -1.211411 -0.258867 -0.581647 -1.260421 0.464575
2 -1.070241 0.804223 -0.156736 2.010390 -0.887104
3 -0.977936 -0.267217 0.483338 -0.400333 0.449880
4 0.399594 -0.151575 -2.557934 0.160807 0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745 0.044697 -0.897756 0.890874
7 -1.151185 -2.612303 1.141250 -0.867136 0.383583
8 -0.437030 0.347489 -1.230179 0.571078 0.060061
9 -0.225524 1.349726 1.350300 -0.386653 0.865990
df.index
Out[184]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
putfirst(3,df)
df
Out[186]:
a b c d e
0 -1.070241 0.804223 -0.156736 2.010390 -0.887104
1 1.394072 -1.076742 -0.192466 -0.871188 0.420852
2 -1.211411 -0.258867 -0.581647 -1.260421 0.464575
3 -0.977936 -0.267217 0.483338 -0.400333 0.449880
4 0.399594 -0.151575 -2.557934 0.160807 0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745 0.044697 -0.897756 0.890874
7 -1.151185 -2.612303 1.141250 -0.867136 0.383583
8 -0.437030 0.347489 -1.230179 0.571078 0.060061
9 -0.225524 1.349726 1.350300 -0.386653 0.865990
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
9691 次 |
| 最近记录: |