kal*_*alu 4 python numpy pandas
To see the problem, consider the following DataFrame:
In [66]: dat = pandas.DataFrame(['a','b','c','d','e','f','g','h'],
    ...:                        columns=['letters'])
In [67]: dat['numbers'] = pandas.Series([1,2,3,4,5,6,7,8])
In [68]: dat['names'] = pandas.Series(['jim','jan','jerry','george',
    ...:                               'mary','mary','sue','sue'])
In [69]: dat
Out[69]:
   letters  numbers   names
0        a        1     jim
1        b        2     jan
2        c        3   jerry
3        d        4  george
4        e        5    mary
5        f        6    mary
6        g        7     sue
7        h        8     sue
Group by name:
In [78]: dat = dat.groupby(['names'])[['letters']]
Now, trying to join the letters into a single string gives an odd result:
In [80]: dat.apply(lambda x: '|'.join(set(x)))
Out[80]:
names
george letters|numbers|names
jan letters|numbers|names
jerry letters|numbers|names
jim letters|numbers|names
mary letters|numbers|names
sue letters|numbers|names
dtype: object
The following hack seems to work, but why do I need to select 'letters' again, and why does the output above look like that?
In [84]: dat.apply(lambda x: '|'.join(set(x['letters'])))
Out[84]:
names
george d
jan b
jerry c
jim a
mary e|f
sue h|g
dtype: object
Could this be a bug?
Installed versions:
commit: None, python: 2.7.5.final.0, python-bits: 64, OS: Darwin, OS-release: 13.1.0, machine: x86_64, processor: i386, byteorder: little, LC_ALL: None, LANG: en_US.UTF-8
pandas: 0.13.1, Cython: 0.20.1, numpy: 1.6.2, scipy: 0.11.0, statsmodels: 0.5.0, IPython: 2.0.0, sphinx: 1.2.2, patsy: 0.2.1, scikits.timeseries: None, dateutil: 1.5, pytz: 2012d, bottleneck: None, tables: None, numexpr: None, matplotlib: 1.1.1, openpyxl: None, xlrd: None, xlwt: None, xlsxwriter: None, sqlalchemy: None, lxml: 3.3.5, bs4: 4.3.2, html5lib: None, bq: None, apiclient: None
This looks a bit strange, but what you're seeing is that the set of a DataFrame is the set of its column labels:
In [11]: dat
Out[11]:
   letters  numbers   names
0        a        1     jim
1        b        2     jan
2        c        3   jerry
3        d        4  george
4        e        5    mary
5        f        6    mary
6        g        7     sue
7        h        8     sue
[8 rows x 3 columns]
In [12]: set(dat)
Out[12]: {'letters', 'names', 'numbers'}
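The same thing happens group by group: each group handed to the function is itself a DataFrame, so `set(x)` collects its column labels rather than its cell values. A minimal sketch that iterates the groupby directly, which yields `(group key, sub-DataFrame)` pairs containing every column, so the result does not depend on how a given pandas version handles column selection inside `apply`. The names `df`, `col_sets` and `letter_sets` are just illustrative:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'letters': list('abcdefgh'),
    'numbers': [1, 2, 3, 4, 5, 6, 7, 8],
    'names': ['jim', 'jan', 'jerry', 'george', 'mary', 'mary', 'sue', 'sue'],
})

col_sets = {}
letter_sets = {}
# Iterating a GroupBy yields (group key, sub-DataFrame) pairs.
for name, group in df.groupby('names'):
    # set() on a DataFrame iterates over its column labels...
    col_sets[name] = set(group)
    # ...while set() on a single column iterates over the cell values.
    letter_sets[name] = set(group['letters'])

print(col_sets['mary'])     # the three column labels, in arbitrary order
print(letter_sets['mary'])  # the letters from mary's rows
```

Selecting `['letters']` inside the function, as in the hack from the question, is what turns the group back into something whose iteration yields letters.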
This is because iterating over a DataFrame iterates over its column labels:
In [13]: for i in dat: print(i)
letters
numbers
names
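So joining a DataFrame joins its column labels, which is exactly the `letters|numbers|names` string in the question's output (there, the `set` additionally makes the order arbitrary). A small sketch contrasting iteration over the frame, over one column, and over `DataFrame.items()`, which yields `(label, Series)` pairs; `joined_labels`, `joined_letters` and `first_values` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'letters': list('abcdefgh'),
    'numbers': [1, 2, 3, 4, 5, 6, 7, 8],
    'names': ['jim', 'jan', 'jerry', 'george', 'mary', 'mary', 'sue', 'sue'],
})

# Iterating the DataFrame itself gives the column labels, in column order.
joined_labels = '|'.join(df)              # 'letters|numbers|names'

# Iterating a single column (a Series) gives its values.
joined_letters = '|'.join(df['letters'])  # 'a|b|c|d|e|f|g|h'

# items() pairs each column label with its Series, when you need both.
first_values = {label: col.iloc[0] for label, col in df.items()}
```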
This behaves as expected with a SeriesGroupBy, because iterating over a Series iterates over its values:
In [21]: g = dat.groupby(['names'])['letters']
In [22]: g.apply(lambda x: '|'.join(set(x)))
Out[22]:
names
george d
jan b
jerry c
jim a
mary e|f
sue h|g
dtype: object
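For a per-group reduction like this, `SeriesGroupBy.agg` with a plain callable should also work, and it states the intent (one string per group) a little more directly than `apply`. A sketch assuming a reasonably recent pandas; it has not been checked against the 0.13.1 listed in the question, and `joined` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'letters': list('abcdefgh'),
    'numbers': [1, 2, 3, 4, 5, 6, 7, 8],
    'names': ['jim', 'jan', 'jerry', 'george', 'mary', 'mary', 'sue', 'sue'],
})

# Select the single column first so each group is a Series,
# then reduce each group's values to one '|'-separated string.
joined = df.groupby('names')['letters'].agg('|'.join)

print(joined)
```

The group keys come back sorted (the `groupby` default), and within each group the letters keep their original row order.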
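Note: you don't need the set, or in fact the lambda (the set only removes duplicates, and there are none within a group here; dropping it just keeps the original row order, hence g|h instead of h|g):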
In [23]: g.apply('|'.join)
Out[23]:
names
george d
jan b
jerry c
jim a
mary e|f
sue g|h
dtype: object
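The `set` only matters when a group can contain the same value more than once: it removes the duplicates but loses the row order. If you want de-duplication and a predictable order, one option is to sort the set, another is to keep first-appearance order with `Series.unique()`. A sketch on a small, made-up frame in which mary has the letter 'e' twice (the data and variable names are hypothetical, for illustration only):

```python
import pandas as pd

# Hypothetical data: mary has the letter 'e' twice.
df = pd.DataFrame({
    'letters': ['e', 'f', 'e', 'g', 'h'],
    'names': ['mary', 'mary', 'mary', 'sue', 'sue'],
})
g = df.groupby('names')['letters']

# Keep every value, in row order (duplicates included).
keep_all = g.agg('|'.join)                                  # mary -> 'e|f|e'
# De-duplicate, then sort so the order is deterministic.
sorted_unique = g.agg(lambda s: '|'.join(sorted(set(s))))   # mary -> 'e|f'
# De-duplicate while keeping the order of first appearance.
first_seen = g.agg(lambda s: '|'.join(s.unique()))          # mary -> 'e|f'
```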