Posts by sve*_*esh

Python: weighted median algorithm with pandas

I have a dataframe that looks like this:

Out[14]:
    impwealth  indweight
16     180000     34.200
21     384000     37.800
26     342000     39.715
30    1154000     44.375
31     421300     44.375
32    1210000     45.295
33    1062500     45.295
34    1878000     46.653
35     876000     46.653
36     925000     53.476

I want to calculate the weighted median of the column `impwealth` using the frequency weights in `indweight`. My pseudocode looks like this:

# Sort by `impwealth` in ascending order
df.sort_values('impwealth', inplace=True)

# Find the 50th percentile weight, P
P = df['indweight'].sum() * (.5)

# Search for the first row whose cumulative `indweight` reaches P
i = (df['indweight'].cumsum() >= P).idxmax()

# The …

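One way to complete the idea in the pseudocode is to compare the cumulative sum of the weights, rather than each individual weight, against half the total weight. The following is a minimal sketch under that assumption. The helper name `weighted_median` is hypothetical, and it uses the "first value whose cumulative weight reaches half the total" convention, which is only one of several definitions of a weighted median.

```python
import pandas as pd

def weighted_median(df, value_col, weight_col):
    """Hypothetical helper: first value whose cumulative weight reaches half the total."""
    # Sort a copy by the value column so cumulative weights follow value order
    ordered = df.sort_values(value_col)
    cum_weight = ordered[weight_col].cumsum()
    half_total = ordered[weight_col].sum() / 2.0
    # Keep the rows at or past the halfway point and take the first one
    return ordered.loc[cum_weight >= half_total, value_col].iloc[0]

# The ten rows shown in the question
df = pd.DataFrame(
    {
        "impwealth": [180000, 384000, 342000, 1154000, 421300,
                      1210000, 1062500, 1878000, 876000, 925000],
        "indweight": [34.200, 37.800, 39.715, 44.375, 44.375,
                      45.295, 45.295, 46.653, 46.653, 53.476],
    },
    index=[16, 21, 26, 30, 31, 32, 33, 34, 35, 36],
)

result = weighted_median(df, "impwealth", "indweight")
print(result)  # 925000 for these ten rows
```

Sorting a copy instead of using `inplace=True` leaves the caller's dataframe untouched; whether that matters depends on the rest of the script.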

python algorithm pandas

sve*_*esh

2014 09-30

13
votes

4
answers

6833
views

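Running a Stata do-file from Python

I have a Python script that cleans a large panel dataset (2,000,000+ observations) and performs basic statistical calculations on it.

I found that some of these tasks are better suited to Stata, so I wrote a do-file with the necessary commands. I now want to run that .do file from within my Python code. How can I call a .do file from Python?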
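One common approach is to launch Stata in batch mode as a subprocess. The sketch below assumes a Unix-like system where Stata accepts `-b do <file>` on the command line; the executable name (`stata`, `stata-mp`, and so on) and the file name `analysis.do` are assumptions that depend on the installation, and the helper names are hypothetical.

```python
import subprocess

def build_stata_command(do_file, stata_exe="stata"):
    """Return the argument list for a batch-mode Stata run (Unix-style flags)."""
    # stata_exe is an assumption: adjust it to the executable your installation provides
    return [stata_exe, "-b", "do", do_file]

def run_do_file(do_file, stata_exe="stata"):
    """Run the do-file and wait for Stata to finish."""
    # check=True raises CalledProcessError if the process exits with a non-zero code
    return subprocess.run(build_stata_command(do_file, stata_exe), check=True)

# Hypothetical usage (requires Stata to be installed and on PATH):
# run_do_file("analysis.do", stata_exe="stata-mp")
```

In batch mode Stata writes its output to a log file rather than the terminal, so it is worth inspecting that log after the call: an error inside the do-file may not show up as a non-zero exit code.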

python stata

sve*_*esh

2015 11-26

8
votes

2
answers

7694
views

pandas: slicing a MultiIndex by multiple index values

I have a dataframe `d` with about 100,000,000 rows and 3 columns. It looks like this:

import pandas as pd 

In [17]: d = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e'], 'val': [1, 2, 3, 4, 5], 'n': [34, 22, 95, 86, 44]}) 

In [18]: d.set_index(['id', 'val'], inplace = True)

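I have another dataframe containing the values of `id` and `val` that I want to keep in `d`. There are about 600,000 combinations of `id` and `val` that I want to keep: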

In [20]: keep = pd.DataFrame({'id':['a', 'b'], 'val' : [1, 2]})

I have tried it the following way:

In [21]: keep.set_index(['id', 'val'], inplace = True)

In [22]: d.loc[d.index.isin(keep.index), :] 
Out [22]:         
                   n
         id val    
          a  1    34
          b  2    22 …

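The `isin` mask in the attempt above can also be expressed as an inner merge on the key columns. The following is a minimal sketch on the toy data from the question showing both forms side by side. The names `by_mask` and `by_merge` are illustrative, and the sketch does not establish which form is faster on 100,000,000 rows.

```python
import pandas as pd

d = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e'],
                  'val': [1, 2, 3, 4, 5],
                  'n': [34, 22, 95, 86, 44]}).set_index(['id', 'val'])
keep = pd.DataFrame({'id': ['a', 'b'], 'val': [1, 2]})

# Option 1: boolean mask, testing each (id, val) pair for membership in keep
keep_index = pd.MultiIndex.from_frame(keep)
by_mask = d[d.index.isin(keep_index)]

# Option 2: inner merge on the key columns, then restore the MultiIndex
by_merge = (d.reset_index()
             .merge(keep, on=['id', 'val'], how='inner')
             .set_index(['id', 'val']))

print(by_mask)
print(by_merge)
```

Both options select the rows for `('a', 1)` and `('b', 2)`. Note that the merge would repeat rows if `keep` contained duplicate pairs, whereas the mask would not.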

python indexing slice pandas

sve*_*esh

lucky-day

4
votes

2
answers

52
views

Pandas: select consistent changes without iterating

I have a dataframe that looks like this:

In [9]: d = pd.DataFrame({'place': ['home', 'home', 'home', 'home', 'office', 'office', 'office', 'home', 'office', 'home', 'office', 'home', 'office', 'home'], 'person': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c'], 'other_stuff': ['f', 'g', 'd', 'q', 'w', 'r', 's', 't', 'u', 'v', 'w', 'l', 'm', 'n']})



In [7]: d
      place  other_stuff person
 0     home           f      a
 1     home           g      a
 2     home           d      a
 3     home           q      a
 4   office           w      a
 5   office           r      a …

python select pandas pandas-groupby

sve*_*esh

2018 06-01

1
votes

1
answers

51
views