Posts by Foo*_*Bar

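How to delete all rows in a DataFrame?

I want to delete all rows in a DataFrame.

The reason I want to do this is so that I can rebuild the DataFrame in an iterative loop. I want to start from a completely empty DataFrame.

Alternatively, if possible, I could create an empty df from just the column/type information.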
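Below is a minimal sketch of both options. It assumes an example frame `df` and a hypothetical column/dtype mapping `schema`, neither of which comes from the question. Slicing with `iloc[0:0]` or dropping every index label removes all rows but keeps the columns and dtypes. A dict of empty, typed Series builds an empty frame from the schema alone.

```python
import pandas as pd

# Example frame standing in for the real data (assumption).
df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5], "c": ["x", "y"]})

# Option 1: remove all rows but keep the columns and their dtypes.
empty_sliced = df.iloc[0:0]
empty_dropped = df.drop(df.index)

# Option 2: build an empty frame from column/dtype information only.
schema = {"a": "int64", "b": "float64", "c": "object"}  # hypothetical schema
empty_from_schema = pd.DataFrame(
    {col: pd.Series(dtype=dtype) for col, dtype in schema.items()}
)

print(empty_sliced.dtypes)
print(empty_from_schema.dtypes)
```

If the loop adds one row per iteration, it is usually much cheaper to collect the rows in a list and call `pd.DataFrame(rows, columns=...)` once at the end. Growing the frame repeatedly copies the existing data on every step.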

python pandas

cam*_*mil

2014-07-07

Score: 8 | Answers: 5 | Views: 10k

Statsmodels: calculating fitted values and R squared

I am running a regression as follows (df is a pandas DataFrame):

import statsmodels.api as sm
est = sm.OLS(df['p'], df[['e', 'varA', 'meanM', 'varM', 'covAM']]).fit()
est.summary()


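Among other things, this gives me an R squared of 0.942. I then want to plot the original y-values and the fitted values. To do this, I sorted the original values: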

import numpy as np
import matplotlib.pyplot as plt

orig = df['p'].values
fitted = est.fittedvalues.values
resid = est.resid.values  # residuals, so orig - resid equals fitted
args = np.argsort(orig)
plt.plot(orig[args], 'bo')
plt.plot(orig[args]-resid[args], 'ro')
plt.show()


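However, this gives me a plot in which the values are completely off. Nothing in it suggests an R squared of 0.9. So I tried to calculate it manually: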

yBar = df['p'].mean()
SSTot = df['p'].apply(lambda x: (x-yBar)**2).sum()
SSReg = ((est.fittedvalues - yBar)**2).sum()  
1 - SSReg/SSTot
Out[79]: 0.2618159806908984


Am I doing something wrong? Or why is my calculation so far off from what statsmodels reports? SSTot and SSReg have values of 48084 and 35495.
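The data isn't shown, so what follows is only a likely explanation. The regression above passes no constant column (there is no `sm.add_constant`). When the constant is omitted, the statsmodels documentation defines `rsquared` as `1 - ssr/uncentered_tss`. In that case 0.942 would be an uncentered R², measured against zero rather than against the mean of `p`.

There is also a separate problem with the manual line, which computes `1 - SSReg/SSTot`. The usual R² is `1 - SSRes/SSTot`, and that equals `SSReg/SSTot` only when the model has an intercept. The printed 0.2618 is exactly 1 - 35495/48084.

The sketch below compares these definitions. It uses synthetic data and plain NumPy least squares rather than statsmodels itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
# Synthetic data with a large intercept (5.0) that a no-constant model cannot capture.
y = 5.0 + X @ np.array([1.0, -2.0]) + rng.normal(size=n)

def ols_fitted(design, y):
    """Least-squares fitted values for the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ beta

# No constant column, comparable to sm.OLS(y, X) without sm.add_constant.
fitted_noc = ols_fitted(X, y)
ssr_noc = ((y - fitted_noc) ** 2).sum()
r2_uncentered = 1 - ssr_noc / (y ** 2).sum()                 # no-constant definition
r2_centered_noc = 1 - ssr_noc / ((y - y.mean()) ** 2).sum()  # same fit, centered TSS

# With a constant column the decomposition SSTot = SSReg + SSRes holds.
Xc = np.column_stack([np.ones(n), X])
fitted_c = ols_fitted(Xc, y)
ss_tot = ((y - y.mean()) ** 2).sum()
ss_res = ((y - fitted_c) ** 2).sum()
ss_reg = ((fitted_c - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot  # equals ss_reg / ss_tot here, not 1 - ss_reg / ss_tot

print(r2_uncentered, r2_centered_noc, r2)
```

Adding `sm.add_constant(...)` to the regressors in the original call should make statsmodels report the centered R². That value is the one comparable to the plot.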

python numpy statsmodels

Foo*_*Bar

lucky-day

Score: 8 | Answers: 2 | Views: 8047

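Statsmodels version 0.6.1 does not include tsa?

I'm trying to get the HP filter working using statsmodels (sm).

The documentation here implies that the module sm.tsa already exists in 0.6.1, but I get the following error: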

>>> import statsmodels as sm
>>> sm.__version__
'0.6.1'
>>> sm.tsa.filters.hp_filter.hpfilter()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'module' object has no attribute 'tsa'
>>> sm.tsa
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'module' object has no attribute 'tsa'


Here is my pip output:

nat-oitwireless-inside-vapornet100-a-14423:prog2 foobar$ pip show statsmodels
---
Name: statsmodels
Version: 0.6.1
Location: /usr/local/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.9-x86_64.egg
Requires:

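`import statsmodels` on its own does not import the `tsa` subpackage, which is why `sm.tsa` is missing. The documented entry point is `import statsmodels.api as sm`, after which `sm.tsa.filters.hpfilter(x, lamb=1600)` is available. Alternatively, use `from statsmodels.tsa.filters.hp_filter import hpfilter`.

To cross-check results independently of statsmodels, the sketch below implements the Hodrick-Prescott filter directly in NumPy. It solves (I + λ DᵀD) τ = y, where D is the second-difference matrix. The function name `hp_filter` is illustrative. The dense solve is a simplification and is only sensible for a few hundred observations.

```python
import numpy as np

def hp_filter(y, lamb=1600.0):
    """Return (cycle, trend) of the Hodrick-Prescott filter for a 1-D series.

    Dense sketch: solves (I + lamb * D.T @ D) trend = y, where D is the
    (n-2) x n second-difference matrix. Memory grows as n**2.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # each row looks like [1, -2, 1, 0, ...]
    trend = np.linalg.solve(np.eye(n) + lamb * D.T @ D, y)
    cycle = y - trend
    return cycle, trend

t = np.arange(50.0)
cycle, trend = hp_filter(2.0 + 0.5 * t + np.sin(t))
```

The return order (cycle, trend) mirrors statsmodels' `hpfilter`. A purely linear series passes through unchanged as trend, because its second differences are zero.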

python statsmodels

Foo*_*Bar

lucky-day

Score: 8 | Answers: 1 | Views: 4178

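Pandas replaces NaN with an arbitrary number when calling to_latex

I have a large DataFrame df with a multi-index and multiple columns, which I don't show here. I generate an index slice like this: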

subDf = df.sort_index(level=0).loc[:'e']


The slice then contains NaN in the second level of the index:

>>> subDf.iloc[0:1]
                  change
robustness value        
baseline   NaN     -14.5


The CSV generated by to_csv() seems correct:

>>> subDf.iloc[0:1].to_csv()
Out[15]: 'robustness,value,change\nbaseline,,-14.5\n'


Likewise, to_html() works as expected. However, when I try to get the LaTeX output, the NaN disappears and 50.00 appears:

>>> subDf.iloc[0:1].to_latex()
Out[14]: u'\\begin{tabular}{llr}\n\\toprule\n                &       &  change \\\\\nrobustness & value &         \\\\\n\\midrule\nbaseline & 50.00 &   -14.5 \\\\\n\\bottomrule\n\\end{tabular}\n'


The 50.00 is not an entirely arbitrary number: it is the last value in the second level of the MultiIndex of the original DataFrame:

>>> df.index
Out[18]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'baseline', u'f'], [0.01, 0.04, 0.25, 0.75, 0.86, 0.99, 1.0, 2.0, 4.0, 10.0, 50.0]],
           labels=[[5, 6, 6, 2, 2, 1, 3, 3, …

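Here is a likely cause, sketched with a small hypothetical frame rather than the original data. In a MultiIndex, a NaN label is not stored in `levels`; instead its entry in `codes` is -1. (`codes` is the newer name, in pandas 0.24+, for the `labels` shown above.) Code that looks up `levels[i][codes[i]]` without special-casing -1 hits NumPy's negative-index wraparound and gets the last level value. That matches the 50.0 seen here. Whether `to_latex` took exactly this path in the pandas version used is an assumption. `get_level_values` handles the -1 correctly.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with NaN in the second index level, as in subDf.
index = pd.MultiIndex.from_tuples(
    [("baseline", np.nan), ("a", 0.01), ("a", 50.0)],
    names=["robustness", "value"],
)
frame = pd.DataFrame({"change": [-14.5, 1.0, 2.0]}, index=index)

level = frame.index.levels[1]              # distinct non-NaN values: 0.01, 50.0
codes = frame.index.codes[1]               # position of each row's value; NaN is -1
naive = level.values[codes]                # -1 wraps around to the last value
correct = frame.index.get_level_values(1)  # maps -1 back to NaN

print(codes)
print(naive)
print(correct)
```

One workaround worth trying (untested here) is `subDf.reset_index().to_latex(index=False)`. This turns the level into an ordinary column, whose missing values `to_latex` renders with its `na_rep` argument.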

python nan pandas

Foo*_*Bar

2018-03-12

Score: 8 | Answers: 1 | Views: 275

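PyCharm and IPython combination cannot import qt5 or Qt5Agg

I have installed the base OS, PyCharm, and the whole Python stack via conda, and now I have trouble starting interactive matplotlib in an IPython session.

This is the IPython session from PyCharm: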

/home/foo/.conda/envs/myenv3/bin/python3.5 /opt/pycharm-2016.2.3/helpers/pydev/pydevconsole.py 41070 33134
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) 
Type "copyright", "credits" or "license" for more information.

IPython 5.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
PyDev console: using IPython 5.0.0

import sys; print('Python %s on %s' % (sys.version, sys.platform))

Python 3.5.2 …

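The console banner shows PyCharm starting `/home/foo/.conda/envs/myenv3/bin/python3.5`. The Qt binding therefore has to be installed in that conda environment, not just somewhere else on the system. The small stdlib-only diagnostic below can be pasted into the PyCharm console. It confirms which interpreter is running and which of the listed packages it can import. The helper name and the package list are assumptions for illustration.

```python
import importlib.util
import sys

def available_modules(names=("PyQt5", "PySide2", "matplotlib", "IPython")):
    """Return the subset of `names` that this interpreter can import."""
    return [name for name in names if importlib.util.find_spec(name) is not None]

print("interpreter:", sys.executable)
print("importable:", available_modules())
```

If `PyQt5` is missing from the list, installing it into `myenv3` is the first thing to check before selecting the Qt5Agg backend.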

python pyqt matplotlib ipython pycharm

Foo*_*Bar

2016-10-09

Score: 8 | Answers: 1 | Views: 1912

How to get the boundary of a 'numpy.array'?

If I have a d-dimensional np.array, how can I get a mask of its boundary?

For example, in 2D,

test = np.arange(16).reshape((4, 4))
test
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])


Now I want to get the boundary:

array([[ True,  True,   True,  True],
       [ True,  False,  False, True],
       [ True,  False,  False, True],
       [ True,  True,   True,  True]])


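It would be nice if it were efficient and worked for any number of dimensions, but it must work for at least 3. The array is not necessarily a hypercube but may be a hyperrectangle: unlike in this example, the number of grid points is not necessarily the same in every dimension.

For an array of shape (4, 5, 6), the expected output is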

array([[[ True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True],
        [ True, …

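One sketch that works for any number of dimensions and unequal side lengths is to start from an all-False mask. Then, for every axis, set the first and last slice along that axis to True. Only the faces are written, so the work is one allocation plus 2·d slice assignments. The name `boundary_mask` is illustrative.

```python
import numpy as np

def boundary_mask(shape):
    """Boolean mask that is True on the faces of an array of the given shape.

    Works for any number of dimensions and unequal side lengths.
    """
    mask = np.zeros(shape, dtype=bool)
    for axis in range(mask.ndim):
        for end in (0, -1):
            index = [slice(None)] * mask.ndim
            index[axis] = end  # first or last slice along this axis
            mask[tuple(index)] = True
    return mask

test = np.arange(16).reshape((4, 4))
print(boundary_mask(test.shape))
```

For shape (4, 5, 6), this leaves only the 2 × 3 × 4 interior False.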

python numpy

Foo*_*Bar

2018-01-04

Score: 8 | Answers: 1 | Views: 620

dtype: integer, but loc returns float

I have a strange dataset:

   year   firms  age  survival
0  1977  564918    0       NaN
2  1978  503991    0       NaN
3  1978  413130    1  0.731310
5  1979  497805    0       NaN
6  1979  390352    1  0.774522


I converted the dtype of the first three columns to integer:

>>> df.dtypes
year          int64
firms         int64
age           int64
survival    float64


But now I want to look up another table based on the index here:

idx = 331
otherDf.loc[df.loc[idx, 'age']]
Traceback (most recent call last):
(...)
KeyError: 8.0


This comes from

df.loc[idx, 'age']
8.0


Why does this keep returning a float value? How can I perform the lookup in otherDf? I'm on pandas version 0.15.
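Here is a likely explanation, hedged because it depends on the pandas version. A single row of a DataFrame whose columns are int64 and float64 becomes one Series. That Series can only have one dtype, so pandas upcasts it to float64. In pandas 0.15, `df.loc[idx, 'age']` appears to go through that row selection, hence 8.0. Selecting the column first keeps its integer dtype.

The sketch below rebuilds the five rows shown above. The lookup table `otherDf` is hypothetical.

```python
import numpy as np
import pandas as pd

# The five rows shown in the question.
df = pd.DataFrame(
    {
        "year": [1977, 1978, 1978, 1979, 1979],
        "firms": [564918, 503991, 413130, 497805, 390352],
        "age": [0, 0, 1, 0, 1],
        "survival": [np.nan, np.nan, 0.731310, np.nan, 0.774522],
    },
    index=[0, 2, 3, 5, 6],
)

# Hypothetical lookup table indexed by integer age.
otherDf = pd.DataFrame({"rate": [0.10, 0.20]}, index=[0, 1])

row = df.loc[3]         # one row as a Series: int64 + float64 columns -> float64
age = df["age"].loc[3]  # column first: stays int64
print(row.dtype, type(age))
print(otherDf.loc[age])
```

Casting explicitly with `int(df.loc[idx, 'age'])` is a blunter alternative if the column-first form is inconvenient.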

python types dataframe pandas

Foo*_*Bar

2017-01-02

Score: 7 | Answers: 1 | Views: 2394

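Lognormal random variables with SciPy

I fail to understand the basics of creating a lognormal variable, as described here.

The lognormal distribution takes a mean and a variance as parameters. I want to create a frozen distribution using these parameters and then get the cdf, pdf, etc.

However, in the documentation, they create the frozen distribution with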

from scipy.stats import lognorm
s = 0.953682269606
rv = lognorm(s)


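's' seems to be the standard deviation. I tried using the 'loc' and 'scale' parameters instead of 's', but that raised an error (s is a required parameter). How can I create a frozen distribution with parameter values "m" and "s" for location and scale?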
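Per the `scipy.stats.lognorm` documentation:

- `s` is the standard deviation σ of the underlying normal variable log(X).
- `scale` is exp(μ), where μ is that normal's mean.
- `loc` is a shift that is normally left at 0.

So μ and σ of log(X) map to `lognorm(s=sigma, scale=np.exp(mu))`. If instead the mean m and variance v of the lognormal variable itself are given, convert them first with σ² = ln(1 + v/m²) and μ = ln(m) - σ²/2. The helper names below are hypothetical.

```python
import numpy as np
from scipy.stats import lognorm

def lognorm_from_log_params(mu, sigma):
    """Frozen lognormal whose logarithm is Normal(mu, sigma**2)."""
    return lognorm(s=sigma, scale=np.exp(mu))

def lognorm_from_mean_var(mean, var):
    """Frozen lognormal with the given mean and variance of X itself."""
    sigma2 = np.log1p(var / mean**2)
    mu = np.log(mean) - sigma2 / 2
    return lognorm_from_log_params(mu, np.sqrt(sigma2))

rv = lognorm_from_mean_var(2.0, 0.5)
print(rv.mean(), rv.var())  # reproduces 2.0 and 0.5 up to rounding
print(rv.pdf(1.5), rv.cdf(1.5))
```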

python scipy

Foo*_*Bar

2015-02-25

Score: 7 | Answers: 1 | Views: 1633

Efficient 2D cumsum

Say I have an array like this

>>> a = np.arange(1,9).reshape((1,-1))
>>> a
array([[1, 2, 3, 4, 5, 6, 7, 8]])


and I want to create, for each item of a, a "cumsum of the next 4 items". That is, my expected output is

1,       2,      3, 4, 5, 6, 7, 8
1+2,     2+3,     ...
1+2+3    2+3+4    ...
1+2+3+4  2+3+4+5  ...


i.e., a matrix containing

1, 2, 3, 4, 5, 0, 0, 0
3, 5, 7, 9, 11,0, 0, 0
6, 9, 12,15,18,0, 0, 0
10,14,18,22,26,0, 0, 0


Since the cumsum cannot be done properly for the last 3 items, I expect 0 there. I know how to do a single cumsum. In fact, the array is

a[:4].cumsum().reshape((-1,1)); a[1:5].cumsum().reshape((-1,1))...


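stacked horizontally. However, I don't know how to do this efficiently. What is a good vectorized NumPy way to do this? I'm also open to SciPy packages, as long as they beat NumPy in efficiency or readability.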
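Here is a vectorized sketch using prefix sums. With c = [0, a0, a0+a1, ...], the sum of the k items starting at position j is c[j+k] - c[j]. Broadcasting a column of window lengths against a row of start positions therefore gives the whole matrix in one step. It follows the layout of the expected output above: items 1..8, window lengths 1 to 4, and all-zero columns where the full window would run past the end. The function name is illustrative. For long float arrays, the prefix-sum difference can accumulate some rounding error.

```python
import numpy as np

def windowed_cumsums(a, window):
    """Row k-1, column j holds a[j] + ... + a[j+k-1] for k = 1..window.

    Columns where the full window would run past the end are 0, as in the question.
    """
    a = np.asarray(a).ravel()
    n = a.size
    c = np.concatenate(([0], np.cumsum(a)))  # c[i] = sum of the first i items
    k = np.arange(1, window + 1)[:, None]    # window lengths, one per row
    j = np.arange(n)[None, :]                # start positions, one per column
    end = np.minimum(j + k, n)               # clip so the lookup stays in bounds
    sums = c[end] - c[j]
    return np.where(j + window <= n, sums, 0)

a = np.arange(1, 9).reshape((1, -1))
print(windowed_cumsums(a, 4))
```

On NumPy 1.20 or later, `np.lib.stride_tricks.sliding_window_view(a.ravel(), 4).cumsum(axis=1).T` produces the non-zero block directly and may read more clearly. It still needs zero columns appended to match the layout above.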

python arrays numpy scipy cumsum

Foo*_*Bar

2017-01-08

Score: 7 | Answers: 1 | Views: 767

How to track status with multiprocessing and pool.map?

I'm setting up the multiprocessing module for the first time, and basically I plan to do something like

from multiprocessing import Pool

pool = Pool(processes=102)
results = pool.map(whateverFunction, myIterable)  # blocks until every item is done
print(1)


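As far as I understand, 1 will be printed once all the processes have returned and the results are complete. I would like to get some status updates along the way. What is the best way to implement that?

I'm somewhat hesitant to make whateverFunction() print. Especially with around 200 values, I would print something like 'process done' 200 times, which is not very useful.

I would like output like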

10% of myIterable done
20% of myIterable done

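`pool.map` only returns once everything is finished, so percentage updates have to come from the parent process consuming results incrementally. `Pool.imap` supports this: it yields results one at a time, in input order. Below is a minimal sketch. `whatever_function` and `map_with_progress` are hypothetical stand-ins, and the 10% step matches the desired output.

```python
from multiprocessing import Pool

def whatever_function(x):
    """Hypothetical stand-in for whateverFunction."""
    return x * x

def map_with_progress(func, items, processes=4, step=10):
    """Like pool.map, but prints a line each time another `step` percent finishes."""
    items = list(items)
    total = len(items)
    if total == 0:
        return []
    results = []
    next_report = step
    with Pool(processes=processes) as pool:
        # imap yields each result, in input order, once it and all earlier ones are ready.
        for result in pool.imap(func, items):
            results.append(result)
            percent = 100 * len(results) // total
            while next_report <= percent:
                print(f"{next_report}% of myIterable done")
                next_report += step
    return results

if __name__ == "__main__":
    squares = map_with_progress(whatever_function, range(200))
```

`imap_unordered` can report progress a little more smoothly, but it returns results in completion order. Raising `chunksize` helps with many cheap tasks, at the cost of coarser progress updates.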

python multiprocessing

Foo*_*Bar

2016-01-16

Score: 7 | Answers: 1 | Views: 4969

Tag statistics

python ×10

numpy ×3

pandas ×3

scipy ×2

statsmodels ×2

arrays ×1

cumsum ×1

dataframe ×1

ipython ×1

matplotlib ×1

multiprocessing ×1

nan ×1

pycharm ×1

pyqt ×1

types ×1
