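Posts by Tim*_*Tim

Getting the mean of multiple Pandas DataFrames

I am generating many DataFrames with the same shape, and I would like to compare them with one another. I would like to be able to get the mean and median across the DataFrames.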

         Source.0  Source.1  Source.2  Source.3
cluster                                        
0        0.001182  0.184535  0.814230  0.000054
1        0.000001  0.160490  0.839508  0.000001
2        0.000001  0.173829  0.826114  0.000055
3        0.000432  0.180065  0.819502  0.000001
4        0.000152  0.157041  0.842694  0.000113
5        0.000183  0.174142  0.825674  0.000001
6        0.000001  0.151556  0.848405  0.000038
7        0.000771  0.177583  0.821645  0.000001
8        0.000001  0.202059  0.797939  0.000001
9        0.000025  0.189537  0.810410  0.000028
10       0.006142  0.003041  0.493912  0.496905
11       0.003739  0.002367  0.514216  0.479678
12       0.002334  0.001517  0.529041  0.467108
13       0.003458  0.000001  0.532265  0.464276
14       0.000405  0.005655  0.527576 …
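
One possible approach, sketched under the assumption that every frame shares the same index and columns: stack the frames with `pd.concat` and aggregate the rows that share an index label. The `frames` list and its random values below are hypothetical stand-ins for the generated data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the generated frames: same index, same columns.
rng = np.random.default_rng(0)
index = pd.Index(range(3), name="cluster")
columns = [f"Source.{i}" for i in range(4)]
frames = [
    pd.DataFrame(rng.random((3, 4)), index=index, columns=columns)
    for _ in range(5)
]

# Stack the frames vertically, then aggregate rows that share an index label.
stacked = pd.concat(frames)
mean_df = stacked.groupby(level=0).mean()
median_df = stacked.groupby(level=0).median()

print(mean_df)
print(median_df)
```

Both results keep the shape and labels of a single input frame, so they can be compared against any of the originals cell by cell.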


python numpy r pandas

Tim*_*Tim

2016 04-19

29
score

5
answers

20k
views

Setting values on the diagonal of a pandas.DataFrame

I have a pandas DataFrame and I would like to set its diagonal to 0

import numpy
import pandas

df = pandas.DataFrame(numpy.random.rand(5,5))
df

Out[6]:
     0           1           2           3               4
0    0.536596    0.674319    0.032815    0.908086    0.215334
1    0.735022    0.954506    0.889162    0.711610    0.415118
2    0.119985    0.979056    0.901891    0.687829    0.947549
3    0.186921    0.899178    0.296294    0.521104    0.638924
4    0.354053    0.060022    0.275224    0.635054    0.075738
5 rows × 5 columns

Now I want to set the diagonal to 0:

for i in range(len(df.index)):
    for j in range(len(df.columns)):
        if i==j:
            df.loc[i,j] = 0
df
Out[9]:
     0           1           2           3           4
0    0.000000    0.674319    0.032815    0.908086    0.215334
1    0.735022    0.000000 …
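
A shorter alternative to the double loop, offered as a sketch rather than the accepted answer: `numpy.fill_diagonal` writes the diagonal of a 2-D array in place. The version below works on an explicit copy of the values, so it does not depend on whether pandas hands back a writable view; the name `result` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5))

# Work on an explicit copy of the values so the original frame is left untouched.
values = df.to_numpy(copy=True)
np.fill_diagonal(values, 0)

# Rebuild a frame with the original labels around the modified values.
result = pd.DataFrame(values, index=df.index, columns=df.columns)
print(result)
```

This assumes all columns share one numeric dtype, as in the example above; with mixed dtypes the round trip through a single array would change the column types.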


python numpy pandas

Tim*_*Tim

lucky-day

24
score

3
answers

10k
views

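Why is cffi so much faster than numpy?

I have been playing around with writing cffi modules in Python, and their speed makes me wonder whether I am using standard Python correctly. It makes me want to switch to C completely! To be honest, there are some great Python libraries I could never reimplement myself in C, so this is more hypothetical than anything.

This example shows the sum function in Python being used with a numpy array, and how slow it is compared with a C function. Is there a faster Pythonic way to compute the sum of a numpy array?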

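import numpy as np
from cffi import FFI

def cast_matrix(matrix, ffi):
    # Build an array of row pointers (double**) into the numpy buffer.
    ap = ffi.new("double* [%d]" % (matrix.shape[0]))
    ptr = ffi.cast("double *", matrix.ctypes.data)
    for i in range(matrix.shape[0]):
        ap[i] = ptr + i*matrix.shape[1]
    return ap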

ffi = FFI()
ffi.cdef("""
double sum(double**, int, int);
""")
C = ffi.verify("""
double sum(double** matrix,int x, int y){
    int i, j; 
    double sum = 0.0;
    for (i=0; i<x; i++){
        for (j=0; j<y; j++){
            sum = sum + matrix[i][j];
        }
    }
    return(sum);
}
""")
m = np.ones(shape=(10,10))
print('numpy says', m.sum())

m_p = cast_matrix(m, …
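
The timing code is cut off above, so what was actually measured is unknown. As a hedged illustration of one common source of slowness, the sketch below compares Python's built-in `sum`, which iterates over the array in the interpreter, with the array's own `sum()` method, which reduces in compiled code. The repeat count of 1000 is an arbitrary choice.

```python
import timeit

import numpy as np

m = np.ones(shape=(10, 10))

# Built-in sum iterates over the rows in Python and returns the column sums.
column_sums = sum(m)
# ndarray.sum() reduces over every element in compiled code.
total = m.sum()

builtin_time = timeit.timeit(lambda: sum(sum(m)), number=1000)
numpy_time = timeit.timeit(lambda: m.sum(), number=1000)
print("built-in sum:", builtin_time, "ndarray.sum:", numpy_time)
```

On an array this small, per-call overhead dominates both timings, so a fair comparison with a C function would also need much larger inputs.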


c python pypy numpy python-cffi

Tim*_*Tim

2014 05-10

13
score

1
answer

2814
views

Python multiprocessing.Process: how to reuse a process?

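I am using the Python multiprocessing module to run some long-running tasks in parallel. I use the start() method to run the jobs, but once the jobs have returned I would like to run them again.

Is it possible to reuse the Process objects I have created, or do I have to create a new Process object every time I want to run the jobs?

This section of the Python documentation suggests that I cannot call start() more than once, but perhaps someone knows another way to reuse the instance:

start()

Start the process's activity.

This must be called at most once per process object. It arranges for the object's run() method to be invoked in a separate process.

This is my version of the Process class: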

class Process(multiprocessing.Process):
    def __init__(self, result_queue, MCMCinstance):
        assert isinstance(MCMCinstance, MCMC)
        multiprocessing.Process.__init__(self)
        self.result_queue = result_queue
        self.mcmc = MCMCinstance
        self.interface = C_interface(self.mcmc)
        self.burn_in = False

    def run(self):
        if self.burn_in: self.interface.burn_in()
        self.interface.sample(self.mcmc.options.runs)
        self.interface.update(self.mcmc)
        self.result_queue.put(self.mcmc)


I then instantiate the processes and run them with the start() method:

# setup the jobs and run
result_queue = multiprocessing.Queue()

mcmc1 = MCMC(options, donors, clusters)
mcmc2 = MCMC(options, donors, clusters)
mcmc3 …
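
Since start() can be called only once per Process object, one way to rerun the jobs is to build fresh Process objects for every round. The sketch below assumes that is acceptable; `work`, `make_processes`, and `run_round` are hypothetical names, and the squaring job stands in for the real sampling code.

```python
import multiprocessing


def work(job_id, result_queue):
    # Hypothetical stand-in for the long-running sampling job.
    result_queue.put((job_id, job_id * job_id))


def make_processes(job_ids, result_queue):
    # A Process object can be started only once, so build new ones per round.
    return [
        multiprocessing.Process(target=work, args=(job_id, result_queue))
        for job_id in job_ids
    ]


def run_round(job_ids):
    result_queue = multiprocessing.Queue()
    processes = make_processes(job_ids, result_queue)
    for process in processes:
        process.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = [result_queue.get() for _ in processes]
    for process in processes:
        process.join()
    return sorted(results)


if __name__ == "__main__":
    for round_number in range(2):
        print(round_number, run_round([1, 2, 3]))
```

If creating processes repeatedly is too costly, `multiprocessing.Pool` keeps a set of worker processes alive and feeds them new tasks, which may be closer to what "reuse" means here.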


python multithreading multiprocessing

Tim*_*Tim

lucky-day

6
score

1
answer

7113
views

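Python Scrapy tutorial KeyError: 'Spider not found:

I am trying to write my first Scrapy spider, following the tutorial at http://doc.scrapy.org/en/latest/intro/tutorial.html, but I get the error "KeyError: 'Spider not found:"

I think I am running the command from the right directory (the one containing the scrapy.cfg file)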

(proscraper)#( 10/14/14@ 2:06pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   tree
.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg

2 directories, 7 files
(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   ls
scrapy  scrapy.cfg

This is the error I get

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   scrapy crawl juno
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/twisted/internet/_sslverify.py:184: UserWarning: You do not have the service_identity module installed. Please install it from <https://pypi.python.org/pypi/service_identity>. Without the …


python scrapy

Tim*_*Tim

2014 10-28

5
score

1
answer

7348
views

Removing the border from a matplotlib pcolor plot

How can I remove the white border at the top and right of this plot?

This is the code I use to plot my pandas DataFrame:

plt.pcolor(diff,clip_on=False) # diff is a DataFrame
plt.yticks(np.arange(0.5, len(diff.index), 1), diff.index)
plt.xticks(np.arange(0.5, len(diff.columns), 1), diff.columns, rotation=90)
plt.colorbar()


[image: pcolor plot]
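
The image is not available here, so the cause is a guess. Assuming the white band appears because the axes limits extend past the last row and column of data, pinning the limits to the data extent may remove it. The random `diff` frame below is a hypothetical stand-in, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `diff` DataFrame.
diff = pd.DataFrame(np.random.rand(7, 5))

fig, ax = plt.subplots()
mesh = ax.pcolor(diff.to_numpy())
ax.set_yticks(np.arange(0.5, len(diff.index), 1))
ax.set_yticklabels(diff.index)
ax.set_xticks(np.arange(0.5, len(diff.columns), 1))
ax.set_xticklabels(diff.columns, rotation=90)

# Pin the axes to the data extent: each cell is one unit wide and tall.
ax.set_xlim(0, diff.shape[1])
ax.set_ylim(0, diff.shape[0])
fig.colorbar(mesh, ax=ax)
```

With the pyplot interface used in the question, `plt.xlim(0, diff.shape[1])` and `plt.ylim(0, diff.shape[0])` set the same limits.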

python matplotlib

Tim*_*Tim

2014 02-12

4
score

1
answer

1382
views

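Getting all permutations without replacement?

I want to get all permutations of a list, but without repeats, regardless of order. This is hard to describe, so I will give an example. I would really like to know the name of this operation, because I use it all the time. A simple way to implement it in Python would also really help. Thanks!

For example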

['foo', 'bar', 'la']

==>

['foo', 'bar']
['foo', 'la']
['bar', 'la']
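
Order-independent selections without repeats are usually called combinations (here, 2-combinations of the list). A minimal sketch with the standard library's `itertools.combinations`, assuming pairs are what is wanted:

```python
from itertools import combinations

items = ['foo', 'bar', 'la']

# Every unordered selection of 2 distinct items, in input order.
pairs = [list(pair) for pair in combinations(items, 2)]
print(pairs)  # [['foo', 'bar'], ['foo', 'la'], ['bar', 'la']]
```

Changing the second argument gives selections of other sizes, for example `combinations(items, 3)` for triples.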


python computer-science permutation combinatorics

Tim*_*Tim

lucky-day

2
score

1
answer

2387
views

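Removing whitespace from a matplotlib heat plot

I have a heatmap in matplotlib and I would like to remove the whitespace to the north and east of the plot, as shown in the image below. [image]

This is the code I use to generate the plot: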

# plotting
figsize=(50,20)
y,x = 1,2
fig, axarry = plt.subplots(y,x, figsize=figsize)
p = axarry[1].pcolormesh(copy_matrix.values)

# put the major ticks at the middle of each cell
axarry[1].set_xticks(np.arange(copy_matrix.shape[1])+0.5, minor=False)
axarry[1].set_yticks(np.arange(copy_matrix.shape[0])+0.5, minor=False)

axarry[1].set_title(file_name, fontweight='bold')

axarry[1].set_xticklabels(copy_matrix.columns, rotation=90)
axarry[1].set_yticklabels(copy_matrix.index)

fig.colorbar(p, ax=axarry[1])

Phylo.draw(tree, axes=axarry[0])


python numpy matplotlib

Tim*_*Tim

2014 06-19

2
score

1
answer

2097
views