Posts by egu*_*aio

Packaging only the Cython-compiled binary .so files of a Python library

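I have a package named mypack, which contains a module mymod.py and an __init__.py. For reasons that are not up for debate, I need to distribute this package compiled (neither .py nor .pyc files are allowed). That is, __init__.py is the only source file allowed in the distributed archive.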

The folder structure is:

.
│
├── mypack
│   ├── __init__.py
│   └── mymod.py
└── setup.py


I found that Cython can do this by converting each .py file into a .so library that Python can import directly.
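As a rough sketch of what "each .py file" means for this layout: the helper below only collects the modules that would be compiled and skips `__init__.py`. The name `modules_to_compile` is hypothetical, and the commented `setup()` call assumes Cython and setuptools are installed on the build machine. It illustrates the compilation step only and does not by itself keep the .py files out of the distribution.

```python
import os


def modules_to_compile(package_dir):
    """Return the .py files under package_dir that would be compiled.

    __init__.py is skipped because it is the only source file
    that is allowed to ship as plain Python.
    """
    found = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(".py") and name != "__init__.py":
                found.append(os.path.join(root, name))
    return sorted(found)


# In setup.py the list could then be handed to Cython (assumed installed):
#
#     from setuptools import setup
#     from Cython.Build import cythonize
#
#     setup(name="mymod", version="0.0.1", packages=["mypack"],
#           ext_modules=cythonize(modules_to_compile("mypack")))
```

For the folder structure above, `modules_to_compile("mypack")` returns only the path of `mymod.py`.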

The question is: how must the setup.py file be written to allow easy packaging and installation?

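The target system has a virtualenv, and the package must be installed into it with any method that allows easy installation and uninstallation (easy_install, pip, etc. are all welcome).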

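I tried everything within my reach. I read the setuptools and distutils documentation and all the related Stack Overflow questions, and I tried all kinds of commands (sdist, bdist, bdist_egg, etc.) with many combinations of setup.cfg and MANIFEST.in entries.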

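The closest I got is the setup file below, which subclasses the bdist_egg command so that it also removes the .pyc files, but that breaks the installation.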

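A solution that installs the files "manually" in the venv is also fine, provided that it covers all the ancillary files included in a proper installation (I need to run pip freeze in the venv and see mymod==0.0.1).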

To build it:

python setup.py bdist_egg --exclude-source-files


and to (try to) install it:

easy_install mymod-0.0.1-py2.7-linux-x86_64.egg


As you may notice, the target is 64-bit Linux with Python 2.7.

from Cython.Distutils import build_ext
from setuptools import setup, find_packages
from setuptools.extension import Extension
from setuptools.command import bdist_egg
from setuptools.command.bdist_egg import walk_egg, log
import os

class my_bdist_egg(bdist_egg.bdist_egg):

    def zap_pyfiles(self): …


python distutils setuptools cython setup.py

egu*_*aio

2018 03-10

11
votes

2
answers

5809
views

CherryPy: how to stop and buffer incoming requests while data is updated

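I am using CherryPy in a server that implements a REST-like API. Each response involves some heavy computation that takes about 2 seconds per request. The computation uses data that is updated three times a day.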

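The data is updated in the background (this takes about half an hour), and once it is updated, the references to the new data are passed to the functions that respond to requests. This takes only about a millisecond.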

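What I need is to make sure that each request is answered with either the old data or the new data, and that no request is being processed while the data references are changed. Ideally, I would like to find a way to buffer incoming requests while the data references are changed, and to ensure that the references are changed only after all in-process requests have finished.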
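One way to express that requirement is a readers-writer gate built on `threading.Condition`. The sketch below is a generic standard-library illustration, not a CherryPy API: the class name `DataGate` and its methods `read` and `swap` are hypothetical, and it assumes that all request handlers run as threads of a single process.

```python
import threading


class DataGate(object):
    """Many requests may read at once; a swap waits for them and holds back new ones."""

    def __init__(self, data):
        self._data = data
        self._cond = threading.Condition()
        self._active = 0        # requests currently being processed
        self._swapping = False  # True while a swap is pending or running

    def read(self, handler):
        """Run handler(data) with a reference that stays fixed for the whole call."""
        with self._cond:
            while self._swapping:      # buffer the request until the swap is done
                self._cond.wait()
            self._active += 1
            data = self._data
        try:
            return handler(data)
        finally:
            with self._cond:
                self._active -= 1
                self._cond.notify_all()

    def swap(self, new_data):
        """Replace the reference once every in-flight request has finished."""
        with self._cond:
            while self._swapping:      # allow only one swap at a time
                self._cond.wait()
            self._swapping = True      # from here on, new requests wait
            while self._active:        # drain the requests already running
                self._cond.wait()
            self._data = new_data
            self._swapping = False
            self._cond.notify_all()    # release the buffered requests
```

Under these assumptions a request handler would wrap its computation in `gate.read(...)`, and the subscriber that receives the new data would call `gate.swap(newData)`.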

My current (not) working minimal example is as follows:

import time
import cherrypy
from cherrypy.process import plugins

theData = 0

def processData():
    """Background task that works for half an hour three times a day,
        and when it finishes it publishes the result in the engine buffer."""
    global theData # using global variables to simplify the example
    theData += 1
    cherrypy.engine.publish("doChangeData", theData)

class DataPublisher(object):

    def __init__(self):
        self.data = 'initData'
        cherrypy.engine.subscribe('doChangeData', self.changeData)

    def changeData(self, newData):
        cherrypy.engine.log("Changing data, buffering should start!")
        self.data = newData
        time.sleep(1)  # exaggeration of the 1 millisec of the …


python cherrypy

egu*_*aio

2017 05-23

5
votes

1
answers

883
views

Finding the closest line to each point on a big dataset, possibly using shapely and rtree

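I have a simplified map of a city that has streets as linestrings and addresses as points. I need to find the closest path from each point to any street line. I have a working script that does this, but because of its nested for loops the running time grows with the number of points times the number of lines. For 150,000 lines (shapely LineString) and 10,000 points (shapely Point), it takes 10 hours to finish on a computer with 8 GB of RAM.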

The function looks like this (sorry that it is not fully reproducible):

import pandas as pd
import shapely
from shapely.geometry import Point, LineString

def connect_nodes_to_closest_edges(edges_df, nodes_df,
                                   edges_geom,
                                   nodes_geom):
    """Finds closest line to points and returns 2 dataframes:
        edges_df
        nodes_df
    """
    # Brute force: every point is compared against every line.
    for i in range(len(nodes_df)):
        point = nodes_df.loc[i, nodes_geom]
        shortest_distance = float("inf")
        for j in range(len(edges_df)):
            line = edges_df.loc[j, edges_geom]
            distance = line.distance(point)  # computed once per pair
            if distance < shortest_distance:
                shortest_distance = distance
                closest_street_index = j
                closest_line = line
                ...


The results are then saved in the table as a new column that holds the shortest path from each point to its closest line.

Is there a way to make this function faster?

For example, if for every point I could filter out the lines that are more than about 50 m away, would that speed up each iteration?
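To make the 50 m filter idea concrete, here is a minimal pure-Python sketch that swaps in a uniform grid in place of rtree and uses plain coordinate tuples instead of shapely objects. It treats each line as a single two-point segment, which is a simplification of a real LineString, and the function names `build_grid` and `closest_segment` are hypothetical.

```python
import math
from collections import defaultdict


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:            # degenerate segment: a single point
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def build_grid(segments, cell):
    """Map each grid cell to the indices of segments whose bounding box touches it."""
    grid = defaultdict(list)
    for i, ((ax, ay), (bx, by)) in enumerate(segments):
        x0, x1 = sorted((ax, bx))
        y0, y1 = sorted((ay, by))
        for cx in range(int(x0 // cell), int(x1 // cell) + 1):
            for cy in range(int(y0 // cell), int(y1 // cell) + 1):
                grid[(cx, cy)].append(i)
    return grid


def closest_segment(point, segments, grid, cell, radius):
    """Return (index, distance) of the nearest segment within radius, or None."""
    px, py = point
    candidates = set()
    for cx in range(int((px - radius) // cell), int((px + radius) // cell) + 1):
        for cy in range(int((py - radius) // cell), int((py + radius) // cell) + 1):
            candidates.update(grid.get((cx, cy), ()))
    best = None
    for i in candidates:               # exact distance only for nearby segments
        d = point_segment_distance(point, segments[i][0], segments[i][1])
        if d <= radius and (best is None or d < best[1]):
            best = (i, d)
    return best
```

The grid is built once for all lines, so each point only computes exact distances to the segments registered in the few cells around it. Points with no line within the radius return `None` and would need a wider search.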

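Is there a way to make this faster using the rtree package? I found an answer that makes a script for finding polygon intersections faster, but I can't seem to make it work for finding the closest line to a point.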

Faster way of polygon intersection

https://pypi.python.org/pypi/Rtree/

Sorry if this has already been answered, but I did not find an answer here or on gis.stackexchange.

Thanks for any advice!

python gis r-tree pandas shapely

Ste*_*anK

2017 09-21