Posts by Sya*_*man

Retroactively applying a new ManyToManyField default to existing model instances

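I have a Django model (called BiomSearchJob) that is already in production, and I want to add a new many-to-many relation to make the system more customizable for users. Previously, users could submit jobs without specifying a set of TaxonomyLevelChoices; as more features are added to the system, users should now be able to select their own taxonomy levels.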

Here is the model:

from django.db import models


class TaxonomyLevelChoice(models.Model):
    taxon_level = models.CharField(
        verbose_name="Taxonomy Chart Level", max_length=60)
    taxon_level_proper_name = models.CharField(max_length=60)

    def __unicode__(self):
        return self.taxon_level_proper_name

class BiomSearchJob(models.Model):
    ...
    # The new many-to-many relation
    taxonomy_levels = models.ManyToManyField(
        'TaxonomyLevelChoice', blank=False, max_length=3,
        default=["phylum", "class", "genus"])

    name = models.CharField(
        null=False, blank=False, max_length=100, default="Unnamed Job",
        validators=[alphanumeric_spaces])
    ...


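Currently, all existing BiomSearchJobs implicitly have the three taxonomy levels listed in the default= argument (they are not user-selectable), so they are all identical in the database. After running migrate, I found that the existing jobs do not get the three taxonomy-level relations; job.taxonomy_levels.all() just returns an empty set (where job is a BiomSearchJob instance).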

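Is there a way to add this relation retroactively without going through every job by hand? Ideally, after running a single migrate command, I'd like every existing BiomSearchJob to have phylum, class, and genus listed in its taxonomy_levels attribute.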

python django model

Sya*_*man

lucky-day

6 votes · 1 answer · 389 views

Haskell: converting Int to Float

I'm having some trouble with one of my new functions, specifically with the fromIntegral function.

Basically, I need to take two Int arguments and return the percentage of the numbers, but when I run my code it keeps giving me this error:

Code:

percent :: Int -> Int -> Float
percent x y = 100 * (a `div` b)
  where a = fromIntegral x :: Float
        b = fromIntegral y :: Float


Error:

No instance for (Integral Float)
  arising from a use of `div'
Possible fix: add an instance declaration for (Integral Float)
In the second argument of `(*)', namely `(a `div` b)'
In the expression: 100 * (a `div` b)
In an equation for `percent':
    percent x …


floating-point int haskell integral

Sya*_*man

2012-05-15

5 votes · 1 answer · 10k views

Splash Lua script to perform multiple clicks and page visits

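I'm trying to scrape Google Scholar search results and get the BibTeX citation for every result that matches the search. Right now I have a Scrapy crawler with Splash. I have a Lua script that clicks the "Cite" link and loads the modal window before grabbing the href of the BibTeX-format citation. But since there are multiple search results, and therefore multiple "Cite" links, I need to click all of them and load each individual BibTeX page.

Here's what I have: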

import scrapy
from scrapy_splash import SplashRequest


class CiteSpider(scrapy.Spider):
    name = "cite"
    allowed_domains = ["scholar.google.com", "scholar.google.ae"]
    start_urls = [
        'https://scholar.google.ae/scholar?q="thermodynamics"&hl=en'
    ]

    script = """
        function main(splash)
          local url = splash.args.url
          assert(splash:go(url))
          assert(splash:wait(0.5))
          splash:runjs('document.querySelectorAll("a.gs_nph[aria-controls=gs_cit]")[0].click()')
          splash:wait(3)
          local href = splash:evaljs('document.querySelectorAll(".gs_citi")[0].href')
          assert(splash:go(href))
          return {
            html = splash:html(),
            png = splash:png(),
            href=href,
          }
        end
        """

    def parse(self, response):
        yield SplashRequest(self.start_urls[0], self.parse_bib,
                            endpoint="execute",
                            args={"lua_source": self.script})

    def parse_bib(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.css("body …


python scrapy scrapy-splash splash-js-render

Sya*_*man

2019-11-20

5 votes · 1 answer · 2289 views

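Vectorized Manhattan distance matrix in NumPy

I'm trying to implement an efficient vectorized NumPy function that builds a Manhattan distance matrix. I'm familiar with the construction that uses dot products to build an efficient Euclidean distance matrix, as follows: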

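import numpy as np

A = np.array([[1, 2],
              [2, 1]])

B = np.array([[1, 1],
              [2, 2],
              [1, 3],
              [1, 4]])

def euclidean_distmtx(X, Y):
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, evaluated for all pairs at once
    f = -2 * np.dot(X, Y.T)
    xsq = np.power(X, 2).sum(axis=1).reshape((-1, 1))  # column of ||x||^2
    ysq = np.power(Y, 2).sum(axis=1)                   # row of ||y||^2
    return np.sqrt(xsq + f + ysq)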


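I'd like to implement something similar using the Manhattan distance instead. So far I've gotten close, but I got stuck trying to rearrange the absolute differences. As I understand it, the Manhattan distance is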

$\sum_i |x_i - y_i| = |x_1 - y_1| + |x_2 - y_2| + \cdots$

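I tried to work around this by considering what happens if the absolute value is not applied at all, which gives me this equivalence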

$\sum_i (x_i - y_i) = \sum_i x_i - \sum_i y_i$

This gives me the following vectorization:

def manhattan_distmtx(X, Y):
    f = np.dot(X.sum(axis=1).reshape(-1, 1), Y.sum(axis=1).reshape(-1, 1).T)
    return f / Y.sum(axis=1) - Y.sum(axis=1)


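I think I'm on the right track, but I can't rearrange the terms without removing the absolute value from the difference between each pair of vector elements. I'm sure there's a clever trick to get around the absolute value, perhaps using np.sqrt of squared values or something similar, but I can't seem to make it work.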
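The absolute value cannot simply be dropped the way the squared terms are expanded in the Euclidean version: for x = (1, 2) and y = (2, 1), the sum of (x_i - y_i) is (1 - 2) + (2 - 1) = 0, while the sum of |x_i - y_i| is 1 + 1 = 2. A common alternative is NumPy broadcasting, which forms every pairwise difference explicitly. The sketch below is one such approach, not a dot-product identity; note that it allocates an n x m x d intermediate array, so memory grows with all three sizes. If SciPy is available, scipy.spatial.distance.cdist(X, Y, 'cityblock') computes the same matrix.

```python
import numpy as np


def manhattan_distmtx(X, Y):
    """Pairwise Manhattan (L1) distances between the rows of X and the rows of Y."""
    X = np.asarray(X)
    Y = np.asarray(Y)
    # X[:, None, :] has shape (n, 1, d) and Y[None, :, :] has shape (1, m, d);
    # broadcasting produces every pairwise difference with shape (n, m, d).
    diff = X[:, None, :] - Y[None, :, :]
    # Take |x_i - y_i| elementwise, then sum over the feature axis -> shape (n, m).
    return np.abs(diff).sum(axis=-1)


A = np.array([[1, 2],
              [2, 1]])
B = np.array([[1, 1],
              [2, 2],
              [1, 3],
              [1, 4]])

print(manhattan_distmtx(A, B))
# [[1 1 1 2]
#  [1 1 3 4]]
```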

python numpy vectorization

Sya*_*man

lucky-day

4 votes · 1 answer · 7389 views

Polars str.starts_with() using values from another column

For example, I have a Polars DataFrame:

>>> df = pl.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['app', 'nop', 'cap', 'tab']})
>>> df
shape: (4, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ app │
│ b   ┆ nop │
│ c   ┆ cap │
│ d   ┆ tab │
└─────┴─────┘


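I'm trying to get a third column, C, that is True if the string in column B starts with the string in column A of the same row, and False otherwise. So in the example above, I'd expect: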

┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ …


python python-polars

Sya*_*man

lucky-day

2 votes · 1 answer · 1590 views

Looping a function in Haskell

I'm stuck on this one: it's a kind of loop in Haskell that I can't figure out how to write. Basically, I've defined three functions: split, riffle, and shuffle.

split :: [a] -> ([a],[a])
split xs = splitAt (length xs `div` 2) xs

riffle :: [a] -> [a] -> [a]
riffle xs [] = xs
riffle [] ys = ys
riffle (x:xs) (y:ys) = x:y:riffle xs ys

shuffle :: Int -> [a] -> [a]
shuffle 0 xs = xs
shuffle n xs = shuffle (n-1) (riffle a b)
    where (a, b) = split xs


Basically, split just divides a list in half, and riffle should "interleave" two lists, for example: