Posts by Han*_*ler

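Regex pattern to match, excluding when... / except between

Edit: The current answers have some useful ideas, but I want something more complete that I can 100% understand and reuse; that's why I set a bounty. Also, ideas that work everywhere are better for me than non-standard syntax like \K.

This question is about how to match a pattern except in certain situations s1 s2 s3. I give a specific example to show what I mean, but I would prefer a general answer that I can 100% understand, so I can reuse it in other situations.

Example

I want to match five digits using \b\d{5}\b, but not in three situations s1 s2 s3:

s1: Not on a line that ends with a period like this sentence.

s2: Not anywhere inside parens.

s3: Not inside a block that starts with if( and ends with //endif

I know how to solve any one of s1 s2 s3 with a lookahead and lookbehind, especially with lookbehind in C# or \K in PHP.

For instance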

s1: (?m)(?!\d+.*?\.$)\d+

s3 with C# lookbehind: (?<!if\(\D*(?=\d+.*?//endif))\b\d+\b

s3 with PHP \K: (?:(?:if\(.*?//endif)\D*)*\K\d+
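The question mentions Python as an option, so here is a small sketch of how these three patterns behave under Python's standard `re` module. The sample text and the names `S1`, `S3_PATTERNS` and `rejected` are made up for illustration. The s1 pattern works as written, while the two s3 patterns depend on engine-specific features (variable-width lookbehind and `\K`) that `re` refuses to compile.

```python
import re

# s1 from the question: digits, but not on a line that ends with a period
S1 = re.compile(r"(?m)(?!\d+.*?\.$)\d+")

text = "keep 12345 here\nskip 67890 on this line."  # made-up sample
print(S1.findall(text))  # ['12345']

# The s3 patterns use engine-specific features, so re.compile() rejects them
S3_PATTERNS = [
    r"(?<!if\(\D*(?=\d+.*?//endif))\b\d+\b",  # variable-width lookbehind (C#)
    r"(?:(?:if\(.*?//endif)\D*)*\K\d+",        # \K (PHP/PCRE)
]
rejected = []
for pattern in S3_PATTERNS:
    try:
        re.compile(pattern)
    except re.error as exc:
        rejected.append((pattern, str(exc)))
print(len(rejected))  # 2
```

This is one reason a portable recipe is attractive: neither s3 approach carries over unchanged to every engine.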

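But the mix of conditions makes my head explode. Worse, I may need to add other conditions s4 s5 at some later time.

The good news is that I don't care whether I process the files with one of the common languages like PHP, C#, Python or my neighbor's washing machine. :) I'm pretty much a beginner in Python and Java, but I'm interested to learn whether they offer a solution.

So I came here to see if someone can think of a flexible recipe.

Hints are okay: you don't need to give me full code. :)

Thank you.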
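One widely used recipe for this kind of "match X except in contexts s1, s2, s3" problem is to put the unwanted contexts first in an alternation and the wanted pattern last, inside a capture group, and then keep only matches where the group took part. What follows is a sketch of that idea in Python, not something taken from the post: the sub-patterns for s1, s2 and s3, the names `EXCLUDED`, `PATTERN` and `find_wanted`, and the sample text are all assumptions, and the s2 pattern does not handle nested parentheses.

```python
import re

# Contexts to skip; each one simply consumes the text we do not want scanned.
EXCLUDED = [
    r"^.*\.$",               # s1: a whole line that ends with a period
    r"\([^()]*\)",           # s2: a parenthesised span (no nesting)
    r"(?s:if\(.*?//endif)",  # s3: an if( ... //endif block, may span lines
]

# The wanted pattern goes last and is the only capture group.
PATTERN = re.compile("|".join(EXCLUDED) + r"|\b(\d{5})\b", re.MULTILINE)


def find_wanted(text):
    """Return the five-digit numbers found outside the excluded contexts."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


sample = (
    "11111 is wanted\n"
    "22222 sits on a line ending with a period.\n"
    "call(33333) then 44444\n"
    "if(x) 55555\n"
    "66666 //endif 77777\n"
)
print(find_wanted(sample))  # ['11111', '44444', '77777']
```

Adding a condition s4 or s5 under this scheme means appending one more entry to `EXCLUDED`. Because it relies only on alternation and a capture group, the same shape should carry over to most regex engines. One limitation of this sketch: the s1 branch only fires at the start of a line, so a line whose beginning was already consumed by another branch is not rechecked.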

regex

Han*_*ler

2014 07-02

104
votes

2
answers

20k
views

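asyncio web scraping 101: fetching multiple urls with aiohttp

In an earlier question, one of the authors of aiohttp kindly suggested a way to fetch multiple URLs with aiohttp using the new async with syntax from Python 3.5: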

import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

async def fetch_all(session, urls, loop):
    results = await asyncio.wait([loop.create_task(fetch(session, url))
                                  for url in urls])
    return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # breaks because of the first url
    urls = ['http://SDFKHSKHGKLHSKLJHGSDFKSJH.com',
            'http://google.com',
            'http://twitter.com']
    with aiohttp.ClientSession(loop=loop) as session:
        the_results = loop.run_until_complete(
            fetch_all(session, urls, loop))
        # do something with the the_results


However, when one of the session.get(url) …
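The comment in the code above says the run breaks because of the first URL. As a minimal sketch of one way to keep a single failing request from discarding the others, `asyncio.gather` can be called with `return_exceptions=True`. To stay runnable without a network or aiohttp, the example uses `flaky_fetch`, a hypothetical stand-in for `fetch(session, url)` that fails for the unreachable host; it assumes Python 3.7+ for `asyncio.run`.

```python
import asyncio


async def flaky_fetch(url):
    """Hypothetical stand-in for fetch(session, url); performs no network I/O."""
    await asyncio.sleep(0)
    if "SDFKHSKHGKLHSKLJHGSDFKSJH" in url:
        raise ConnectionError(f"cannot connect to {url}")
    return f"<html from {url}>"


async def fetch_all(urls):
    # return_exceptions=True places raised exceptions in the result list
    # instead of propagating the first one to the caller
    return await asyncio.gather(*(flaky_fetch(u) for u in urls),
                                return_exceptions=True)


urls = ['http://SDFKHSKHGKLHSKLJHGSDFKSJH.com',
        'http://google.com',
        'http://twitter.com']
results = asyncio.run(fetch_all(urls))
for url, result in zip(urls, results):
    if isinstance(result, Exception):
        print(url, "failed:", result)
    else:
        print(url, "ok")
```

Results come back in the same order as `urls`, so each entry can be matched to its request and checked with `isinstance`.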

python web-scraping python-3.x python-asyncio aiohttp

Han*_*ler

2017 05-23

18
votes

2
answers

6347
views

Fetching multiple urls with aiohttp in Python 3.5

Since Python 3.5 introduced async with, the syntax recommended in the aiohttp docs has changed. To get a single URL they now suggest:

import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, 'http://python.org'))
        print(html)


How can I modify this to fetch a collection of URLs instead of just one URL?

In the old asyncio examples, you would set up a list of tasks such as

    tasks = [
            fetch(session, 'http://cnn.com'),
            fetch(session, 'http://google.com'),
            fetch(session, 'http://twitter.com')
            ]


I tried to combine a list like this with the approach above, but failed.
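A minimal sketch of how such a list can be combined with a single event-loop run: build the coroutines in a list and await them together with `asyncio.gather`. The `fake_fetch` coroutine here is a hypothetical stand-in for `fetch(session, url)` so that the example runs without aiohttp or network access; it assumes Python 3.7+ for `asyncio.run`.

```python
import asyncio


async def fake_fetch(url):
    """Hypothetical stand-in for fetch(session, url); performs no network I/O."""
    await asyncio.sleep(0.01)  # simulated latency
    return f"<html from {url}>"


async def main(urls):
    # One coroutine per URL, awaited concurrently;
    # gather returns the results in the order of the input list
    tasks = [fake_fetch(url) for url in urls]
    return await asyncio.gather(*tasks)


pages = asyncio.run(main(['http://cnn.com',
                          'http://google.com',
                          'http://twitter.com']))
for page in pages:
    print(page)
```

With a real session, the same shape would apply by creating the session inside `main` and passing it to each fetch call.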

python web-scraping python-3.x python-asyncio aiohttp

Han*_*ler

2018 02-11

11
votes

1
answer

5289
views

Why doesn't the exception print?

In the REPL, I can print the string representation of an exception:

>>> print(str(ValueError))
<class 'ValueError'>
>>> print(ValueError)
<class 'ValueError'>


In this simple code, the value is not printed. What am I missing?

First flavor:

try:
    raise ValueError
except Exception as e:
    print(str(e))
    print('We crashed!')


This only outputs We crashed!

The second flavor outputs the same. What is happening to print(str(e))?

Second flavor:

def crash():
    raise ValueError

try:
    crash()
except Exception as e:
    print(str(e))
    print('We crashed!')
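A short sketch of what is likely going on: the REPL lines print the exception class, while `e` in the handler is an instance created with no arguments, and `str()` of such an instance is the empty string. The helper name `describe` and the message `"bad input"` are made up for illustration.

```python
def describe(exc):
    """Return (str, repr, type name) for an exception instance."""
    return str(exc), repr(exc), type(exc).__name__


try:
    raise ValueError              # instantiated with no arguments
except Exception as e:
    bare = describe(e)

try:
    raise ValueError("bad input")  # instantiated with a message
except Exception as e:
    with_message = describe(e)

print(bare)             # ('', 'ValueError()', 'ValueError')
print(with_message[0])  # bad input
```

So `print(str(e))` does run; it prints an empty line. Printing `repr(e)` or `type(e).__name__` shows something even when the exception carries no message.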


python exception python-3.x

Han*_*ler

lucky-day

2
votes

1
answer

8503
views

Tag statistics

python ×3

python-3.x ×3

aiohttp ×2

python-asyncio ×2

web-scraping ×2

exception ×1

regex ×1
