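Posts by Fus*_*rry

Add the same string to every item in a list

I've searched through answered questions on Stack Exchange but haven't been able to find what I'm looking for.

Given the following list: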

a = [1, 2, 3, 4]

How would I create:

a = ['hello1', 'hello2', 'hello3', 'hello4']

Thanks!
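
One possible approach, as a minimal sketch: a list comprehension that converts each number to a string and prepends the text. The variable name `prefix` is introduced here purely for illustration and is not part of the question.

```python
a = [1, 2, 3, 4]
prefix = 'hello'  # illustrative name, not from the question

# str() is needed because 'hello' + 1 would raise a TypeError
a = [prefix + str(item) for item in a]
print(a)  # ['hello1', 'hello2', 'hello3', 'hello4']
```

Rebinding `a` to the new list keeps the name used in the question; the original list object itself is not modified in place.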

python

Fus*_*rry

lucky-day

18
votes

2
answers

10k
views

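Native way to save a web page's source

I've read a lot of answers about web scraping that talk about using BeautifulSoup, Scrapy, etc. to do the scraping.

Is there a way to do the equivalent of saving a page's source from a web browser?

That is, is there a way in Python to point it at a website and save the page's source to a text file using only the standard Python modules?

Here's where I've got to: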

import urllib

f = open('webpage.txt', 'w')
html = urllib.urlopen("http://www.somewebpage.com")

#somehow save the web page source

f.close()

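I don't know much about this, but I'm looking for code that actually pulls the page's source so that I can write it out. I gather that urlopen just makes the connection.

Maybe there's a readlines() equivalent for reading the lines of a web page?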
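
For what it's worth, the object returned by urlopen is file-like, so it can be read much like an open file. Below is a minimal sketch assuming Python 3, where urlopen lives in `urllib.request` (the snippet above uses the Python 2 location, `urllib.urlopen`). The helper name `save_page_source` is hypothetical, and the file is written in binary mode so that no guess about the page's encoding is needed.

```python
import shutil
import urllib.request


def save_page_source(url, path):
    """Write the raw bytes served at `url` to the file at `path`."""
    # The response is file-like, so it can be copied straight to disk.
    with urllib.request.urlopen(url) as response, open(path, 'wb') as f:
        shutil.copyfileobj(response, f)


# Example call (performs a real network request, so it is left commented out):
# save_page_source('http://www.somewebpage.com', 'webpage.txt')
```

This saves only what the server sends back for that one request; anything a browser would add later by running scripts is not included.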

python web-scraping

Fus*_*rry

2017 02-04

15
votes

1
answer

20k
views

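Installing BeautifulSoup on Mac OS X

I've tried everything here: How to install the Beautiful Soup module on a Mac?

Installing the traditional way and installing with easy_install both seem to work (I get the correct output during installation), but when I use: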

from bs4 import BeautifulSoup

the interpreter says no such module exists.

What should I look at first to troubleshoot this?
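
One thing that may be worth checking first is whether the interpreter running the script is the same one the package was installed for. This is only a guess at the cause, not a confirmed diagnosis. A minimal sketch using the standard library, where the helper name `module_available` is hypothetical:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if this interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


# Which Python executable is actually running this code?
print(sys.executable)
# Can that interpreter see bs4?
print(module_available('bs4'))
# The directories searched when importing modules.
print(sys.path)
```

If the executable printed here differs from the one easy_install used, the package probably landed in a different site-packages directory.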

python beautifulsoup

Fus*_*rry

2017 05-23

7
votes

1
answer

10k
views

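Writing items to separate lines of a file without a blank line at the end

I have a file containing some text that I want to go through, match a bunch of things in, and then write those items to separate lines in a new file.

Here are the basics of the code I've put together: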

import re

f = open('this.txt', 'r')
g = open('that.txt', 'w')
text = f.read()
matches = re.findall('', text) # do some re matching here
for i in matches:
    a = i[0] + '\n'
    g.write(a)
f.close()
g.close()

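My issue is that I want each matched item on a new line (hence the '\n'), but I don't want a blank line at the end of the file.

I guess I need the last item in the file not to be followed by a newline character.

What's the Pythonic way of sorting this out? Also, is the way I've set this up in my code the best way of doing it, or the most Pythonic?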
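
One common idiom, sketched here under the assumption that all the items fit in memory: let `'\n'.join(...)` put a newline between items rather than after each one, so nothing follows the last item. The helper name `write_items` is hypothetical, and the `with` block closes the file even if the write fails.

```python
def write_items(items, path):
    """Write each item on its own line, with no newline after the last one."""
    with open(path, 'w') as g:
        # join() inserts the separator only between elements
        g.write('\n'.join(items))


# Example call with made-up data:
# write_items(['first', 'second', 'third'], 'that.txt')
```

With an empty list this writes an empty file, which is usually the sensible result.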

python

Fus*_*rry

2012 11-11

3
votes

1
answer

1308
views

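Don't write the final newline character to a file

I've looked around StackOverflow and couldn't find an answer to my specific question, so forgive me if I've missed something.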

import re

target = open('output.txt', 'w')

for line in open('input.txt', 'r'):
    match = re.search(r'Stuff', line)
    if match:
        match_text = match.group()
        target.write(match_text + '\n')
    else:
        continue
target.close()

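The file I'm parsing is huge, so it needs to be processed line by line.

This (of course) leaves an extra newline at the end of the file.

How should I best change this code so that on the final iteration of the 'if match' loop, it doesn't put the extra newline at the end of the file? Should it go through the file again at the end and remove the last line (that seems a bit inefficient, though)?

The existing StackOverflow questions I've found cover removing all newlines from a file.

If there's a more Pythonic/efficient way to write this code, I'd also welcome suggestions for my own learning.

Thanks for the help!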
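
One way to avoid a second pass, as a minimal sketch: write the newline before every match except the first, instead of after each match. The file is still read one line at a time. The helper name `copy_matches` and the flag `first` are hypothetical names introduced for illustration.

```python
import re


def copy_matches(src_path, dst_path, pattern):
    """Write each line's first match to dst_path, one per line, no trailing newline."""
    regex = re.compile(pattern)
    first = True
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:  # iterating a file reads it lazily, line by line
            match = regex.search(line)
            if not match:
                continue
            if not first:
                dst.write('\n')  # separator goes before, never after
            dst.write(match.group())
            first = False


# Example call using the names from the question:
# copy_matches('input.txt', 'output.txt', r'Stuff')
```

Because the separator is written ahead of each item, there is never anything to remove at the end, however large the input is.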

python

Fus*_*rry

lucky-day

2
votes

2
answers

5922
views

Indentation of nested for loops

I'd like to know why this is correct:

for heading in soup.find_all("td", class_="paraheading"):
    key = " ".join(heading.text.split()).rstrip(":")
    if key in columns:
        print key
        next_td = heading.find_next_sibling("td", class_="bodytext")
        value = " ".join(next_td.text.split())
        print value
    if key == "Industry Categories":
        print key
        ic_next_td = heading.find_next_sibling("td", class_="bodytext")
        for value in ic_next_td.strings:
                print value

and this is not:

for heading in soup.find_all("td", class_="paraheading"):
    key = " ".join(heading.text.split()).rstrip(":")
    if key in columns:
        print key
        next_td = heading.find_next_sibling("td", class_="bodytext")
        value = " ".join(next_td.text.split())
        print value
    if key == "Industry Categories":
        print key
        ic_next_td = heading.find_next_sibling("td", class_="bodytext") …

python indentation

Fus*_*rry

2012 11-19

1
vote

1
answer

5162
views