I wrote some Python code that creates an object in a loop and overwrites it with a new object of the same type on each iteration. It does this 10,000 times, and Python eats memory at roughly 7 MB per second until it has used 3 GB of RAM. Does anyone know a way to delete the objects from memory?

The basic idea is to request a list of URLs and parse the text out of each page source, using BeautifulSoup to strip out the HTML tags and scripts. Python version is 2.7.

The problem is that the parser function keeps accumulating memory on every single request; the footprint grows steadily.
def get_text_from_page_source(page_source):
    soup = BeautifulSoup(page_source, 'html.parser')
    # soup = BeautifulSoup(page_source, "lxml")
    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.decompose()  # rip it out
    # get text
    text = soup.get_text()
    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)
    return text
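For context, a minimal self-contained sketch of the pattern described above (calling the parser function repeatedly in a loop). The static HTML string and the repeat count here are placeholders standing in for the real per-URL page sources, so the snippet runs without any network access:

```python
from bs4 import BeautifulSoup

def get_text_from_page_source(page_source):
    soup = BeautifulSoup(page_source, 'html.parser')
    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.decompose()
    text = soup.get_text()
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
    return '\n'.join(chunk for chunk in chunks if chunk)

# placeholder page source; in the real code this comes from a request per URL
html = "<html><body><script>var x = 1;</script><p>Hello world</p></body></html>"

# the driver loop that exhibits the growing memory footprint
for _ in range(3):
    text = get_text_from_page_source(html)
```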