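I want to use a Python script to scrape the content of a DIV that is created by a JavaScript function. I tried BS4 (BeautifulSoup), but it does not return the dynamic data; it only shows the page source.
Sample code: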
import requests
from bs4 import BeautifulSoup

URL = "https://rawgit.com/skysoft999/tableauJS/master/example.html"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')

# Print every <div class="quote"> present in the downloaded markup
for row in soup.find_all('div', attrs={'class': 'quote'}):
    print(row)

print(soup.prettify())
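This behaviour is expected: requests downloads the markup exactly as the server sends it and never executes JavaScript, so an element that a script creates in the browser is not there for BeautifulSoup to find. The sketch below illustrates this with the standard library only. The page in STATIC_HTML is hypothetical (it is not the page at the URL above), and QuoteDivCounter and count_quote_divs are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical page: the div.quote only exists after a browser runs the script.
STATIC_HTML = """
<html><body>
<div id="container"></div>
<script>
  var d = document.createElement('div');
  d.className = 'quote';
  d.textContent = 'loaded by JavaScript';
  document.getElementById('container').appendChild(d);
</script>
</body></html>
"""


class QuoteDivCounter(HTMLParser):
    """Counts <div class="quote"> start tags that are present in the markup itself."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # An attribute without a value is reported as None, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "quote" in classes:
            self.count += 1


def count_quote_divs(html):
    """Return how many div.quote elements a static parser can see in html."""
    parser = QuoteDivCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# The script is treated as plain text, so no div.quote is found.
static_count = count_quote_divs(STATIC_HTML)

# The same check on markup as it might look after the script has run.
rendered_count = count_quote_divs(
    '<div id="container"><div class="quote">loaded by JavaScript</div></div>'
)
```

To get the rendered content, the script has to run somewhere: one common option is to drive a real browser (for example with Selenium) and pass the resulting page source to BeautifulSoup; another is to request the data source that the script itself reads from, if there is one.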
The sample HTML source is on Pastebin.
Sample data to extract:
\u2264 is the character ≤ (less than or equal to), and it is the root cause of the error. Detailed error log:
Traceback (most recent call last):
  File "C:\Dev\EXE\TEMP\cookie\crumbs\views.py", line 1520, in parser
    html_file.write(html_text)
  File "C:\Users\Cookie1\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2264' in position 389078: character maps to <undefined>
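The traceback shows the write being encoded with cp1252, which has no mapping for U+2264. That is most likely because the file was opened without an explicit encoding, so Python fell back to the Windows locale default (an assumption, since the open() call is not shown). A minimal sketch of the usual fix, passing encoding="utf-8" when opening the file; the sample text and the temporary file path are hypothetical:

```python
import os
import tempfile

# Hypothetical content containing U+2264 (less than or equal to).
html_text = "<p>x \u2264 y</p>"

# cp1252 cannot represent U+2264, which reproduces the failure in the log.
try:
    html_text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# An explicit encoding removes the dependence on the platform default.
path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(path, "w", encoding="utf-8") as html_file:
    html_file.write(html_text)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as html_file:
    round_trip = html_file.read()
```

If the output must stay in cp1252, open() also accepts an errors argument (for example errors="xmlcharrefreplace") that substitutes unencodable characters instead of raising, at the cost of changing the written text.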