11 parsing json beautifulsoup python-3.x
import urllib
from urllib import request
from bs4 import BeautifulSoup
url = 'http://mygene.info/v3/query?q=symbol:CDK2&species:human&fields=name,symbol,entrezgene'
html = request.urlopen(url).read()
soup = BeautifulSoup(html)
Output:
<html><body><p>{
"max_score": 88.84169,
"took": 6,
"total": 244,
"hits": [
{
"_id": "1017",
"_score": 88.84169,
"entrezgene": "1017",
"name": "cyclin dependent kinase 2",
"symbol": "CDK2"
},
{
"_id": "12566",
"_score": 73.8155,
"entrezgene": "12566",
"name": "cyclin-dependent kinase 2",
"symbol": "Cdk2"
},
{
"_id": "362817",
"_score": 62.09322,
"entrezgene": "362817",
"name": "cyclin dependent kinase 2",
"symbol": "Cdk2"
}
]
}</p></body></html>
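Goal: From this output, I want to extract the entrezgene, name, and symbol values.
Question: How can I do this?
Background: I tried https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class and "Python BeautifulSoup extract text between element", to name a couple, but I couldn't find what I was looking for.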
Bit*_*han 17
You can get the text of the parsed document, which is in JSON format, and then use json.loads() to convert it into a dictionary.
from urllib import request
from bs4 import BeautifulSoup
import json
url = 'http://mygene.info/v3/query?q=symbol:CDK2&species:human&fields=name,symbol,entrezgene'
html = request.urlopen(url).read()
soup = BeautifulSoup(html,'html.parser')
site_json=json.loads(soup.text)
#printing for entrezgene, do the same for name and symbol
print([d.get('entrezgene') for d in site_json['hits'] if d.get('entrezgene')])
Output:
['1017', '12566', '362817', '100117828', '109992509', '100981695', '100925631']
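To collect all three fields at once, the same idea can be sketched as below. This is a minimal sketch, not the answerer's code: the helper name extract_fields and the SAMPLE_JSON payload (the first two hits copied from the question's output) are assumptions added for illustration, so that the example runs without network access. Because the response body appears to be plain JSON, the sketch passes the text straight to json.loads() and does not use BeautifulSoup.

```python
import json

# Sample payload copied from the question's output (first two hits only),
# so this sketch runs offline. With a live request, pass the response body instead.
SAMPLE_JSON = """
{
  "max_score": 88.84169,
  "took": 6,
  "total": 244,
  "hits": [
    {"_id": "1017", "_score": 88.84169, "entrezgene": "1017",
     "name": "cyclin dependent kinase 2", "symbol": "CDK2"},
    {"_id": "12566", "_score": 73.8155, "entrezgene": "12566",
     "name": "cyclin-dependent kinase 2", "symbol": "Cdk2"}
  ]
}
"""


def extract_fields(json_text, fields=("entrezgene", "name", "symbol")):
    """Return one dict per hit, keeping only the requested fields.

    extract_fields is a hypothetical helper name; a field missing from a
    hit is reported as None rather than raising KeyError.
    """
    data = json.loads(json_text)
    return [{f: hit.get(f) for f in fields} for hit in data.get("hits", [])]


rows = extract_fields(SAMPLE_JSON)
for row in rows:
    print(row["entrezgene"], row["name"], row["symbol"])
```

With a live response, something like extract_fields(request.urlopen(url).read()) should work as well, since json.loads() accepts bytes in Python 3.6 and later.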