Parsing out specific values from a JSON object in BeautifulSoup

Question

11 parsing json beautifulsoup python-3.x

import urllib
from urllib import request
from bs4 import BeautifulSoup

url = 'http://mygene.info/v3/query?q=symbol:CDK2&species:human&fields=name,symbol,entrezgene'
html = request.urlopen(url).read()
soup = BeautifulSoup(html)

Output:

<html><body><p>{
  "max_score": 88.84169,
  "took": 6,
  "total": 244,
  "hits": [
    {
      "_id": "1017",
      "_score": 88.84169,
      "entrezgene": "1017",
      "name": "cyclin dependent kinase 2",
      "symbol": "CDK2"
    },
    {
      "_id": "12566",
      "_score": 73.8155,
      "entrezgene": "12566",
      "name": "cyclin-dependent kinase 2",
      "symbol": "Cdk2"
    },
    {
      "_id": "362817",
      "_score": 62.09322,
      "entrezgene": "362817",
      "name": "cyclin dependent kinase 2",
      "symbol": "Cdk2"
    }
  ]
}</p></body></html>

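Goal: From this output, I want to parse out the entrezgene, name, and symbol values.

Question: How do I go about doing this?

Background: I have tried https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class and "Python BeautifulSoup extract text between element", to name a couple, but I cannot find what I am looking for.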

Answer 1

Bit*_*han 17

You can get the text, which is in JSON format, and then use json.loads() to convert it into a dictionary.

from urllib import request
from bs4 import BeautifulSoup
import json
url = 'http://mygene.info/v3/query?q=symbol:CDK2&species:human&fields=name,symbol,entrezgene'
html = request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
# The page text is JSON, so it can be passed straight to json.loads()
site_json = json.loads(soup.text)
# Print entrezgene for each hit that has one; do the same for name and symbol
print([d.get('entrezgene') for d in site_json['hits'] if d.get('entrezgene')])

Output:

['1017', '12566', '362817', '100117828', '109992509', '100981695', '100925631']
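
The code above prints only entrezgene and leaves name and symbol as an exercise. Here is a minimal sketch that pulls all three fields from each hit in one pass. It uses a shortened copy of the response shown in the question so it runs without a network call. The names SAMPLE, FIELDS, and extract_gene_fields are hypothetical, introduced only for illustration.

```python
import json

# Sample payload copied from the question's output (first two hits only),
# so this sketch runs offline.
SAMPLE = """
{
  "max_score": 88.84169,
  "took": 6,
  "total": 244,
  "hits": [
    {"_id": "1017", "_score": 88.84169, "entrezgene": "1017",
     "name": "cyclin dependent kinase 2", "symbol": "CDK2"},
    {"_id": "12566", "_score": 73.8155, "entrezgene": "12566",
     "name": "cyclin-dependent kinase 2", "symbol": "Cdk2"}
  ]
}
"""

# The three keys the question asks for.
FIELDS = ("entrezgene", "name", "symbol")


def extract_gene_fields(json_text, fields=FIELDS):
    """Return one dict per hit, containing only the requested fields."""
    data = json.loads(json_text)
    # .get() yields None when a hit lacks a field instead of raising KeyError.
    return [{f: hit.get(f) for f in fields} for hit in data.get("hits", [])]


rows = extract_gene_fields(SAMPLE)
for row in rows:
    print(row["entrezgene"], row["name"], row["symbol"])
```

With the live request, soup.text from the answer can be passed in place of SAMPLE. Since the body shown in the question is plain JSON, passing the raw response to json.loads() would probably work as well, without BeautifulSoup; that is an assumption about the endpoint, not something tested here.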
