Basic BeautifulSoup Wikipedia scraping

Spe*_*thy 3 python beautifulsoup web-scraping pandas

I'm trying to get a very basic, short unordered list (<ul>) from Wikipedia. My end goal is to put it into a DataFrame. My question is: where do I go from here?

from bs4 import BeautifulSoup
import requests
from pandas import Series, DataFrame

url = "https://en.wikipedia.org/wiki/National_Pro_Grid_League"
result = requests.get(url)
c = result.content
soup = BeautifulSoup(c, "html.parser")

I can't seem to find an answer to this on StackOverflow, so I'd appreciate any advice anyone can give me.
This is the specific list I'm looking for:

Active teams
Baltimore Anthem (2015–present)
Boston Iron (2014–present)
DC Brawlers (2014–present)
Los Angeles Reign (2014–present)
Miami Surge (2014–present)
New York Rhinos (2014–present)
Phoenix Rise (2014–present)
San Francisco Fire (2014–present)

wpe*_*rcy 5

First, you need to find the right part of the page. You can do that by finding the heading with id="Active_teams_at_league_closing" and then finding the next <ul> element from there.

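from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/National_Pro_Grid_League"
r = requests.get(url)
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(r.content, "html.parser")

# Locate the section heading by its id, then take the first <ul> after it
heading = soup.find(id='Active_teams_at_league_closing')
teams = heading.find_next('ul')
# Iterate over the <li> items only; looping over the <ul> itself also yields whitespace text nodes
for team in teams.find_all('li'):
    print(team.get_text())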
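
Since the stated goal is a DataFrame, here is a minimal sketch of that last step. The helper name `teams_to_frame`, the column names `team` and `years`, and the inline `sample_html` are assumptions made for illustration; they are not part of the original question or answer. The sample markup only imitates the shape of the Wikipedia section so the example runs without a network call.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup


def teams_to_frame(html, heading_id="Active_teams_at_league_closing"):
    """Parse the first <ul> after the element with `heading_id` into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find(id=heading_id)
    if heading is None:
        raise ValueError(f"no element with id={heading_id!r}")
    items = heading.find_next("ul")
    if items is None:
        raise ValueError("no <ul> found after the heading")

    rows = []
    for li in items.find_all("li"):
        # Join the link text and the trailing "(years)" text with one space
        text = li.get_text(" ", strip=True)
        # Split "Boston Iron (2014–present)" into a name and a years part
        match = re.match(r"^(.*?)\s*\((.*)\)$", text)
        if match:
            rows.append({"team": match.group(1), "years": match.group(2)})
        else:
            rows.append({"team": text, "years": None})
    return pd.DataFrame(rows, columns=["team", "years"])


# Hypothetical stand-in for the downloaded page (assumed structure)
sample_html = """
<h2><span id="Active_teams_at_league_closing">Active teams</span></h2>
<ul>
<li><a href="#">Baltimore Anthem</a> (2015–present)</li>
<li><a href="#">Boston Iron</a> (2014–present)</li>
</ul>
"""

df = teams_to_frame(sample_html)
print(df)
```

Against the live page, the same helper could be called as `teams_to_frame(requests.get(url).text)`. Keep in mind that Wikipedia heading ids can change over time, so the `heading_id` default may need adjusting.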