Scraping data from a Wikipedia table

Inf*_*evo 0 python wikipedia beautifulsoup pandas

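I am just trying to scrape the data from a Wikipedia table into a pandas DataFrame.

I need to reproduce three columns: "Postcode, Borough, Neighbourhood".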

import requests
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url, 'lxml')  # HTML parser; 'xml' is meant for XML documents
print(soup.prettify())

My_table = soup.find('table',{'class':'wikitable sortable'})
My_table

links = My_table.findAll('a')
links

Neighbourhood = []
for link in links:
    Neighbourhood.append(link.get('title'))

print (Neighbourhood)

import pandas as pd
df = pd.DataFrame([])
df['PostalCode', 'Borough', 'Neighbourhood'] = pd.Series(Neighbourhood)

df

It only returns the boroughs...

Thanks

G. *_*son 14

If all you want is for the script to pull a single table from a page, you may be overthinking it. One import, one line, no loops:

import pandas as pd
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

df=pd.read_html(url, header=0)[0]

df.head()

    Postcode    Borough             Neighbourhood
0   M1A         Not assigned        Not assigned
1   M2A         Not assigned        Not assigned
2   M3A         North York          Parkwoods
3   M4A         North York          Victoria Village
4   M5A         Downtown Toronto    Harbourfront
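As for why the code in the question only yields one kind of value: as far as I can tell, it collects the title attribute of every link in the table, so cells without a link (the postcodes, "Not assigned") never make it into the list, and assigning to df['PostalCode', 'Borough', 'Neighbourhood'] creates a single column labelled with that tuple rather than three columns. If you would rather parse by hand than use read_html, the usual idea is to walk the table row by row and take the text of each cell. Below is a minimal sketch of that idea, not a definitive implementation. It uses the standard library's html.parser so it runs without extra parser packages; the same row-walking logic should carry over to BeautifulSoup with find_all('tr') and find_all('td'). SAMPLE_HTML, TableRows and df_rows are hypothetical names, and the sample markup is a small stand-in that I am assuming resembles the real page.

```python
from html.parser import HTMLParser

import pandas as pd

# Hypothetical stand-in for the Wikipedia markup; the real page is much larger.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td><a href="/wiki/North_York" title="North York">North York</a></td><td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a></td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell; this includes text nested in <a>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


parser = TableRows()
parser.feed(SAMPLE_HTML)

# First row is the header; the remaining rows are the data.
header, *body = parser.rows
df_rows = pd.DataFrame(body, columns=header)
print(df_rows)
```

Because every cell's text is kept whether or not it contains a link, all three columns come through. For a single well-formed table, read_html is still the shorter route.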