Scrape a Wikipedia table into a pandas DataFrame

Inf*_*evo 4 python wikipedia pandas

I need to scrape a Wikipedia table into a pandas DataFrame with three columns: PostalCode, Borough, and Neighborhood.

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

This is the code I am using:

import requests
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,'lxml')
print(soup.prettify())

My_table = soup.find('table',{'class':'wikitable sortable'})
My_table

links = My_table.findAll('a')
links

Neighbourhood = [ ]

for link in links:
    Neighbourhood.append(link.get('title'))

print (Neighbourhood)

import pandas as pd

df = pd.DataFrame([])
df['PostalCode', 'Borough', 'Neighborhood'] = Neighbourhood
df

It returns:

    (PostalCode, Borough, Neighborhood)
0   North York
1   Parkwoods
2   North York
3   Victoria Village
4   Downtown Toronto
5   Harbourfront (Toronto)
6   Downtown Toronto
7   Regent Park
8   North York

I don't know how to get the postal code and the neighbourhood from the Wikipedia table.

Thanks

Ben*_*ère 9

pandas lets you do this in a single line of code:

df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
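
To make it clearer what that call returns, here is a small offline sketch. pd.read_html returns a list of DataFrames, one per table it finds, which is why the [0] index is needed. The HTML snippet, its header names, and the two sample rows below are illustrative stand-ins for the Wikipedia page, not a copy of it, so the rename mapping is an assumption to adjust against the real column headers. It also assumes an HTML parser such as lxml is installed.

```python
import io

import pandas as pd

# Illustrative stand-in for the Wikipedia table (assumed headers and rows).
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> element found.
tables = pd.read_html(io.StringIO(SAMPLE_HTML))

# Take the first table and rename its columns to the names the question asks for.
df = tables[0].rename(
    columns={"Postcode": "PostalCode", "Neighbourhood": "Neighborhood"}
)
print(df)
```

With the real page, the same rename step would apply after the one-line read_html call, provided the header names match.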


小智 0

Please provide the error message. Looking at the code, the first thing is that you have df['Neighborhoods'] = Neighborhoods, where your list is named Neighborhoods.
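
On the assignment itself: df['PostalCode', 'Borough', 'Neighborhood'] = some_list creates one column whose label is a tuple, which matches the header shown in the question's output. If you want to stay with BeautifulSoup, one possible approach is to walk the table row by row and hand a list of rows to the DataFrame constructor. This is only a sketch on a made-up HTML snippet (the headers and sample rows are assumptions), and it uses the built-in "html.parser" rather than lxml.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Illustrative stand-in for the Wikipedia table (assumed headers and rows).
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="wikitable")

rows = []
for tr in table.find_all("tr"):
    # The header row has only <th> cells, so it yields an empty list and is skipped.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == 3:
        rows.append(cells)

# One inner list per table row, one column name per cell.
df = pd.DataFrame(rows, columns=["PostalCode", "Borough", "Neighborhood"])
print(df)
```

Collecting whole rows keeps each postal code aligned with its borough and neighbourhood, which a flat list of link titles cannot do.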