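I need to scrape a Wikipedia table into a pandas DataFrame and create three columns: PostalCode, Borough, and Neighbourhood.
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
This is the code I used: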
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Download the page and parse the HTML
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_url, 'lxml')
print(soup.prettify())

# Locate the postal code table and collect every link inside it
My_table = soup.find('table', {'class': 'wikitable sortable'})
My_table
links = My_table.findAll('a')
links

# Keep the title attribute of each link
Neighbourhood = []
for link in links:
    Neighbourhood.append(link.get('title'))
print(Neighbourhood)

# Try to load the titles into a DataFrame with three columns
df = pd.DataFrame([])
df['PostalCode', 'Borough', 'Neighborhood'] = Neighbourhood
df
It returns:
(PostalCode, Borough, Neighborhood)
0 North York
1 Parkwoods
2 North York
3 Victoria Village
4 Downtown Toronto
5 Harbourfront (Toronto)
6 Downtown Toronto
7 Regent Park
8 North York
I don't know how to get the postal codes and neighbourhoods from the Wikipedia table. …
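Collecting the title of every link in the table only yields names that happen to be links, so the postal codes (which appear to be plain text cells) never show up. A minimal sketch of reading the table row by row instead is below. It uses the standard-library html.parser rather than BeautifulSoup so that it runs without extra packages; the same idea should carry over to looping over the table's tr and td tags with BeautifulSoup. The class name TableRowParser, the helper parse_rows, and the sample HTML are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> also lands here
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return a list of rows; each row is a list of cell texts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample shaped like the Wikipedia table (not the real page)
sample_html = """
<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M3A</td><td><a title="North York">North York</a></td><td><a title="Parkwoods">Parkwoods</a></td></tr>
<tr><td>M4A</td><td><a title="North York">North York</a></td><td><a title="Victoria Village">Victoria Village</a></td></tr>
</table>
"""

rows = parse_rows(sample_html)
header, records = rows[0], rows[1:]
```

Each entry in records then holds one postal code, one borough, and one neighbourhood, which is the shape a three-column DataFrame needs.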
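I am just trying to scrape the data from a Wikipedia table into a pandas DataFrame.
I need to reproduce three columns: "PostalCode, Borough, Neighbourhood".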
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Download the page and parse it
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_url, 'xml')
print(soup.prettify())

# Locate the postal code table and collect every link inside it
My_table = soup.find('table', {'class': 'wikitable sortable'})
My_table
links = My_table.findAll('a')
links

# Keep the title attribute of each link
Neighbourhood = []
for link in links:
    Neighbourhood.append(link.get('title'))
print(Neighbourhood)

# Try to load the titles into a DataFrame with three columns
df = pd.DataFrame([])
df['PostalCode', 'Borough', 'Neighbourhood'] = pd.Series(Neighbourhood)
df
It only returns the boroughs...
Thanks
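One thing worth noting about the assignment in the code above: in Python, df['PostalCode', 'Borough', 'Neighbourhood'] is the same as indexing with a single tuple key, so pandas treats it as one column label rather than three. A minimal sketch of building three real columns is to pass one record per table row to the DataFrame constructor together with a columns list. The records below are hypothetical sample values; in practice they would come from the table's rows.

```python
import pandas as pd

# Hypothetical sample rows: (postal code, borough, neighbourhood)
records = [
    ("M3A", "North York", "Parkwoods"),
    ("M4A", "North York", "Victoria Village"),
]

# One tuple per row, one name per column
df = pd.DataFrame(records, columns=["PostalCode", "Borough", "Neighbourhood"])
print(df)
```

pandas also provides read_html, which returns a list of DataFrames for the tables it finds in a page; it needs an HTML parsing library such as lxml installed, and the column names it yields depend on the page's own header row.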