Posts by Inf*_*evo

Scraping a Wikipedia table into a pandas DataFrame

I need to scrape a Wikipedia table into a pandas DataFrame and create three columns: PostalCode, Borough, and Neighborhood.

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

This is the code I am using:

import requests
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,'lxml')
print(soup.prettify())

My_table = soup.find('table',{'class':'wikitable sortable'})
My_table

links = My_table.findAll('a')
links

Neighbourhood = [ ]

for link in links:
    Neighbourhood.append(link.get('title'))

print (Neighbourhood)

import pandas as pd

df = pd.DataFrame([])
df['PostalCode', 'Borough', 'Neighborhood'] = Neighbourhood
df

It returns:

    (PostalCode, Borough, Neighborhood)
0   North York
1   Parkwoods
2   North York
3   Victoria Village
4   Downtown Toronto
5   Harbourfront (Toronto)
6   Downtown Toronto
7   Regent Park
8   North York

I don't know how to get the postal codes and neighbourhoods from the Wikipedia table. …

python wikipedia pandas

Score: 4 · Answers: 2 · Views: 4047
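
One way to read the problem in the question above: `My_table.findAll('a')` collects every link in the table into one flat list, so the row structure (which postal code goes with which borough and neighbourhood) is lost, and postal codes that are not links never appear at all. A minimal sketch of a row-by-row approach follows. It assumes the table has one `<tr>` per record with exactly three `<td>` cells, which may not match the live page's current markup. `SAMPLE_HTML` and `table_to_frame` are hypothetical names introduced only for illustration, and the sample rows are a small stand-in, not the real page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Illustrative stand-in for the downloaded page (assumed layout: three <td> cells per row).
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td><a title="North York">North York</a></td><td><a title="Parkwoods">Parkwoods</a></td></tr>
  <tr><td>M4A</td><td><a title="North York">North York</a></td><td><a title="Victoria Village">Victoria Village</a></td></tr>
  <tr><td>M5A</td><td><a title="Downtown Toronto">Downtown Toronto</a></td><td><a title="Harbourfront (Toronto)">Harbourfront</a></td></tr>
</table>
"""


def table_to_frame(html):
    """Hypothetical helper: turn the first 'wikitable' in html into a three-column DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    for tr in table.find_all("tr"):
        # Read the visible text of each data cell; the header row has only <th> cells.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 3:  # skip the header row and any malformed rows
            rows.append(cells)
    return pd.DataFrame(rows, columns=["PostalCode", "Borough", "Neighbourhood"])


df = table_to_frame(SAMPLE_HTML)
print(df)
```

With the real page, the same function could be given `requests.get(url).text` instead of the sample string, provided the table still has that shape.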

Scraping data from a Wikipedia table

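I am just trying to scrape the data from a Wikipedia table into a pandas DataFrame.

I need to reproduce three columns: "PostalCode, Borough, Neighbourhood".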

import requests
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,'xml')
print(soup.prettify())

My_table = soup.find('table',{'class':'wikitable sortable'})
My_table

links = My_table.findAll('a')
links

Neighbourhood = []
for link in links:
    Neighbourhood.append(link.get('title'))

print (Neighbourhood)

import pandas as pd
df = pd.DataFrame([])
df['PostalCode', 'Borough', 'Neighbourhood'] = pd.Series(Neighbourhood)

df

It only returns the boroughs...

Thanks

python wikipedia beautifulsoup pandas

Score: 0 · Answers: 1 · Views: 4142
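
A likely reason only one column shows up in the question above: `df['PostalCode', 'Borough', 'Neighbourhood']` passes a single tuple as the key, so pandas treats it as one column label rather than three separate columns. A minimal sketch of building three real columns is below. It assumes the values have already been separated into three equal-length lists; the list names and sample values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical, already-separated values (one entry per table row).
postal_codes = ["M3A", "M4A"]
boroughs = ["North York", "North York"]
neighbourhoods = ["Parkwoods", "Victoria Village"]

# Each dict key becomes its own column; all lists must have the same length.
df = pd.DataFrame(
    {
        "PostalCode": postal_codes,
        "Borough": boroughs,
        "Neighbourhood": neighbourhoods,
    }
)

print(df.columns.tolist())
print(df)
```

Assigning to several existing or new columns at once uses a list of labels, as in `df[['A', 'B']]`, not a bare comma-separated key.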
