Posts by yeu*_*ase

How do I insert a new column with a repeated value into a pandas table?

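I am new to Python. I have scraped an HTML table with pandas, and I am looking for a way to insert a new column containing a repeated string value and set it as the table's index (as shown below). Note that the table is very long :).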

Original df:

Age IQ
12  100
15  111
 .   .
 .   .
 .   .
 .   .
13  121


Expected df:

Group  Age IQ
 A     12  100
 A     15  111
 .      .   .
 .      .   .
 .      .   .
 .      .   .
 A     13  121
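
One possible way to get from the original df to the expected one is sketched below. It assumes the scraped table is already a pandas DataFrame; the three sample rows are placeholders taken from the question, not the real data. `DataFrame.insert` broadcasts a scalar to every row, so the length of the table does not matter.

```python
import pandas as pd

# Hypothetical sample data standing in for the (much longer) scraped table.
df = pd.DataFrame({"Age": [12, 15, 13], "IQ": [100, 111, 121]})

# Insert a new first column; the single string "A" is repeated for every row.
df.insert(0, "Group", "A")

# Use the new column as the index of the table.
df = df.set_index("Group")

print(df)
```

If the position of the column does not matter, `df["Group"] = "A"` followed by `set_index("Group")` should give the same result.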


python pandas

yeu*_*ase

2017-06-20

8 votes

1 answer

20k views

How to loop through a list of URLs for web scraping with BeautifulSoup

Does anyone know how to scrape a list of URLs from the same website with BeautifulSoup? list = ['url1', 'url2', 'url3'...]

================================================== ========================

My code for extracting the list of URLs:

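import requests
from bs4 import BeautifulSoup

# The three index pages that list the links to collect
url = 'http://www.hkjc.com/chinese/racing/selecthorsebychar.asp?ordertype=2'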
url1 = 'http://www.hkjc.com/chinese/racing/selecthorsebychar.asp?ordertype=3'
url2 = 'http://www.hkjc.com/chinese/racing/selecthorsebychar.asp?ordertype=4'

r  = requests.get(url)
r1  = requests.get(url1)
r2  = requests.get(url2)

data = r.text
soup = BeautifulSoup(data, 'lxml')
links = []

for link in soup.find_all('a', {'class': 'title_text'}):
    links.append(link.get('href'))

data1 = r1.text

soup = BeautifulSoup(data1, 'lxml')

for link in soup.find_all('a', {'class': 'title_text'}):
    links.append(link.get('href'))

data2 = r2.text

soup = BeautifulSoup(data2, 'lxml')

for link in soup.find_all('a', {'class': 'title_text'}):
    links.append(link.get('href'))

new = ['http://www.hkjc.com/chinese/racing/']*1123

url_list = …
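
The repeated blocks above could be folded into a loop, roughly as in the sketch below. Several things here are assumptions rather than part of the original post: the helper names `fetch`, `extract_links` and `collect_links` are hypothetical, the `href` values are assumed to be relative to `http://www.hkjc.com/chinese/racing/`, and the built-in `html.parser` is swapped in for `lxml` to avoid an extra dependency.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Assumption: the scraped hrefs are relative to this base URL.
BASE = "http://www.hkjc.com/chinese/racing/"
INDEX_PAGES = [BASE + f"selecthorsebychar.asp?ordertype={n}" for n in (2, 3, 4)]


def fetch(url):
    """Download one page and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_links(html, base=BASE):
    """Return absolute URLs for every <a class="title_text"> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base, a["href"])
        for a in soup.find_all("a", class_="title_text")
        if a.get("href")  # skip anchors without an href
    ]


def collect_links(pages, get=fetch):
    """Loop over the index pages and gather the links found on each one."""
    links = []
    for page in pages:
        links.extend(extract_links(get(page)))
    return links
```

Calling `collect_links(INDEX_PAGES)` would then return the full list of absolute URLs, and the same pattern (`for u in url_list: soup = BeautifulSoup(fetch(u), "html.parser")`) can be used to visit each of them in turn. Using `urljoin` avoids hard-coding how many links there are.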


python beautifulsoup

yeu*_*ase

2017-07-01

0 votes

1 answer

10k views