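I'm new to Python. I've scraped an HTML table with Pandas, and I'm looking for a way to insert a new column that holds the same string value in every row and set it as the table's index (as shown below). Note that the table is very long :).
Original df: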
Age IQ
12 100
15 111
. .
. .
. .
. .
13 121
Expected df:
Group Age IQ
A 12 100
A 15 111
. . .
. . .
. . .
. . .
A 13 121
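To make the target concrete, here is a minimal sketch of one way to get from the original df to the expected one. The three-row frame is only a stand-in for the scraped table, and the column name `Group` and the value `A` are taken from the expected output above; `DataFrame.insert` with a scalar and `set_index` are standard pandas calls, but this is just one possible approach.

```python
import pandas as pd

# Stand-in for the scraped table; only three rows from the question.
df = pd.DataFrame({"Age": [12, 15, 13], "IQ": [100, 111, 121]})

# A scalar value is broadcast to every row, however long the table is.
df.insert(0, "Group", "A")

# Move the new column into the index.
df = df.set_index("Group")

print(df)
```

Because the scalar is broadcast, there is no need to build a list of repeated strings that matches the table length.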
Also, does anyone know how to scrape a list of URLs from the same website with BeautifulSoup? list = ['url1', 'url2', 'url3'...]
==========================================================================
My code for extracting the list of URLs:
import requests
from bs4 import BeautifulSoup

url = 'http://www.hkjc.com/chinese/racing/selecthorsebychar.asp?ordertype=2'
url1 = 'http://www.hkjc.com/chinese/racing/selecthorsebychar.asp?ordertype=3'
url2 = 'http://www.hkjc.com/chinese/racing/selecthorsebychar.asp?ordertype=4'
r = requests.get(url)
r1 = requests.get(url1)
r2 = requests.get(url2)
data = r.text
soup = BeautifulSoup(data, 'lxml')
links = []
for link in soup.find_all('a', {'class': 'title_text'}):
    links.append(link.get('href'))
data1 = r1.text
soup = BeautifulSoup(data1, 'lxml')
for link in soup.find_all('a', {'class': 'title_text'}):
    links.append(link.get('href'))
data2 = r2.text
soup = BeautifulSoup(data2, 'lxml')
for link in soup.find_all('a', {'class': 'title_text'}):
    links.append(link.get('href'))
new = ['http://www.hkjc.com/chinese/racing/']*1123
url_list = …
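The three near-identical request blocks above could probably be folded into a loop, and the hard-coded `*1123` could be avoided by joining each href onto the base URL directly. The sketch below is one possible shape, not a tested scraper for this site: the helper names `extract_links` and `collect_links` are hypothetical, it assumes the hrefs on the page are relative to `http://www.hkjc.com/chinese/racing/` (which the prefix list above suggests), it uses the built-in `html.parser` instead of `lxml`, and the sample href is made up for an offline check.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE = "http://www.hkjc.com/chinese/racing/"
PAGE = BASE + "selecthorsebychar.asp?ordertype={}"


def extract_links(html, base=BASE):
    """Return absolute URLs for every <a class="title_text"> that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    anchors = soup.find_all("a", class_="title_text", href=True)
    return [urljoin(base, a["href"]) for a in anchors]


def collect_links(order_types=(2, 3, 4)):
    """Fetch each listing page and gather its links (needs network access)."""
    links = []
    for order_type in order_types:
        response = requests.get(PAGE.format(order_type), timeout=10)
        response.raise_for_status()
        links.extend(extract_links(response.text))
    return links


# Offline check with a made-up snippet; the href value is hypothetical.
sample = (
    '<a class="title_text" href="horse.asp?horseno=X000">X</a>'
    '<a href="skip.asp">no class</a>'
)
sample_links = extract_links(sample)
print(sample_links)
```

Calling `collect_links()` would then return the combined list for ordertype 2, 3 and 4 without needing to know in advance how many links there are.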