par*_*sis 8 python beautifulsoup
I need to get the data from a table into a text file (output.txt) in this format: data1; data2; data3; data4; .....
Celkova podlahova plocha bytu; 33m; Vytah; Ano; Nadzemne podlazie; Prizemne podlazie; .....; Forma vlastnictva; Osobne
Everything should be on one line, with ";" as the separator (to be exported to a CSV file later).
I'm a beginner. Any help is appreciated, thanks.
from BeautifulSoup import BeautifulSoup
import urllib2
import codecs
response = urllib2.urlopen('http://www.reality.sk/zakazka/0747-003578/predaj/1-izb-byt/kosice-mestska-cast-sever-sladkovicova-kosice-sever/art-real-1-izb-byt-sladkovicova-ul-kosice-sever')
html = response.read()
soup = BeautifulSoup(html)
tabulka = soup.find("table", {"class" : "detail-char"})
for row in tabulka.findAll('tr'):
    col = row.findAll('td')
    prvy = col[0].string.strip()
    druhy = col[1].string.strip()
    record = ([prvy], [druhy])
fl = codecs.open('output.txt', 'wb', 'utf8')
for rec in record:
    line = ''
    for val in rec:
        line += val + u';'
    fl.write(line + u'\r\n')
fl.close()
pwd*_*son 14
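You aren't keeping each record as you read it in, so only the last row survives the loop. Try this, which stores every record in the list records: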
from BeautifulSoup import BeautifulSoup
import urllib2
import codecs
response = urllib2.urlopen('http://www.reality.sk/zakazka/0747-003578/predaj/1-izb-byt/kosice-mestska-cast-sever-sladkovicova-kosice-sever/art-real-1-izb-byt-sladkovicova-ul-kosice-sever')
html = response.read()
soup = BeautifulSoup(html)
tabulka = soup.find("table", {"class" : "detail-char"})
records = [] # store all of the records in this list
for row in tabulka.findAll('tr'):
    col = row.findAll('td')
    prvy = col[0].string.strip()
    druhy = col[1].string.strip()
    record = '%s;%s' % (prvy, druhy) # store the record with a ';' between prvy and druhy
    records.append(record)
fl = codecs.open('output.txt', 'wb', 'utf8')
line = ';'.join(records)
fl.write(line + u'\r\n')
fl.close()
This could be cleaned up more, but I think it's what you're after.
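Since the goal is a CSV export later, one possible cleanup is to let the standard csv module build the line instead of joining strings by hand. The following is only a sketch under a few assumptions: it is written for Python 3 (the code above is Python 2), the helper name rows_to_line is hypothetical, and the sample pairs are copied from the question's expected output rather than scraped from the page.

```python
import csv
import io


def rows_to_line(rows, delimiter=";"):
    """Flatten (label, value) pairs into a single delimited line.

    rows_to_line is a hypothetical helper, not part of any library.
    """
    # Flatten [(label, value), ...] into [label, value, label, value, ...]
    flat = [cell.strip() for row in rows for cell in row]
    buf = io.StringIO()
    # csv.writer quotes any field that itself contains the delimiter
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\r\n")
    writer.writerow(flat)
    return buf.getvalue()


# Sample pairs taken from the question's expected output (assumed data)
rows = [
    ("Celkova podlahova plocha bytu", "33m"),
    ("Vytah", "Ano"),
    ("Forma vlastnictva", "Osobne"),
]
line = rows_to_line(rows)
print(line, end="")
```

The advantage over ';'.join is that a value which happens to contain a ';' gets quoted, so the line can still be read back as CSV. In the scraping loop you would append (prvy, druhy) tuples to a list and pass that list to the helper.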