Iterating over table rows in Selenium (Python)

Fie*_*nix 8 python selenium xpath

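I have a web page containing a table that only shows up when I click "Inspect Element"; it is not visible through "View Source". The table contains only two rows, each with several cells, and looks similar to this: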

<table class="datadisplaytable">
<tbody>
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</tbody>
</table>

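What I want to do is iterate over the rows and return the text contained in each cell. I can't seem to manage this with Selenium. The elements have no IDs, and I don't know how to get at them. I'm not very familiar with XPath and the like.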

Here is a debugging attempt, which raises a TypeError:

def check_grades(self):
    table = []
    for i in self.driver.find_element_by_class_name("dddefault"):
        table.append(i)
    print(table)
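The TypeError comes from `find_element_by_class_name` returning a single WebElement (the first match), which is not iterable; the plural `find_elements_*` methods return a list. A minimal sketch of the fixed loop, assuming Selenium 4, where the `find_element_by_*` helpers were deprecated (and later removed) in favour of `find_element(by, value)` / `find_elements(by, value)`. The strategy is passed as the string `"class name"`, which is the value of `By.CLASS_NAME`; writing it as a plain function that takes the driver is an assumption made for illustration.

```python
def check_grades(driver):
    """Return the text of every non-empty <td class="dddefault"> on the page.

    find_elements (plural) returns a list of WebElements, possibly empty, which
    can be iterated; find_element (singular) returns one WebElement, which
    cannot, hence the TypeError.
    """
    # "class name" is the value of selenium.webdriver.common.by.By.CLASS_NAME.
    cells = driver.find_elements("class name", "dddefault")
    # Selenium reports an empty cell's text as "", so drop those.
    return [cell.text for cell in cells if cell.text]
```

From the original method this would be called as `check_grades(self.driver)`. Note that it returns one flat list for the whole page; the answers below group the cells by row.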

What is a simple way to get the text out of the rows?

Pad*_*ham 10

If you want to go row by row using XPath, you can do the following:

h = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""

from lxml import html

xml = html.fromstring(h)
# get the table
table = xml.xpath("//table[@class='datadisplaytable']")[0]

# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from all the td's in each row that actually contain text
    print([td.text for td in row.xpath(".//td[@class='dddefault'][text()]")])

This outputs:

['16759', 'MATH', '123', '001', 'Calculus']
['16449', 'PHY', '456', '002', 'Physics']

Using td[text()] avoids getting None back for the td elements that have no text.
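If lxml is not available, the same row-by-row idea can be sketched with the standard library's `xml.etree.ElementTree`. This is a swapped-in parser, not the one used above: it supports only a subset of XPath (paths and `[@attr='value']` predicates work, the `text()` predicate does not), so empty cells are filtered in Python instead, and it requires well-formed XML, so it suits small, clean snippets like this one rather than real-world HTML. `table_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def table_rows(markup, table_class="datadisplaytable"):
    """Yield, for each <tr> of every matching table, the non-empty 'dddefault' texts.

    ElementTree supports only a subset of XPath: './/tr' and
    "td[@class='dddefault']" work, but the text() predicate does not,
    so cells whose .text is None (empty) are filtered in Python instead.
    """
    root = ET.fromstring(markup)  # raises ParseError if the markup is not well-formed XML
    # iter() includes root itself, so this works whether or not <table> is the root.
    for table in root.iter("table"):
        if table.get("class") != table_class:
            continue
        for row in table.findall(".//tr"):
            yield [td.text for td in row.findall("td[@class='dddefault']") if td.text]


sample = """<table class="datadisplaytable"><tbody>
<tr><td class="dddefault">16759</td><td class="dddefault">MATH</td><td class="dddefault"></td><td class="dddead"></td></tr>
</tbody></table>"""

for cells in table_rows(sample):
    print(cells)  # ['16759', 'MATH']
```

Because `.//tr` searches all descendants, the `<tbody>` wrapper that appears in the question's markup does not need to be named in the path.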

To do the same thing with Selenium:

table = driver.find_element_by_xpath("//table[@class='datadisplaytable']")

for row in table.find_elements_by_xpath(".//tr"):
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][text()]")])

For multiple tables:

def get_row_data(table):
    for row in table.find_elements_by_xpath(".//tr"):
        yield [td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][text()]")]


for table in driver.find_elements_by_xpath("//table[@class='datadisplaytable']"):
    for data in get_row_data(table):
        # use the data
        print(data)
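The `find_element_by_*` / `find_elements_by_*` helpers used in this answer were deprecated in Selenium 4 and removed in later 4.x releases. A minimal sketch of the same multi-table generator with the `find_elements(by, value)` form, assuming Selenium 4: the strategy is passed as the string `"xpath"`, the value of `By.XPATH` (importing `By` from `selenium.webdriver.common.by` is the more usual spelling). `iter_tables` is a hypothetical helper name.

```python
def get_row_data(table):
    """Yield the non-empty 'dddefault' cell texts of each <tr> in one table element."""
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH.
    for row in table.find_elements("xpath", ".//tr"):
        # [text()] keeps only cells that contain a text node, as in the lxml version.
        yield [td.text for td in row.find_elements("xpath", ".//td[@class='dddefault'][text()]")]


def iter_tables(driver):
    """Yield one list of rows for every table with class 'datadisplaytable' on the page."""
    for table in driver.find_elements("xpath", "//table[@class='datadisplaytable']"):
        yield list(get_row_data(table))
```

With an open WebDriver `driver`, `for rows in iter_tables(driver): print(rows)` would print the rows of each table in turn.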


Har*_*vey 7

XPath is fragile. It's better to use CSS selectors or classes:

mytable = driver.find_element_by_css_selector('table.datadisplaytable')
for row in mytable.find_elements_by_css_selector('tr'):
    for cell in row.find_elements_by_tag_name('td'):
        print(cell.text)
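A sketch of the same CSS-selector approach that groups cells by row and selects `td.dddefault`, so the empty `dddead` cells are skipped (the loop above prints every `td`, including those). It assumes Selenium 4's `find_element(by, value)` / `find_elements(by, value)` API, with the strategy passed as `"css selector"`, the value of `By.CSS_SELECTOR`; `table_rows_css` is a hypothetical name.

```python
def table_rows_css(driver):
    """Return a list of rows, each a list of the non-empty 'dddefault' cell texts."""
    # "css selector" is the value of selenium.webdriver.common.by.By.CSS_SELECTOR.
    table = driver.find_element("css selector", "table.datadisplaytable")
    rows = []
    for row in table.find_elements("css selector", "tr"):
        # td.dddefault matches only cells with that class, so 'dddead' cells are skipped.
        texts = [cell.text for cell in row.find_elements("css selector", "td.dddefault")]
        # Selenium returns "" for empty cells; keep only cells with text.
        rows.append([t for t in texts if t])
    return rows
```

Unlike `find_elements`, `find_element` raises `NoSuchElementException` if no matching table is present, so the call fails loudly when the table has not been rendered.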