Hi, this is my web page:
<html>
<head>
</head>
<body>
<div> text div 1</div>
<div>
<span>text of first span </span>
<span>text of second span </span>
</div>
<div> text div 3 </div>
</body>
</html>
I use jsoup to parse it, then walk through every element in the page and build its path:
Document doc = Jsoup.parse(new File("C:\\Users\\HC\\Desktop\\dataset\\index.html"), "UTF-8");
Elements elements = doc.body().select("*");
ArrayList all = new ArrayList();
for (Element element : elements) {
if (!element.ownText().isEmpty()) {
StringBuilder path = new StringBuilder(element.nodeName());
String value = element.ownText();
Elements p_el = element.parents();
for (Element el : p_el) {
path.insert(0, el.nodeName() + '/');
}
all.add(path …

I want to split a string at a given substring and keep the first part. Example below.
Input:
body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]
Output (split at [12]):
body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]
I wrote this code:
String path1 = "body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]";
String result;
if(path1.contains("[12]")){
System.out.println("yes");
result = path1.split("[12]")[0];
System.out.println(result);
}
But the result I get is:
body/div[
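A likely cause: String.split treats its argument as a regular expression, so "[12]" is read as a character class matching a single '1' or '2', and the string is cut at the first '2' it finds. Below is a minimal sketch of two ways around that, assuming the goal is the prefix up to and including the marker. The class name PathPrefix and the helper upToAndIncluding are made up for illustration.

```java
import java.util.regex.Pattern;

class PathPrefix {
    // Hypothetical helper: returns the part of path up to and including the
    // first literal occurrence of marker, or the whole path if marker is absent.
    static String upToAndIncluding(String path, String marker) {
        int idx = path.indexOf(marker); // literal search, no regex involved
        return idx < 0 ? path : path.substring(0, idx + marker.length());
    }

    public static void main(String[] args) {
        String path1 = "body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]";

        // Option 1: indexOf + substring.
        System.out.println(upToAndIncluding(path1, "[12]"));

        // Option 2: keep split, but quote the marker so it is matched literally.
        // split drops the delimiter, so it is appended back afterwards.
        String viaSplit = path1.split(Pattern.quote("[12]"))[0] + "[12]";
        System.out.println(viaSplit);
    }
}
```

Both lines print body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]. The indexOf version avoids regex entirely, which is usually simpler when the marker is a fixed string.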