Tag: jsoup

How to parse text from HTML

How do I use jsoup in Java to parse the text of a web page?

java jsoup

Votes: 9 · Answers: 1 · Views: 20k
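A minimal sketch for the question above, assuming jsoup is on the classpath: Jsoup.parse turns an HTML string into a Document, text() returns the combined, whitespace-normalized text of an element and its descendants, and selectFirst narrows extraction to one element chosen by a CSS selector. The class name, method names and sample HTML are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HtmlTextExample {
    // Returns the whitespace-normalized text of the whole <body>.
    static String bodyText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.body().text();
    }

    // Returns the text of the first element matching cssQuery, or "" if nothing matches.
    static String firstText(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element el = doc.selectFirst(cssQuery);
        return el == null ? "" : el.text();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><h1>Hello</h1><p>Some <b>bold</b> text.</p></body></html>";
        System.out.println(bodyText(html));
        System.out.println(firstText(html, "p"));
        // For a live page, fetch and parse in one step instead (throws IOException):
        // Document live = Jsoup.connect("https://example.com/").get();
    }
}
```

selectFirst requires a reasonably recent jsoup; on older versions doc.select(cssQuery).first() does the same job.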

Android JSoup example

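I'm just wondering whether anyone has a sample Eclipse project with a working JSoup implementation. I'm trying to use it to extract information from a website and have searched all over Google trying to get it to work, but can't. I'd really appreciate it if someone could help.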

android jsoup

Votes: 9 · Answers: 1 · Views: 20k
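A minimal sketch for the Android question above, not a complete Eclipse project. Two things commonly stop jsoup from working on Android: the app needs the android.permission.INTERNET permission in its manifest, and network calls must not run on the main thread (Android throws NetworkOnMainThreadException). To stay in plain Java, this sketch assumes a java.util.concurrent ExecutorService rather than any Android-specific API; the class, method names and URL are hypothetical. In a real app you would hand the result back to the UI thread before touching views.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupBackgroundExample {
    // Pure parsing step: collect the absolute URL of every link that has an href.
    static List<String> linkUrls(Document doc) {
        List<String> urls = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            urls.add(a.attr("abs:href")); // resolved against the document's base URI
        }
        return urls;
    }

    // Network step: fetch and parse on a worker thread so the caller is not blocked.
    static Future<List<String>> fetchLinksAsync(ExecutorService executor, String url) {
        return executor.submit(() -> linkUrls(Jsoup.connect(url).get()));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Hypothetical URL; replace with the page you want to scrape.
            List<String> links = fetchLinksAsync(executor, "https://example.com/").get();
            links.forEach(System.out::println);
        } finally {
            executor.shutdown();
        }
    }
}
```

Keeping the parsing in its own method means it can be tested with Jsoup.parse on a local string, without any network access.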

Why does "div[class=mncls sbucls]" work but "div.mncls sbucls" doesn't?

The following Jsoup statement works:

 Elements divs = document.select("div[class=mncls sbucls]");

But the equivalent statement:

 Elements divs = document.select("div.mncls sbucls");

does not work.

Why?

Does Jsoup have a problem with class names that contain spaces?

html html-parsing jsoup

Votes: 9 · Answers: 1 · Views: 2677
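A sketch of why the two selectors above differ, using made-up sample markup. In CSS selector syntax a space is the descendant combinator, so "div.mncls sbucls" means "elements with tag name sbucls inside a div with class mncls", which matches nothing here. The class attribute "mncls sbucls" is really two classes, and the usual way to require both is to chain them: "div.mncls.sbucls". The attribute selector [class=...] instead compares the whole attribute string, so it is sensitive to class order. The class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ClassSelectorExample {
    static final Document DOC = Jsoup.parse(
            "<div class='mncls sbucls'>A</div>"
          + "<div class='sbucls mncls'>B</div>"
          + "<div class='mncls'>C</div>");

    // Number of elements in the sample document matching cssQuery.
    static int count(String cssQuery) {
        return DOC.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Attribute selector: compares the whole class string, so only A matches.
        System.out.println(count("div[class=mncls sbucls]")); // 1
        // Space = descendant combinator: <sbucls> tags inside div.mncls, so nothing matches.
        System.out.println(count("div.mncls sbucls"));        // 0
        // Chained class selectors: divs with both classes in any order, so A and B match.
        System.out.println(count("div.mncls.sbucls"));        // 2
    }
}
```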

Extracting HTML table contents with JSoup

How do I extract the contents of the table at http://espn.go.com/mens-college-basketball/conferences/standings//id/2/year/2012/acc-conference?

The few examples I've seen don't make it very clear how to get the contents of a table. Can anyone offer some help?

jsoup

Votes: 9 · Answers: 1 · Views: 20k
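A minimal sketch of the usual approach to the table question above: select the table, iterate its tr rows, and read each th/td cell's text into a list of rows. The selector passed in (table.standings in the demo, #t in the test) is a hypothetical placeholder; inspect the actual page to find the right one, and note that ESPN's markup has likely changed since the question was asked. Nested tables would be flattened into the outer table's rows by this simple version.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableExtractor {
    // Returns every <tr> of the first table matching tableQuery as a list of cell texts.
    static List<List<String>> readTable(Document doc, String tableQuery) {
        List<List<String>> rows = new ArrayList<>();
        Element table = doc.selectFirst(tableQuery);
        if (table == null) {
            return rows; // no matching table
        }
        for (Element tr : table.select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : tr.select("th, td")) { // header and data cells, in document order
                cells.add(cell.text());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample; for a live page use Jsoup.connect(url).get() instead.
        String html = "<table class='standings'>"
                + "<tr><th>Team</th><th>W</th><th>L</th></tr>"
                + "<tr><td>Team A</td><td>10</td><td>2</td></tr>"
                + "</table>";
        Document doc = Jsoup.parse(html);
        for (List<String> row : readTable(doc, "table.standings")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```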

Getting an attribute value from a div tag with jSoup

I have a div tag as follows:

<div id="eventTTL" style="text-transform: uppercase; font-weight: 900;" eventTTL="4583476000">5 days 07:14:41</div>

How do I get the value of the eventTTL attribute? I want to display its value, i.e. "4583476000".

java html-parsing jsoup

Votes: 9 · Answers: 1 · Views: 10k
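A minimal sketch for the attribute question above, using the div from the question: find the element (here by its id) and call attr() with the attribute name. attr() returns an empty string when the attribute is absent, and in current jsoup versions it looks the name up case-insensitively. Elements.attr() on a selection returns the value from the first matched element that has the attribute. The class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AttributeExample {
    static final String HTML = "<div id=\"eventTTL\" style=\"text-transform: uppercase; font-weight: 900;\" "
            + "eventTTL=\"4583476000\">5 days 07:14:41</div>";

    // Returns the eventTTL attribute of the element whose id is "eventTTL", or "" if absent.
    static String eventTtl(String html) {
        Document doc = Jsoup.parse(html);
        Element div = doc.getElementById("eventTTL");
        return div == null ? "" : div.attr("eventTTL");
    }

    public static void main(String[] args) {
        System.out.println(eventTtl(HTML));
        // Alternative: select by attribute presence instead of by id.
        System.out.println(Jsoup.parse(HTML).select("div[eventTTL]").attr("eventTTL"));
    }
}
```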

Jsoup: SelectorParseException when an XML tag contains a colon

An exception is thrown when an XML tag name contains a colon.

Exception:

org.jsoup.select.Selector$SelectorParseException: Could not parse query 'w:r': unexpected token at ':r'

XML:

<w:r>
 <w:rPr>
   <w:rStyle w:val="jid"/>
 </w:rPr>
 <w:t>AN</w:t>
</w:r>

Java code:

    org.jsoup.nodes.Document doc = Jsoup.parse(documentXmlString);

Here documentXmlString holds the XML shown above.

java xml-parsing jsoup

Votes: 9 · Answers: 2 · Views: 3541
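A sketch for the colon question above. The exception comes from a select("w:r") call rather than from parsing: in a jsoup selector ':' introduces a pseudo-selector, so a namespaced tag is written with a pipe instead, as in "w|r" (jsoup's ns|tag syntax). The sketch also assumes the input should be parsed with Parser.xmlParser(), so tag names and self-closing tags are kept as in the XML, and shows getElementsByTag() as an alternative that takes the literal tag name. The class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class NamespacedTagExample {
    static final String XML =
            "<w:r><w:rPr><w:rStyle w:val=\"jid\"/></w:rPr><w:t>AN</w:t></w:r>";

    // Returns the combined text of all <w:t> elements, selected with the ns|tag syntax.
    static String runText(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        // select("w:t") would throw SelectorParseException; "w|t" matches the <w:t> tag.
        return doc.select("w|t").text();
    }

    public static void main(String[] args) {
        System.out.println(runText(XML));
        Document doc = Jsoup.parse(XML, "", Parser.xmlParser());
        // Alternative without selector syntax: look the tag up by its literal name.
        System.out.println(doc.getElementsByTag("w:rStyle").attr("w:val"));
    }
}
```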

How to select the direct children of "this element" in JSoup

If I have an element that looks like this:

<foo>
    <bar> bar text 1 </bar>
    <baz>
        <bar> bar text 2 </bar>
    </baz>
</foo>

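and I already have the <foo> element selected, and I want to select the <bar> elements that are direct children of <foo> but not the ones that are children of <baz>, how do I specify that?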

Element foo = <that thing above>
foo.select("bar").text();

produces "bar text 1 bar text 2".

What I want is something like:

foo.select("this > bar").text();

The question is: how do I refer to "this element" in a selector?

Note that the desired bar may not be the first child; I need a solution that also works for:

<foo>
    <baz>
        <bar> bar text 2 </bar>
    </baz>
    <bar> bar text 1 </bar>
</foo>

jsoup

Votes: 9 · Answers: 1 · Views: 2990
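A sketch of two ways to get only the direct <bar> children for the question above, using the second sample so the wanted bar is not the first child. Recent jsoup versions accept a query that starts with a combinator, such as "> bar", and evaluate it relative to the element select() is called on, which plays the role of "this". The second method avoids relying on that parser behaviour by walking children() and filtering on the tag name. The sketch assumes the snippet is parsed as XML; class and method names are hypothetical.

```java
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class DirectChildExample {
    static final String XML = "<foo><baz><bar> bar text 2 </bar></baz><bar> bar text 1 </bar></foo>";

    // Selector form: a leading ">" is relative to the element select() is called on.
    static String directBarTextBySelector(Element foo) {
        return foo.select("> bar").text();
    }

    // Explicit form: look only at immediate children and keep the <bar> ones.
    static String directBarTextByChildren(Element foo) {
        return foo.children().stream()
                .filter(child -> child.tagName().equals("bar"))
                .map(Element::text)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(XML, "", Parser.xmlParser());
        Element foo = doc.selectFirst("foo");
        System.out.println(directBarTextBySelector(foo));
        System.out.println(directBarTextByChildren(foo));
    }
}
```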

Crawler4j vs. Jsoup for crawling and parsing pages in Java

I want to get the content of a page and extract specific parts of it. As far as I know, there are at least two solutions for such a task: Crawler4j and Jsoup.

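Both of them can retrieve a page's content and extract parts of it. The only thing I don't understand is what the difference between them is. There is a similar question, marked as answered: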

Crawler4j is a crawler, Jsoup is a parser.

But I just checked: besides parsing, Jsoup 1.8.3 can also fetch pages, while Crawler4j can not only crawl pages but also parse their content.

So, could you please clarify the difference between Crawler4j and Jsoup?

java web-crawler html-parsing jsoup crawler4j

Votes: 9 · Answers: 1 · Views: 3817
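To make the crawler/parser distinction in the question above concrete, here is a hypothetical, deliberately tiny breadth-first crawler built on jsoup. Jsoup only supplies the single-page part (fetch one URL, parse it, read its links); the frontier queue, the set of seen URLs and the page limit are the crawling logic that a framework such as Crawler4j provides, along with things this sketch omits (politeness delays, robots.txt handling, multithreading). The fetcher is passed in as a function so the crawl loop can be exercised without network access; all names here are illustrative.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TinyCrawler {
    // Breadth-first crawl from seed, attempting at most maxPages distinct URLs.
    // Returns the URLs attempted, in crawl order (failed fetches are included).
    static Set<String> crawl(String seed, int maxPages, Function<String, Document> fetch) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        while (!frontier.isEmpty() && seen.size() < maxPages) {
            String url = frontier.poll();
            if (!seen.add(url)) {
                continue; // already attempted
            }
            Document doc = fetch.apply(url);
            if (doc == null) {
                continue; // fetch failed
            }
            for (Element link : doc.select("a[href]")) {
                String next = link.attr("abs:href");
                if (!next.isEmpty() && !seen.contains(next)) {
                    frontier.add(next);
                }
            }
        }
        return seen;
    }

    // The jsoup part: fetch and parse one page; returns null on I/O errors.
    static Document fetchWithJsoup(String url) {
        try {
            return Jsoup.connect(url).get();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical seed URL.
        crawl("https://example.com/", 5, TinyCrawler::fetchWithJsoup).forEach(System.out::println);
    }
}
```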

How do I extract data from inside a tag in a web page's HTML source?

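I tried several solutions suggested in other answers, such as using different user agents (Chrome, Safari, etc.) and fetching the HTML directly with HttpClient and BufferedReader, but none of them worked. How can I make the Android output match the web output? This is the web output I'm looking for (see the page source of https://finance.yahoo.com/quote/AAPL/financials?p=AAPL for the full output). It essentially contains an AJAX tab named "Quarterly" that holds a table. I need that data, but the HTML source fetched on Android doesn't include it, while the source in a desktop browser does.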

root.App.main = {"context":{"dispatcher":{"stores":{"PageStore":{"currentPageName":"quote","currentRenderTargetId":"default","pagesConfigRaw":{"base":{"quote":{"layout":{"bundleName":"yahoodotcom-layout.TwoColumnLayout","name":"TwoColumnLayout","config":{"enableHeaderCollapse":true,"Header":{"isFixed":true,"uhContainerClasses":"Bgi($uhGrayGradient)","navContainerClasses":"Bgi($navrailGrayGradient) Bxsh($navrailShadow) Pos(r) hasScrolled_Bxsh(headerShadow) Panel-open_Bxsh(headerShadow)","navTransitionClasses":"HideNavrail_Translate3d(0,-46px,0) Panel-open_Translate3d(0,-46px,0)","secondaryNavContainerClasses":"hasScrolled_Bdbw(0px) Bxsh($navrailShadow)","height":135},"fetchNewAttribution":true},"meta":{"property":{"twitter:site":"@YahooFinance"}}},"meta":{"property":{"twitter:site":"@YahooFinance","fb:pages":"90376669494"}},"regions":{"SecondaryNav":[{"bundleName":"react-finance","name":"SecondaryNav","config":{"ui":{"enableRelativeUrl":true}},"props":{"key":"SecondaryNav-0-SecondaryNav","id":"SecondaryNav-0-SecondaryNav"},"isPageComposite":true}],"Overlay":[{"bundleName":"react-lightbox","name":"Lightbox","props":{"key":"Overlay-0-Lightbox","id":"Overlay-0-Lightbox"},"isPageComposite":true},{"bundleName":"td-app-finance","name":"Null","props":{"key":"Overlay-1-Null","id":"Overlay-1-Null"},"isPageComposite":true},{"bundleName":"td-app-finance","name":"Null","props":{"key":"Overlay-2-Null","id":"Overlay-2-Null"},"isPageComposite":true}],"Lead":[{"bundleName":"react-finance","name":"FinanceHeader","props":{"className":"Bxz(bb) H(100%) Pos(r) Maw($newGridWidth) Miw($minGridWidth) Miw(a)!--tab768 Miw(a)!--tab1024 Mstart(a) Mend(a) Px(20px) My(10px)","showAds":true,"adsConfig":{"positions":["FB2A","FB2B","FB2C","FB2D"]},"key":"Lead-0-FinanceHeader","id":"Lead-0-FinanceHeader"},"isPageComposite":true},{"bundleName":"tdv2-applet-featurebar","name":"FeatureBar","config":{"ui":{"container_classnames":"W(100%) Bxz(bb) Bdrs(2px) Mb(10px) Maw($maxModuleWidth) Miw($minGridWidth) 
Miw(a)!--tab768 Miw(a)!--tab1024 Mx(a)","prerender":{"enabled":true,"renderTargetId":"modal"}},"site":"finance"},"props":{"key":"Lead-1-FeatureBar","id":"Lead-1-FeatureBar"},"isPageComposite":true},{"bundleName":"QuotePage","name":"QuoteHeader","props":{"key":"Lead-2-QuoteHeader","id":"Lead-2-QuoteHeader"},"isPageComposite":true},{"bundleName":"QuotePage","name":"QuoteNav","props":{"key":"Lead-3-QuoteNav","id":"Lead-3-QuoteNav"},"isPageComposite":true}],"Col1":[{"bundleName":"td-ads","name":"Ad","props":{"pos":"LDRB","style":{"marginBottom":"8px","paddingTop":"0px","marginLeft":"auto","marginRight":"auto","textAlign":"center","lineHeight":"0px","position":"relative","zIndex":"5"},"key":"Col1-0-Ad","id":"Col1-0-Ad"},"isPageComposite":true},{"bundleName":"Quote.financials","name":"Financials","props":{"key":"Col1-1-Financials","id":"Col1-1-Financials"},"isPageComposite":true},{"bundleName":"react-finance","name":"AdUnitWithTdAds","props":{"className":"ad-foot","positions":["FOOT"],"key":"Col1-2-AdUnitWithTdAds","id":"Col1-2-AdUnitWithTdAds"},"isPageComposite":true},{"bundleName":"react-finance","name":"AdUnitWithTdAds","props":{"className":"ad-fsrvy","positions":["FSRVY"],"key":"Col1-3-AdUnitWithTdAds","id":"Col1-3-AdUnitWithTdAds"},"isPageComposite":true}],"Col2":[{"bundleName":"td-app-finance","name":"ExtPromoButton","props":{"className":"btn Bds(s) Bdc($c-fuji-grey-c) Bdrs(4px) Bgc($white) Bdw(1px) Bgc($ExtButtonHov):h C($white):h C($ExtButtonHov) Cur(p) Fz(s) Fw(b) H(44px) Lh(40px) Mb(20px) Ta(c) Td(n) 
W(100%)","sec":"ext-promo-all-mkt-submit","titleId":"EXTENSION_PROMO_TITLE","url":"https:\u002F\u002Fchrome.google.com\u002Fwebstore\u002Fdetail\u002Fdoojmkhhplhicnghmafjbhncmgjiohma","enabled":true,"key":"Col2-0-ExtPromoButton","id":"Col2-0-ExtPromoButton"},"isPageComposite":true},{"bundleName":"QuotePage","name":"QuoteModule","props":{"type":"eventPromo","key":"Col2-1-QuoteModule","id":"Col2-1-QuoteModule"},"isPageComposite":true},{"bundleName":"td-ads","name":"ComboAd","props":{"adparseStyle":{"marginBottom":"20px"},"finishedStyle":{"marginBottom":"20px"},"children":[{"bundleName":"td-ads","name":"Ad","props":{"pos":"LREC"}},{"bundleName":"td-ads","name":"Ad","props":{"pos":"MON"}}],"serverHeight":true,"key":"Col2-2-ComboAd","id":"Col2-2-ComboAd"},"isPageComposite":true},{"bundleName":"QuotePage","name":"QuoteModule","props":{"type":"similarCompanies","key":"Col2-3-QuoteModule","id":"Col2-3-QuoteModule"},"initMode":{"deferRender":true},"isPageComposite":true},{"bundleName":"QuotePage","name":"QuoteModule","props":{"type":"earningsChart","key":"Col2-4-QuoteModule","id":"Col2-4-QuoteModule"},"initMode":{"deferRender":true},"isPageComposite":true},{"bundleName":"QuotePage","name":"QuoteModule","props":{"type":"financialsChart","key":"Col2-5-QuoteModule","id":"Col2-5-QuoteModule"},"initMode":{"deferRender":true},"isPageComposite":true},{"bundleName":"react-finance",..."}}}};

This is the Android output I get:

(root.App.main = {"context":{"dispatcher":{"stores":{"PageStore":{"currentPageName":"quote","currentRenderTargetId":"default","pagesConfigRaw":{"base":{"quote":{"layout":{"bundleName":"yahoodotcom-layout.TwoColumnLayout","name":"TwoColumnLayout","config":{"enableHeaderCollapse":true,"Header":{"isFixed":true,"uhContainerClasses":"Bgi($uhGrayGradient)","navContainerClasses":"Bgi($navrailGrayGradient) Bxsh($navrailShadow) Pos(r) hasScrolled_Bxsh(headerShadow) Panel-open_Bxsh(headerShadow)","navTransitionClasses":"HideNavrail_Translate3d(0,-46px,0) Panel-open_Translate3d(0,-46px,0)","secondaryNavContainerClasses":"hasScrolled_Bdbw(0px) Bxsh($navrailShadow)","height":135},"fetchNewAttribution":true},"meta":{"property":{"twitter:site":"@YahooFinance"}}},"meta":{"property":{"twitter:site":"@YahooFinance","fb:pages":"90376669494"}},"regions":{"SecondaryNav":[{"bundleName":"react-finance","name":"SecondaryNav","config":{"ui":{"enableRelativeUrl":true}},"props":{"key":"SecondaryNav-0-SecondaryNav","id":"SecondaryNav-0-SecondaryNav"},"isPageComposite":true}],"Overlay":[{"bundleName":"react-lightbox","name":"Lightbox","props":{"key":"Overlay-0-Lightbox","id":"Overlay-0-Lightbox"},"isPageComposite":true},{"bundleName":"td-app-finance","name":"Null","props":{"key":"Overlay-1-Null","id":"Overlay-1-Null"},"isPageComposite":true},{"bundleName":"td-app-finance","name":"Null","props":{"key":"Overlay-2-Null","id":"Overlay-2-Null"},"isPageComposite":true}],"Lead":[{"bundleName":"react-finance","name":"FinanceHeader","props":{"className":"Bxz(bb) H(100%) Pos(r) Maw($newGridWidth) Miw($minGridWidth) Miw(a)!--tab768 Miw(a)!--tab1024 Mstart(a) Mend(a) Px(20px) My(10px)","showAds":true,"adsConfig":{"positions":["FB2A","FB2B","FB2C","FB2D"]},"key":"Lead-0-FinanceHeader","id":"Lead-0-FinanceHeader"},"isPageComposite":true},{"bundleName":"tdv2-applet-featurebar","name":"FeatureBar","config":{"ui":{"container_classnames":"W(100%) Bxz(bb) Bdrs(2px) Mb(10px) Maw($maxModuleWidth) 
Miw($minGridWidth) Miw(a)!--tab768 …

html java ajax android jsoup

Votes: 9 · Answers: 1 · Views: 817
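A sketch related to the question above, under stated assumptions. jsoup does not execute JavaScript, so anything the page builds in the browser after load is invisible to it; what it can do is read data embedded in <script> tags, such as the root.App.main = {...}; assignment shown in the question. The sketch assumes that assignment exists in the fetched HTML and ends with the last "};" in its script (Yahoo's markup may have changed since); the regex, class and method names are hypothetical, and the captured string would still need a JSON library to be parsed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScriptJsonExtractor {
    // Captures the object literal assigned to root.App.main; greedy up to the last "};".
    private static final Pattern APP_MAIN =
            Pattern.compile("root\\.App\\.main\\s*=\\s*(\\{.*\\});", Pattern.DOTALL);

    // Returns the JSON text assigned to root.App.main, or null if no <script> contains it.
    static String appMainJson(Document doc) {
        for (Element script : doc.select("script")) {
            // data() returns raw script contents; text() would return "" for <script>.
            Matcher m = APP_MAIN.matcher(script.data());
            if (m.find()) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical sample; a live fetch would be Jsoup.connect(url).userAgent(...).get().
        String html = "<html><head><script>root.App.main = {\"context\":{}};</script></head></html>";
        System.out.println(appMainJson(Jsoup.parse(html)));
    }
}
```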

Finding the index of an element containing a string in an ArrayList

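Using Jsoup, I parse HTML from a website to fill an ArrayList with the content I need from the site, so now I have an ArrayList full of strings. I want to find the index of the element in that list that contains a particular string. For example, I know that somewhere in the list, at some index, there is the literal string "Claude", but I can't seem to write any code that finds the index of the element in the ArrayList that contains "Claude". Here is what I tried, but it returns -1 (not found):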

ArrayList<String> list = new ArrayList<String>();
String claude = "Claude";

Document doc = null;
try {
    doc = Jsoup.connect("http://espn.go.com/nhl/team/stats/_/name/phi/philadelphia-flyers").get();
} catch (IOException e) {
    e.printStackTrace();
}
for (Element table: doc.select("table.tablehead")) {
    for (Element row: table.select("tr")) {
        Elements tds = row.select("td");
        if (tds.size() > 6) {
            String a = tds.get(0).text() + tds.get(1).text() + tds.get(2).text() + tds.get(3).text() + tds.get(4).text() + tds.get(5).text() + tds.get(6).text();

            list.add(a);

            int …

java string arraylist indexof jsoup

Votes: 8 · Answers: 1 · Views: 50k
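A plain-Java sketch for the question above, assuming the goal is substring matching. List.indexOf("Claude") returns -1 because it looks for an element equal to "Claude", while each element here is several concatenated cell texts that merely contain it. Looping over the indices and testing String.contains finds the position instead. The sample rows and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContainsIndex {
    // Index of the first element containing needle as a substring, or -1 if none does.
    static int indexOfContaining(List<String> list, String needle) {
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i).contains(needle)) {
                return i;
            }
        }
        return -1;
    }

    // Every index whose element contains needle, for when it may occur more than once.
    static List<Integer> allIndexesContaining(List<String> list, String needle) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i).contains(needle)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical rows shaped like the concatenated cells in the question.
        List<String> rows = Arrays.asList("Alice12", "Claude34", "Bob56");
        System.out.println(rows.indexOf("Claude"));            // -1: no element equals "Claude"
        System.out.println(indexOfContaining(rows, "Claude")); // 1
    }
}
```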