I'm having trouble running Scala Spark on Jupyter. When I load an Apache Toree - Scala notebook in Jupyter, I get the log output below.
root@ubuntu-2gb-sgp1-01:~# jupyter notebook --ip 0.0.0.0 --port 8888
[I 03:14:54.281 NotebookApp] Serving notebooks from local directory: /root
[I 03:14:54.281 NotebookApp] 0 active kernels
[I 03:14:54.281 NotebookApp] The Jupyter Notebook is running at: http://0.0.0.0:8888/
[I 03:14:54.281 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 03:14:54.282 NotebookApp] No web browser found: could not locate runnable browser.
[I 03:15:09.976 NotebookApp] 302 GET / (61.6.68.44) 1.21ms
[I 03:15:15.924 NotebookApp] Creating new …
I have the following HTML from the Wikipedia home page at https://www.wikipedia.org/. I am trying to get the href text:
//en.wikipedia.org/
<div class="central-featured-lang lang1" lang="en">
<a href="//en.wikipedia.org/" title="English — Wikipedia — The Free Encyclopedia" class="link-box">
<strong>English</strong><br>
<em>The Free Encyclopedia</em><br>
<small>5 077 000+ articles</small>
</a>
</div>
I have tried $$('.central-featured-lang.lang1 a[href$=".org/"]'), but I still get the whole element as output rather than just the href text.
[<a href="//en.wikipedia.org/" title="English — Wikipedia — The Free Encyclopedia" class="link-box">…</a><strong>English</strong><br><em>The Free Encyclopedia</em><br><small>5 077 000+ articles</small></a>]
Any suggestions are much appreciated.
I don't understand how the NLTK regular-expression parser grammar works. See the example below.
from nltk import RegexpParser

parser = RegexpParser('''
NP: {<DT>? <JJ>* <NN>*} # NP
P: {<IN>} # Preposition
V: {<V.*>} # Verb
PP: {<P> <NP>} # PP -> P NP
VP: {<V> <NP|PP>*} # VP -> V (NP|PP)*
''')
What does <DT>?* mean? And what is the difference between <V>.* and <V.*>?
Thanks.
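For anyone exploring the same question, here is a rough analogy using only Python's re module. It is not NLTK's implementation. It assumes, as the NLTK documentation describes, that a tag pattern behaves like a regular expression over a sequence of tags, where each <...> names one tag and ?, * apply to whole tags. The tag list and the find helper are made up for illustration.

```python
import re

# Hand-written POS tags for an illustrative sentence (assumed data, not tagger output):
# "the quick brown fox jumped over the lazy dog"
tags = ["DT", "JJ", "JJ", "NN", "VBD", "IN", "DT", "JJ", "NN"]
tag_string = "".join(f"<{t}>" for t in tags)


def find(pattern):
    """Return every non-empty match of a tag-level regex in tag_string (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, tag_string) if m.group(0)]


# <DT>? <JJ>* <NN>* : an optional determiner, then zero or more adjectives,
# then zero or more nouns. The quantifier applies to the whole tag before it.
np_matches = find(r"(<DT>)?(<JJ>)*(<NN>)*")

# <V.*> : the wildcard sits INSIDE the brackets, so it matches one tag whose
# name starts with V (VB, VBD, VBZ, ...). It never runs past the closing '>'.
v_inside = find(r"<V[^<>]*>")

# <V>.* : the wildcard sits OUTSIDE the brackets, so the tag must be named
# exactly V. No such tag appears in this tag sequence.
v_outside = find(r"<V>[^<>]*")

print(np_matches)
print(v_inside)
print(v_outside)
```

Under this reading, <V.*> in the grammar picks up any verb tag, while the bare <V> in the VP rule most likely refers to the V chunk label that the earlier rule creates, since rules in a RegexpParser grammar are applied in order.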