Posts by Roz*_*ita

Removing punctuation from text in Scala - Spark

This is a sample of my data:

case time (especially it's purse), read manual care, follow care instructions make stays waterproof -- example, inspect rubber seals doors (especially battery/memory card door open time) 
xm "life support" picture . flip part bit flimsy guessing won't long . sound great altec speaker dock it! chance back base (xm3020) . traveling bag connect laptop extra speaker . amount paid ($25).

I want to remove all punctuation except the period (.), and also remove words of length <= 2. For example, my expected output is:

case time especially its purse read manual care follow care instructions . make …
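One possible approach, sketched in plain Scala under a few assumptions: punctuation is taken to mean any Unicode punctuation or symbol character (so that `$` is covered), it is deleted rather than replaced with a space (so `it's` becomes `its`, but `battery/memory` becomes `batterymemory`), and a standalone `.` token is kept even though it is shorter than three characters. The names `CleanText` and `clean` are hypothetical.

```scala
object CleanText {
  // Any Unicode punctuation (\p{P}) or symbol (\p{S}) character, except the period.
  private val punctExceptDot = """[\p{P}\p{S}&&[^.]]""".r

  def clean(line: String): String = {
    // Delete the unwanted characters outright (assumption: no space is substituted).
    val noPunct = punctExceptDot.replaceAllIn(line, "")
    // Keep standalone periods; drop every other token of length <= 2.
    noPunct
      .split("\\s+")
      .filter(w => w == "." || w.length > 2)
      .mkString(" ")
  }
}
```

Because `clean` is an ordinary `String => String` function, it could be applied per line in Spark with something like `rdd.map(CleanText.clean)` on an `RDD[String]`.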

regex scala punctuation apache-spark

Score: 7
Answers: 1
Views: 20k

Simplest method for text lemmatization in Scala and Spark

I want to apply lemmatization to a text file:

surprise heard thump opened door small seedy man clasping package wrapped.

upgrading system found review spring 2008 issue moody audio backed.

omg left gotta wrap review order asap . understand hand delivered dali lama

speak hands wear earplugs lives . listen maintain link long .

cables cables finally able hear gem long rumored music .
...

And the expected output is:

surprise heard thump open door small seed man clasp package wrap.

upgrade system found review spring 2008 issue mood audio back.

omg …

text scala lemmatization apache-spark databricks

Score: 6
Answers: 1
Views: 4974

How to remove numbers from text in Scala?

How can I remove numbers from a piece of text in Scala?

For example, I have text like this:

canon 40 22mm lens lock strength plenty orientation 321 .

After removal:

canon lens lock strength plenty orientation .
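Judging from the example, where `22mm` disappears entirely, the goal seems to be dropping every whitespace-separated token that contains a digit rather than deleting only the digit characters. A minimal sketch under that assumption follows; the names `StripNumbers` and `removeNumericTokens` are hypothetical.

```scala
object StripNumbers {
  // Drop every whitespace-separated token that contains at least one digit.
  def removeNumericTokens(line: String): String =
    line.trim
      .split("\\s+")
      .filterNot(token => token.exists(_.isDigit))
      .mkString(" ")
}
```

If only the digit characters themselves should go (turning `22mm` into `mm`), `line.replaceAll("\\d", "")` would be the simpler alternative.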

regex text scala apache-spark

Score: 3
Answers: 1
Views: 2678