Posts by ver*_*ley

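Using a regular expression to add a character to matched strings

I have a long string that is a paragraph, but some periods are not followed by a space. For example: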

para = "I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women.It is in black and white but saves the colour for one shocking shot.At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene.Avoid."

I tried to fix this with re.sub, but the output is not what I expected.

This is what I did:

re.sub("(?<=\.).", …
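A minimal sketch of one way to insert the missing spaces, assuming a space is wanted only where a period is directly followed by an ASCII letter. The function name `add_space_after_periods` is hypothetical and not from the original post. Note that a pattern such as `(?<=\.).` also matches the character after the period, so any replacement overwrites that character; a lookahead avoids this because it consumes nothing. This sketch would also split abbreviations like "e.g." and dotted names like URLs, so it is only suitable for plain prose.

```python
import re


def add_space_after_periods(text: str) -> str:
    """Insert a space after any period that is directly followed by a letter."""
    # \.            matches a literal period
    # (?=[A-Za-z])  lookahead: the next character is an ASCII letter (not consumed)
    return re.sub(r"\.(?=[A-Za-z])", ". ", text)


# Excerpt taken from the paragraph in the question.
sample = (
    "It is in black and white but saves the colour for one shocking shot."
    "At the end the film seems to be trying to make some political statement "
    "but it just comes across as confused and obscene.Avoid."
)
print(add_space_after_periods(sample))
```

Periods that are already followed by a space, a digit, or the end of the string are left untouched.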

python regex nlp

8 votes
1 answer
4850 views

How do I count the frequency of each categorical value in a column of a pyspark dataframe?

Suppose I have a pyspark dataframe:

df.show()
+-----+---+
|  x  |  y|
+-----+---+
|alpha|  1|
|beta |  2|
|gamma|  1|
|alpha|  2|
+-----+---+

I want to count the number of occurrences of alpha, beta, and gamma in column x. How can I do this in pyspark?

python pyspark spark-dataframe

1 vote
1 answer
2946 views

Tag statistics

python ×2

nlp ×1

pyspark ×1

regex ×1

spark-dataframe ×1