I have a very long string that is a paragraph, but some of the periods are not followed by a space. For example:
para = "I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women.It is in black and white but saves the colour for one shocking shot.At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene.Avoid."
I tried to use re.sub to fix this, but the output is not what I expected.
This is what I did:
re.sub("(?<=\.).", …

Suppose I have a PySpark DataFrame:
df.show()
+-----+---+
|  x  |  y|
+-----+---+
|alpha|  1|
|beta |  2|
|gamma|  1|
|alpha|  2|
+-----+---+
I want to count how many times alpha, beta, and gamma occur in column x. How can I do this in PySpark?