Remove rows from a pandas DataFrame where the sentence in a column has more than a certain number of words

Ash*_*'Sa 6 python string split pandas

I want to remove rows from a pandas DataFrame where the string in a particular column is longer than a given length, measured in words.

For example:

Input frame:

X    Y
0    Hi how are you.
1    An apple
2    glass of water
3    I like to watch movie
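To try the answer below, the sample input can be rebuilt as a DataFrame. This is a minimal sketch that assumes X is an ordinary integer column (as the answer's printed output suggests) rather than the index:

```python
import pandas as pd

# Rebuild the sample input from the question; X is kept as a regular column
df = pd.DataFrame({
    'X': [0, 1, 2, 3],
    'Y': ['Hi how are you.', 'An apple', 'glass of water', 'I like to watch movie'],
})
print(df)
```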

Now, say I want to drop every row whose string contains 4 or more words.

The desired output frame is:

X    Y
1    An apple
2    glass of water

The rows with X values 0 and 3 are removed, because row 0 contains 4 words and row 3 contains 5.
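A quick way to confirm which rows meet the "4 or more words" rule is to look at the per-row word counts. This sketch rebuilds the sample frame and assumes words are whitespace-separated tokens, so the trailing period in "you." does not add a word:

```python
import pandas as pd

df = pd.DataFrame({
    'X': [0, 1, 2, 3],
    'Y': ['Hi how are you.', 'An apple', 'glass of water', 'I like to watch movie'],
})

# split() with no argument splits on runs of whitespace,
# so each cell becomes a list of words and .str.len() counts them
word_counts = df['Y'].str.split().str.len()
print(word_counts.tolist())  # [4, 2, 3, 5]

# X values of the rows the question wants removed (4 or more words)
to_drop = df.loc[word_counts >= 4, 'X'].tolist()
print(to_drop)  # [0, 3]
```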

jez*_*ael 6

First split the values on whitespace, count the words in each resulting list with Series.str.len, and invert the condition (>= 4 becomes < 4) with Series.lt so the result can be used for boolean indexing:

df = df[df['Y'].str.split().str.len().lt(4)]
# alternative: build the >= 4 mask with Series.ge and invert it with ~
#df = df[~df['Y'].str.split().str.len().ge(4)]
print(df)
   X               Y
1  1        An apple
2  2  glass of water
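If the same filter is needed for other columns or limits, it can be wrapped in a small function. `drop_long_rows` and its `limit` parameter below are hypothetical names, not pandas API. The sketch also swaps the split-then-len approach for Series.str.count with the regex `\S+` (one match per run of non-whitespace characters), which gives the same counts without building intermediate lists:

```python
import pandas as pd


def drop_long_rows(frame, column, limit):
    """Hypothetical helper: drop rows whose `column` has `limit` or more words.

    Words are counted as runs of non-whitespace characters.
    Missing values (NaN/None) are treated as 0 words and kept.
    """
    counts = frame[column].str.count(r'\S+').fillna(0)
    return frame[counts.lt(limit)]


df = pd.DataFrame({
    'X': [0, 1, 2, 3, 4],
    'Y': ['Hi how are you.', 'An apple', 'glass of water',
          'I like to watch movie', None],
})
print(drop_long_rows(df, 'Y', 4))
```

One design difference from the answer: there, a missing value in Y makes str.len() return NaN, NaN < 4 is False, and the row is dropped. Here fillna(0) treats a missing value as zero words, so such rows are kept; remove the fillna call if the answer's behavior is preferred.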