Remove everything after the first non-letter character in a string in Python

use*_*ser 4 python regex string

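There are several questions about using regular expressions to strip non-alphanumeric characters from a string. What I want to do is remove every character, letters included, that comes after the first character that is not a letter or a single space (so digits and double spaces also count as stop characters).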

For example:

My string is #not very beautiful 

should become

My string is

or

Are you 9 years old?

should become

Are you

this is the last  example

should become

this is the last

How can I do this?

Psi*_*dom 5

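How about using split on the pattern "[^A-Za-z ]|  " (note the two spaces after the |) and taking the first element? You can strip any leftover whitespace afterwards: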

import re
re.split("[^A-Za-z ]|  ", "My string is #not very beautiful")[0].strip()
# 'My string is'

re.split("[^A-Za-z ]|  ", "this is the last  example")[0].strip()
# 'this is the last'

re.split("[^A-Za-z ]|  ", "Are you 9 years old?")[0].strip()
# 'Are you'

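The pattern "[^A-Za-z ]|  " contains two alternatives: the first matches a single character that is neither a letter nor a space; the second matches two consecutive spaces. Split on either alternative, and the first element of the result should be the string you are looking for.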
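
As a minimal sketch of the same idea wrapped in a reusable function: the names _CUT and truncate_at_first_invalid are hypothetical (they do not appear in the answer above), and the maxsplit=1 argument is an added assumption that simply stops the split after the first match, since only the first element is needed.

```python
import re

# Hypothetical helper names; the pattern itself is the one from the answer.
# Alternative 1: one character that is neither an ASCII letter nor a space.
# Alternative 2: two consecutive spaces.
_CUT = re.compile(r"[^A-Za-z ]|  ")


def truncate_at_first_invalid(text):
    """Return the part of text before the first match of _CUT, stripped."""
    # maxsplit=1: we only need the piece before the first match.
    return _CUT.split(text, maxsplit=1)[0].strip()


if __name__ == "__main__":
    for sample in (
        "My string is #not very beautiful",
        "Are you 9 years old?",
        "this is the last  example",
        "only letters here",
    ):
        print(repr(truncate_at_first_invalid(sample)))
```

A string with no match is returned whole (apart from the strip), because split then yields a single-element list.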


ins*_*get 2

Create a whitelist, and stop as soon as you see something that is not in it:

import itertools
import string

def rstrip(s, whitelist=None):
    if whitelist is None:
        whitelist = set(string.ascii_letters + ' ')  # default whitelist: the letters A-Z and a-z, plus the space character
    # split on the first double space and keep the part before it (this works even if the string has no double space)
    # use `itertools.takewhile` to keep characters for as long as they are in the whitelist
    # use `join` to join them into one single string

    return ''.join(itertools.takewhile(whitelist.__contains__, s.split('  ', 1)[0]))
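
One thing to be aware of with the whitelist approach: the space in front of the offending character is itself whitelisted, so it stays in the result ("My string is " rather than "My string is"). A hedged variant that trims it is sketched below; the names ALLOWED and keep_allowed_prefix are hypothetical, and the final rstrip call is an addition that is not part of the answer above.

```python
import itertools
import string

# Hypothetical default whitelist: ASCII letters plus the space character.
ALLOWED = frozenset(string.ascii_letters + " ")


def keep_allowed_prefix(s, allowed=ALLOWED):
    """Keep the leading run of allowed characters, cut at the first double space."""
    # Everything from the first double space onward is discarded.
    head = s.split("  ", 1)[0]
    # takewhile stops at the first character that is not in the whitelist.
    prefix = "".join(itertools.takewhile(allowed.__contains__, head))
    # Added step (not in the original answer): drop the trailing space
    # that precedes the first rejected character.
    return prefix.rstrip(" ")


if __name__ == "__main__":
    for sample in (
        "My string is #not very beautiful",
        "Are you 9 years old?",
        "this is the last  example",
    ):
        print(repr(keep_allowed_prefix(sample)))
```

Passing a different collection as allowed changes which characters are kept, for example a set that also contains digits.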