re.sub erroring with "expected string or bytes-like object"

ima*_*oob 41 python regex nltk pandas

I have read several posts about this error, but I still can't figure it out. It happens when I try to call my function in a loop:

def fix_Plan(location):
    letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                          " ",          # Replace all non-letters with spaces
                          location)     # Column and row to search    

    words = letters_only.lower().split()     
    stops = set(stopwords.words("english"))      
    meaningful_words = [w for w in words if not w in stops]      
    return (" ".join(meaningful_words))    

col_Plan = fix_Plan(train["Plan"][0])    
num_responses = train["Plan"].size    
clean_Plan_responses = []

for i in range(0,num_responses):
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))

Here is the error:

Traceback (most recent call last):
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 48, in <module>
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 22, in fix_Plan
    location)  # Column and row to search
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

abc*_*ccd 55

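As you said in the comments, some of the values appear to be floats, not strings. You need to convert them to strings before passing them to re.sub. The simplest way is to change location to str(location) in the re.sub call. It does no harm even if the value is already a str.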

letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                      " ",          # Replace all non-letters with spaces
                      str(location))
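
To see why this works, here is a minimal standalone sketch. It assumes the offending value is a float NaN, which is what pandas typically stores for a missing cell; the helper name letters_only_from and the sample inputs are made up for illustration and are not from the question.

```python
import re

def letters_only_from(value):
    """Replace every non-letter with a space, coercing the input to str first."""
    return re.sub("[^a-zA-Z]", " ", str(value))

# Passing a float straight to re.sub reproduces the TypeError from the traceback.
try:
    re.sub("[^a-zA-Z]", " ", float("nan"))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With str() applied first, both strings and floats are accepted.
cleaned_text = letters_only_from("Plan B, 2017!")
cleaned_nan = letters_only_from(float("nan"))

print(error_message)
print(cleaned_text.split())
print(repr(cleaned_nan))
```

Note that str(float("nan")) is the text "nan", so a missing value ends up as the word "nan" in the cleaned output rather than disappearing.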

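  • I wrote two notebooks in Jupyter and one in Kaggle Kernels. The Jupyter one works fine and produces the correct output. The Kaggle notebook gave me this error; I followed your solution and the error went away, but now the sentiment predictions are wrong. (2 upvotes)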

小智 14

The simplest solution is to apply Python's str function to the column you are looping over.

If you are using pandas, this can be done as:

dataframe['column_name']=dataframe['column_name'].apply(str)

  • I'd suggest filling NaN values with '' first, `dataframe['column_name'] = dataframe['column_name'].fillna('').apply(str)`, because in most use cases people don't want NaN to become the literal string 'nan'. (3 upvotes)
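
The same idea can be sketched per value, without pandas, for anyone cleaning rows one at a time. This is only an illustration: clean_text is a hypothetical helper, it treats None and float NaN as "missing", and it leaves out the stopword filtering from the question.

```python
import math
import re

def clean_text(value):
    """Return lowercase letters-only text; missing values become an empty string."""
    # Assumption: None and float NaN are the only "missing" markers in the data.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    letters = re.sub("[^a-zA-Z]", " ", str(value))
    return " ".join(letters.lower().split())

rows = ["Plan B, 2017!", float("nan"), None, 3.5]
cleaned_rows = [clean_text(row) for row in rows]
print(cleaned_rows)
```

Handling the missing case before calling str() keeps the word "nan" out of the cleaned text, which is the concern raised in the comment above.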