How do I fix the Python AttributeError: 'float' object has no attribute 'split'?

Sch*_*ool 2 python string series pandas

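When I run the code below, it raises an AttributeError saying that a 'float' object has no attribute 'split'.

I would like to know why this error occurs. Please help me take a look at the code below, thanks :((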

import pandas as pd

pd.options.display.max_colwidth = 10000
df = pd.read_csv(output, sep='|')


def text_processing(df):
    '''=== Lower case ==='''
    '''First step is to transform comments into lower case'''
    df['content'] = df['content'].apply(lambda x: " ".join(x.lower() for x in x.split() if x not in stop_words))

    '''=== Removal of stop words ==='''
    df['content'] = df['content'].apply(lambda x: " ".join(x for x in x.split() if x not in stop_words))

    '''=== Removal of Punctuation ==='''
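    # regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default
    df['content'] = df['content'].str.replace(r'[^\w\s]', '', regex=True)

    '''=== Removal of Numeric ==='''
    df['content'] = df['content'].str.replace(r'[0-9]', '', regex=True)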

    '''=== Removal of common words ==='''
    freq = pd.Series(' '.join(df['content']).split()).value_counts()[:5]
    freq = list(freq.index)
    df['content'] = df['content'].apply(lambda x: " ".join(x for x in x.split() if x not in freq))

    '''=== Removal of rare words ==='''
    freq = pd.Series(' '.join(df['content']).split()).value_counts()[-5:]
    freq = list(freq.index)
    df['content'] = df['content'].apply(lambda x: " ".join(x for x in x.split() if x not in freq))

    return df

df = text_processing(df)
print(df)

Error output:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.2\helpers\pydev\pydevd.py", line 1664, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.2\helpers\pydev\pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.2\helpers\pydev\pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/L31307/Documents/FYP P3_Lynn_161015H/FYP 10.10.18 (Wed) still working on it/FYP/dataanalysis/category_analysis.py", line 53, in <module>
    df = text_processing(df)
  File "C:/Users/L31307/Documents/FYP P3_Lynn_161015H/FYP 10.10.18 (Wed) still working on it/FYP/dataanalysis/category_analysis.py", line 30, in text_processing
    df['content'] = df['content'].apply(lambda x: " ".join(x.lower() for x in x.split() if x not in stop_words))
  File "C:\Users\L31307\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py", line 3194, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src\inference.pyx", line 1472, in pandas._libs.lib.map_infer
  File "C:/Users/L31307/Documents/FYP P3_Lynn_161015H/FYP 10.10.18 (Wed) still working on it/FYP/dataanalysis/category_analysis.py", line 30, in <lambda>
    df['content'] = df['content'].apply(lambda x: " ".join(x.lower() for x in x.split() if x not in stop_words))
AttributeError: 'float' object has no attribute 'split'

Dom*_*aul 8

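split() is a Python method that works only on strings. It looks like your 'content' column contains not only strings but also other values, such as floats, to which .split() cannot be applied.

Try converting each value to a string with str(x).split(), or convert the whole column to strings first, which is more efficient. You can do that as follows: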

df['column_name'] = df['column_name'].astype(str)
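One thing to watch with the column-wide conversion: depending on the pandas version, a missing value may come out of astype(str) as the literal text 'nan' rather than as an empty string. A minimal sketch of one way around that, assuming an empty string is an acceptable stand-in for missing comments (the sample Series here is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing text, a missing value and a plain float.
s = pd.Series(["Hello World", np.nan, 3.5], dtype=object)

# Replace missing values first, then convert everything to str.
filled = s.fillna("").astype(str)

# Every value now has .split(), so the lambda from the question no longer fails.
cleaned = filled.apply(lambda text: " ".join(w.lower() for w in text.split()))
print(cleaned.tolist())
```

Whether an empty string, a dropped row, or something else is the right replacement depends on what the later steps expect.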


jpp*_*jpp 7

The error points to this line:

df['content'] = df['content'].apply(lambda x: " ".join(x.lower() for x in x.split() \
                                    if x not in stop_words))

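Here split is being used as a method of Python's built-in str class. Your error indicates that one or more values in df['content'] are of type float. This could be because there is a null value (i.e. NaN) or a non-null float value.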
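To confirm which of the two cases applies, it can help to look at the actual types in the column. The sketch below uses made-up pipe-separated data in place of the real file (the `id` column and the sample rows are assumptions); an empty field is read by pd.read_csv as NaN, which is a float:

```python
import io
import pandas as pd

# Hypothetical pipe-separated data; the second row has an empty 'content' field.
raw = "id|content\n1|Hello World\n2|\n3|Another Comment\n"
df = pd.read_csv(io.StringIO(raw), sep="|")

# Flag each value as string / not string.
is_str = pd.Series([isinstance(v, str) for v in df["content"]], index=df.index)

# Rows whose 'content' would make x.split() fail.
non_strings = df[~is_str]
print(non_strings)

# The offending value is a float (NaN), not a str.
print(type(df.loc[1, "content"]))
```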

One workaround, which stringifies the floats, is to apply str to x before calling split:

df['content'] = df['content'].apply(lambda x: " ".join(x.lower() for x in str(x).split() \
                                    if x not in stop_words))

Alternatively, and possibly a better solution, be explicit and use a named function with a try / except clause:

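def converter(x):
    try:
        # x.split() raises AttributeError for non-strings such as NaN,
        # which is what the except clause below is meant to catch
        return ' '.join([word.lower() for word in x.split() if word not in stop_words])
    except AttributeError:
        return None  # or some other value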

df['content'] = df['content'].apply(converter)

Because pd.Series.apply is just a loop with some overhead, you may find a list comprehension or map more efficient:

df['content'] = [converter(x) for x in df['content']]
df['content'] = list(map(converter, df['content']))
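For a self-contained check of this approach, here is a small sketch. The stop_words set and the sample rows are hypothetical, and this variant of converter calls x.split() directly so that the except branch is actually reached for non-string values:

```python
import numpy as np
import pandas as pd

stop_words = {"the", "are"}  # hypothetical stop-word set for illustration

def converter(x):
    try:
        # Non-strings (e.g. NaN) have no .split(), so they fall through to except.
        return " ".join(w.lower() for w in x.split() if w not in stop_words)
    except AttributeError:
        return None  # placeholder for rows that are not text

df = pd.DataFrame({"content": ["Cats are the Best", np.nan, "Hello World"]})

# List comprehension and map give the same values.
via_listcomp = [converter(x) for x in df["content"]]
via_map = list(map(converter, df["content"]))

df["content"] = via_listcomp
print(via_listcomp)
```

Note that, as in the original lambda, the stop-word check runs before lowercasing, so a capitalised stop word such as "The" would not be filtered out.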