oas*_*wla 0 python nlp python-3.x pandas
My data frame has several thousand rows.
It looks like this:
import pandas as pd
import numpy as np
text = ['please send us a dm...','…could you please dm me','dm me plz…','i dmed u yesterday…','dm me asap thx', 'i send a dm to u now', 'thx u r so nice dming u now', 'just sent u a dm']
df = pd.DataFrame({"text": text})
text
0 please send us a dm...
1 …could you please dm me
2 dm me plz…
3 i dmed u yesterday…
4 dm me asap thx
5 i send a dm to u now
6 thx u r so nice dming u now
7 just sent u a dm
I wrote a function to replace the abbreviations in the "text" column.
def convert(dataframe, column):
    dataframe[column] = dataframe[column].apply(lambda x: x.replace(" dm ", " direct message "))
    dataframe[column] = dataframe[column].apply(lambda x: x.replace(" dming ", " direct message "))
    dataframe[column] = dataframe[column].apply(lambda x: x.replace(" dmed ", " direct message "))
    dataframe[column] = dataframe[column].apply(lambda x: x.replace(" plz ", " please "))
    dataframe[column] = dataframe[column].apply(lambda x: x.replace(" thx ", " thanks "))
    dataframe[column] = dataframe[column].apply(lambda x: x.replace(" u ", " you "))
    dataframe[column] = dataframe[column].apply(lambda x: x.replace(" asap ", " as soon as possible "))
    dataframe[column] = dataframe[column].apply(lambda x: x.replace("...", " "))
    dataframe[column] = dataframe[column].apply(lambda x: x.replace("…", " "))
However, my code does not work properly: it fails to replace every abbreviation in the data frame.
convert(df, 'text')
text
0 please send us a dm
1 could you please direct message me
2 dm me plz
3 i direct message you yesterday
4 dm me as soon as possible thx
5 i send a direct message to you now
6 thx you r so nice direct message you now
7 just sent you a dm
The desired final output looks like this:
text
0 please send us a direct message
1 could you please direct message me
2 direct message me plz
3 i direct message you yesterday
4 direct message me as soon as possible thanks
5 i send a direct message to you now
6 thanks you r so nice direct message you now
7 just sent you a direct message
I don't understand why my code doesn't work.
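A likely cause, shown here as a minimal sketch rather than a full diagnosis: every pattern in `convert` is padded with a space on both sides, so an abbreviation is only replaced when it sits between two spaces. A token at the very start or end of a string is never touched, and because the ellipsis is stripped last, a token followed by `...` is skipped as well. The sample strings are taken from the data frame above; the variable names are illustrative.

```python
# Each pattern is padded with spaces, so it only matches tokens in mid-sentence.
samples = ["dm me plz…", "dm me asap thx", "i send a dm to u now"]

for s in samples:
    print(repr(s.replace(" dm ", " direct message ")))

# "dm" at the start of a string has no leading space, so it is left alone.
start_unchanged = "dm me plz…".replace(" dm ", " direct message ") == "dm me plz…"

# "thx" at the end of a string has no trailing space, so " thx " never matches.
end_unchanged = "dm me asap thx".replace(" thx ", " thanks ") == "dm me asap thx"

# A token surrounded by spaces is replaced as expected.
middle_changed = "i send a dm to u now".replace(" dm ", " direct message ")
```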
First, build a dictionary of replacements:
replacers = {'dm': 'direct message',
             'thx': 'thanks',
             'dming': 'direct messaging',
             'dmed': 'direct messaged',
             'plz': 'please',
             'u': 'you',
             'asap': 'as soon as possible',
             '...': '',
             '. . .': '',
             'r': 'are'}
Then strip the ellipses, split each string into words, and use apply to swap every abbreviation for its expansion. Finally, join the words back into a single string.
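(
    # Remove '.' and '…' characters; regex=True is required since pandas 2.0,
    # where str.replace treats the pattern as a literal string by default.
    df.text.str.replace(r'[.…]', '', regex=True)
    .str.split()  # split on whitespace into a list of tokens
    # Expand tokens found in the dictionary, keep the rest, then rejoin.
    .apply(lambda x: ' '.join([replacers.get(e, e) for e in x]))
)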
0 please send us a direct message
1 could you please direct message me
2 direct message me please
3 i direct messaged you yesterday
4 direct message me as soon as possible thanks
5 i send a direct message to you now
6 thanks you are so nice direct messaging you now
7 just sent you a direct message
Name: text, dtype: object
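As an alternative to splitting on whitespace, the same idea can be sketched with a single regular expression that uses `\b` word boundaries, so abbreviations are matched at the start or end of a string and next to punctuation. This is only an illustration under assumptions: the helper name `expand_abbreviations` and the trimmed sample data are hypothetical, and the mapping repeats the dictionary above without its punctuation keys.

```python
import re

import pandas as pd

# Same mapping as above, minus the punctuation keys (handled separately below).
replacers = {
    "dm": "direct message",
    "dming": "direct messaging",
    "dmed": "direct messaged",
    "thx": "thanks",
    "plz": "please",
    "u": "you",
    "r": "are",
    "asap": "as soon as possible",
}

# One alternation of all keys, wrapped in \b so only whole words match.
pattern = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in replacers) + r")\b"
)


def expand_abbreviations(series):
    """Hypothetical helper: expand known abbreviations in a Series of strings."""
    # Turn ellipses into spaces so they cannot glue neighbouring words together.
    cleaned = series.str.replace(r"\.{3}|…", " ", regex=True)
    # Look up each whole-word match in the dictionary.
    expanded = cleaned.str.replace(
        pattern, lambda m: replacers[m.group(1)], regex=True
    )
    # Collapse repeated whitespace and trim the ends.
    return expanded.str.replace(r"\s+", " ", regex=True).str.strip()


df = pd.DataFrame(
    {"text": ["please send us a dm...", "dm me plz…", "thx u r so nice dming u now"]}
)
result = expand_abbreviations(df["text"]).tolist()
print(result)
```

Unlike the token-based version, this keeps punctuation attached to a word from blocking a match, at the cost of a slightly denser pattern.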