I am trying to run my function over a million rows in a dataset.

Code:
import string
from nltk.corpus import stopwords

def nlkt(val):
    val = repr(val)
    # Drop English stopwords, then strip punctuation and digits
    clean_txt = [word for word in val.split() if word.lower() not in stopwords.words('english')]
    nopunc = [char for char in str(clean_txt) if char not in string.punctuation]
    nonum = [char for char in nopunc if not char.isdigit()]
    words_string = ''.join(nonum)
    return words_string
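One immediate cost in the function above is that stopwords.words('english') is re-evaluated for every single word. A hedged sketch of the same cleaning with the stopword lookup built once as a set (here a small hard-coded stand-in set is used so the snippet runs without the NLTK data download; in practice you would use set(stopwords.words('english'))):

```python
import string

# Stand-in for set(stopwords.words('english')) -- precomputed ONCE,
# instead of rebuilding the stopword list on every word.
STOPWORDS = {'the', 'a', 'an', 'is', 'in', 'of'}

def nlkt_fast(val):
    # Same steps as the question's function: stopword filter,
    # then strip punctuation and digits character by character.
    val = repr(val)
    clean_txt = [word for word in val.split() if word.lower() not in STOPWORDS]
    nopunc = [char for char in str(clean_txt) if char not in string.punctuation]
    nonum = [char for char in nopunc if not char.isdigit()]
    return ''.join(nonum)
```

Set membership is O(1) per word, so this alone can cut the per-row cost substantially before any parallelism is added.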
Now I am calling the above function in a for loop to run over the million records. Even though I am on a heavyweight server with a 24-core CPU and 88 GB of RAM, I see the loop taking too much time and not using the computing power available.

I am calling the above function like this:
data = pd.read_excel(scrPath + "UserData_Full.xlsx", encoding='utf-8')
droplist = ['Submitter', 'Environment']
data.drop(droplist, axis=1, inplace=True)
#Merging the columns company and detailed description
data['Anylize_Text']= data['Company'].astype(str) + ' ' + data['Detailed_Description'].astype(str)
finallist =[]
for …
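A plain Python for loop runs on a single core, which is why the other 23 cores sit idle. One way to use them is to fan the rows out over worker processes with multiprocessing.Pool. A minimal sketch, assuming a cleaning function like the one above (clean_parallel and the hard-coded stopword stand-in are illustrative names, not from the original post); the input would be data['Anylize_Text'].tolist():

```python
import multiprocessing as mp
import string

# Stand-in for set(stopwords.words('english')) so the snippet is self-contained.
STOPWORDS = {'the', 'a', 'an', 'is', 'in', 'of'}

def nlkt(val):
    # Same cleaning as the question's function, with the stopword set precomputed.
    val = repr(val)
    clean_txt = [word for word in val.split() if word.lower() not in STOPWORDS]
    nopunc = [char for char in str(clean_txt) if char not in string.punctuation]
    nonum = [char for char in nopunc if not char.isdigit()]
    return ''.join(nonum)

def clean_parallel(texts, processes=None):
    # processes=None lets the pool use every available core (24 here).
    # chunksize batches rows per worker so inter-process overhead stays
    # small on a million-row input.
    with mp.Pool(processes=processes) as pool:
        return pool.map(nlkt, texts, chunksize=10_000)
```

pool.map preserves input order, so the returned list can be assigned straight back as a DataFrame column.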