Writing a list of sentences to a single column in a CSV with Python

jxn*_*jxn 5 python csv nltk stop-words

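I start with a CSV file that has one column and many rows, each row containing a sentence. I wrote some Python to remove the stop words and generate a new CSV file in the same format (one column, many rows of sentences, but now with the stop words removed). The only part of my code that doesn't work is writing the new CSV.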

Instead of one sentence per row in a single column, I get multiple columns, where each cell holds a single character of a sentence.

Here is an example of my new_text_list:

['"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that\'s Best Business Today! Try Now FREE 30 Days! Track sales expenses \x82"', 
'"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that\'s Best Business Today! Try Now FREE 30 Days! Track sales expenses \x82"']

Here is a sample of the output CSV:

col1 col2
"      W
W      e
"      W
W      e
l
l

What am I doing wrong?

Here is my code:

def remove_stopwords(filename):
  new_text_list=[]
  cachedStopWords = set(stopwords.words("english"))
  with open(filename,"rU") as f:
    next(f)
    for line in f:
      row = line.split()
      text = ' '.join([word for word in row
                             if word not in cachedStopWords])
      # print text
      new_text_list.append(text)
  print new_text_list

  with open("output.csv",'wb') as g:
    writer=csv.writer(g)
    for val in new_text_list:
      writer.writerows([val])

unu*_*tbu 4

with open("output.csv", 'wb') as g:
    writer = csv.writer(g)
    for item in new_text_list:
        writer.writerow([item])  # writerow (singular), not writerows (plural)

or

with open("output.csv", 'wb') as g:
    writer = csv.writer(g)
    writer.writerows([[item] for item in new_text_list])

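When you use writerows, the argument should be an iterable of rows, where each row is an iterable of field values. Here, the field value is item, so a row can be the list [item]. Thus, writerows can take a list of lists as its argument.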
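The rows-of-fields shape described above can be checked without touching the disk. This is a minimal sketch, assuming Python 3 and an in-memory io.StringIO buffer in place of a real file; the sample sentences and variable names are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for new_text_list.
sentences = ["first sentence, with a comma", "second sentence"]

# writerows: every row is a one-element list, so each sentence is one field.
buf = io.StringIO()
csv.writer(buf).writerows([s] for s in sentences)
one_column = buf.getvalue()

# Equivalent form: one writerow call per sentence.
buf2 = io.StringIO()
writer = csv.writer(buf2)
for s in sentences:
    writer.writerow([s])
one_column_loop = buf2.getvalue()

print(one_column)
```

Both forms produce the same text: one line per sentence, with the sentence that contains a comma wrapped in quotes so it stays a single field.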

writer.writerows([val])

does not work because [val] is just a list containing a string, not a list of lists.

Strings are sequences too, namely sequences of characters:

In [164]: list('abc')
Out[164]: ['a', 'b', 'c']

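So writerows took [val] to be a list containing a single row, val. Each character of that row was treated as a field value, which is why the characters of your string were spread across separate columns. For example,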

import csv
with open('/tmp/out', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(['Hi, there'])

produces

H,i,",", ,t,h,e,r,e
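The snippets in this thread are Python 2 (binary 'wb' mode, print statement). As a hedged sketch of the same fix under Python 3, where the csv module expects a text-mode file opened with newline="", something like the following should work. The helper name write_single_column, the temporary path, and the sample sentences are assumptions for illustration, not part of the original code.

```python
import csv
import os
import tempfile


def write_single_column(path, sentences):
    """Write each sentence as its own one-field row (hypothetical helper)."""
    # Python 3: text mode with newline="" so csv controls the line endings.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Wrap each sentence in a list so the whole string is a single field.
        writer.writerows([sentence] for sentence in sentences)


sentences = ["Hi, there", "second sentence"]
path = os.path.join(tempfile.mkdtemp(), "output.csv")
write_single_column(path, sentences)

# Read the file back to confirm there is one field per row.
with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

print(rows)
```

Reading the file back gives one single-element row per sentence, so the comma inside "Hi, there" no longer splits the text into separate columns.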