Python: How do I capture output to a text file? (Currently only 25 of 530 lines are captured)

Joh*_*dun 4 python nltk python-2.7

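I've done a fair amount of lurking on SO and plenty of searching and reading, but I have to admit I'm a relative noob at programming in general. I'm trying to learn, so I've been playing with Python's NLTK. In the script below I can get everything to work, except that it only writes out the first screen of what is a multi-screen output, at least that's how I'm thinking about it.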

Here is the script:

#! /usr/bin/env python

import nltk

# First we have to open and read the file:

thefile = open('all_no_id.txt')
raw = thefile.read()

# Second we have to process it with nltk functions to do what we want

tokens = nltk.wordpunct_tokenize(raw)
text = nltk.Text(tokens)

# Now we can actually do stuff with it:

concord = text.concordance("cultural")

# Now to save this to a file

fileconcord = open('ccord-cultural.txt', 'w')
fileconcord.writelines(concord)
fileconcord.close()

Here is the beginning of the output file:

Building index...
Displaying 25 of 530 matches:
y .   The Baobab Tree : Stories of Cultural Continuity The continuity evident 
 regardless of ethnicity , and the cultural legacy of Africa as well . This Af

What am I missing that would get all 530 matches written to the file?

bez*_*max 5

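According to the manual, text.concordance(self, word, width=79, lines=25) takes additional parameters: width sets the width of each printed line and lines sets how many matches are displayed (25 by default, which is why you only get 25).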

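I don't think there is a way to get the number of concordance matches up front. However, the concordance-printing code appears to contain the line lines = min(lines, len(offsets)), so you can simply pass sys.maxint as the last argument (sys.maxint exists only in Python 2; Python 3 has sys.maxsize instead):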

concord = text.concordance("cultural", 75, sys.maxint)
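To see why a very large lines value is safe, here is a simplified, hypothetical stand-in for the printing logic. It is not NLTK's actual source, only a sketch of the behavior described above: it collects every offset of the word, clamps lines with min(lines, len(offsets)), and prints at most that many matches. The function name print_concordance and the sample tokens are made up for illustration, and it uses Python 3's sys.maxsize in place of sys.maxint.

```python
import sys


def print_concordance(tokens, word, width=79, lines=25):
    """Hypothetical stand-in for a concordance printer (not NLTK's code).

    Prints up to `lines` matches of `word`, each shown with surrounding
    context in a window roughly `width` characters wide.
    """
    # Case-insensitive positions of every occurrence of the word
    offsets = [i for i, tok in enumerate(tokens) if tok.lower() == word.lower()]
    # The clamp that makes passing a huge `lines` value harmless
    lines = min(lines, len(offsets))
    print("Displaying %d of %d matches:" % (lines, len(offsets)))
    half = (width - len(word) - 2) // 2
    for i in offsets[:lines]:
        left = " ".join(tokens[:i])[-half:]
        right = " ".join(tokens[i + 1:])[:half]
        print(left.rjust(half), tokens[i], right)


# 20 repetitions with 3 matches each -> 60 matches in total
tokens = ("culture is cultural and cultural heritage is cultural " * 20).split()
print_concordance(tokens, "cultural")                    # default: only 25 shown
print_concordance(tokens, "cultural", 75, sys.maxsize)   # all 60 shown
```

The first call reports "Displaying 25 of 60 matches:", the second "Displaying 60 of 60 matches:"; the clamp means the huge number never has to correspond to a real count.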

Addendum:

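Looking at your original code again, I don't see how it worked before. text.concordance doesn't return anything (it returns None); everything is written to stdout using print. So the easy option is to redirect stdout to your file, like this: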

import sys

....

# Open the output file
fileconcord = open('ccord-cultural.txt', 'w')
# Save the old stdout stream
tmpout = sys.stdout
# Redirect all "print" output to that file
sys.stdout = fileconcord
try:
    # concordance() prints its matches, which now go to the file
    text.concordance("cultural", 200, sys.maxint)
finally:
    # Restore stdout first, so later prints go to the console again,
    # then close the file (this runs even if concordance() raises)
    sys.stdout = tmpout
    fileconcord.close()
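On Python 3.4 and later, the same redirection can be written with contextlib.redirect_stdout, which restores stdout automatically when the block exits. This is a minimal sketch: show_matches is a hypothetical stand-in that prints and returns None, just as the answer describes text.concordance doing. With NLTK installed you would call something like text.concordance("cultural", lines=sys.maxsize) in its place (an assumption about your setup, not something the sketch itself runs).

```python
import contextlib


def show_matches(word, count):
    """Hypothetical stand-in for text.concordance(): prints, returns None."""
    print("Displaying %d of %d matches:" % (count, count))
    for n in range(count):
        print("... %s ... (match %d)" % (word, n + 1))


with open("ccord-cultural.txt", "w") as fileconcord:
    # Everything printed inside this block goes to the file, not the console;
    # stdout is restored on exit, even if an exception is raised
    with contextlib.redirect_stdout(fileconcord):
        result = show_matches("cultural", 530)

# Back on the console. The return value is None, which is why
# writelines(concord) could not write the matches.
print("returned:", result)
```

The nested with statements also close the file for you, so there is no separate cleanup step to forget.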

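Another option is to use the corresponding classes directly and skip the Text wrapper. Just copy the relevant bits from here and combine them with the bits from here, and you're done.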
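The linked code is not reproduced here, so the following is only a hypothetical pure-Python sketch of the same idea, not NLTK's own classes: build an index from each word to its token positions once, then return the concordance lines as strings instead of printing them. Because the lines come back as a list, they can be written to a file directly, with no stdout redirection. The names build_index and concordance_lines, and the context parameter, are made up for illustration.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each lower-cased token to the list of positions where it occurs."""
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        index[tok.lower()].append(i)
    return index


def concordance_lines(tokens, index, word, context=4):
    """Return every match of `word` with up to `context` tokens on each side."""
    lines = []
    # .get() avoids inserting an empty entry for words that never occur
    for i in index.get(word.lower(), []):
        left = " ".join(tokens[max(0, i - context):i])
        right = " ".join(tokens[i + 1:i + 1 + context])
        lines.append("%s [%s] %s" % (left, tokens[i], right))
    return lines


tokens = "The Baobab Tree : Stories of Cultural Continuity and the cultural legacy".split()
index = build_index(tokens)
matches = concordance_lines(tokens, index, "cultural")
# No 25-line cap: every match is returned and written
with open("ccord-cultural.txt", "w") as out:
    out.write("\n".join(matches) + "\n")
```

Here matches holds two lines, "Tree : Stories of [Cultural] Continuity and the cultural" and "Cultural Continuity and the [cultural] legacy".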