Here is the code:
import csv

def do_work():
    global data
    global b
    get_file()
    samples_subset1()
    return

def get_file():
    start_file='thefile.csv'
    with open(start_file, 'rb') as f:
        data = list(csv.reader(f))
        import collections
        counter = collections.defaultdict(int)
        for row in data:
            counter[row[10]] += 1
    return

def samples_subset1():
    with open('/pythonwork/samples_subset1.csv', 'wb') as outfile:
        writer = csv.writer(outfile)
        sample_cutoff=5000
        b_counter=0
        global b
        b=[]
        for row in data:
            if counter[row[10]] >= sample_cutoff:
                global b
                b.append(row)
                writer.writerow(row)
                #print b[b_counter]
                b_counter+=1
    return
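I am a beginner at Python. The way my code runs is that I call do_work, and do_work calls the other functions. Here are my questions:
1. If data only needs to be seen by two functions, should I make it global? If not, how should I call samples_subset1? Should I call it from get_file or from do_work?
2. The code works, but can you point out other good or bad things about the way it is written?
3. I am processing a csv file in multiple steps. I am breaking the steps into separate functions such as get_file and samples_subset1, and I will add more. Should I keep doing it the way I do now, calling each individual function from do_work?
Here is the new code, based on one of the answers below: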
import csv
import collections

def do_work():
    global b
    (data,counter)=get_file('thefile.csv')
    samples_subset1(data, counter,'/pythonwork/samples_subset1.csv')
    return

def get_file(start_file):
    with open(start_file, 'rb') as f:
        global data
        data = list(csv.reader(f))
        counter = collections.defaultdict(int)
        for row in data:
            counter[row[10]] += 1
    return (data,counter)

def samples_subset1(data,counter,output_file):
    with open(output_file, 'wb') as outfile:
        writer = csv.writer(outfile)
        sample_cutoff=5000
        b_counter=0
        global b
        b=[]
        for row in data:
            if counter[row[10]] >= sample_cutoff:
                global b
                b.append(row)
                writer.writerow(row)
                #print b[b_counter]
                b_counter+=1
    return
As a rule of thumb, avoid global variables.
Here it's easy: have get_file return data, and then you can say
data = get_file()
samples_subset1(data)
Also, I would do all the imports at the top of the file.
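
To make that concrete, here is a minimal sketch of how the whole script might look with no globals at all. It is only one possible arrangement, not the asker's final code, and it makes a few assumptions: it targets Python 3 (so files are opened in text mode with newline='' instead of 'rb'/'wb'), it swaps defaultdict(int) for collections.Counter, and the key_column and sample_cutoff parameters are hypothetical names introduced here so the column index 10 and the cutoff 5000 are no longer hard-coded. The counter is passed along with data, as in the revised code in the question.

```python
import collections
import csv


def get_file(start_file, key_column=10):
    """Read all rows and count how often each value appears in key_column."""
    with open(start_file, newline='') as f:
        data = list(csv.reader(f))
    counter = collections.Counter(row[key_column] for row in data)
    return data, counter


def samples_subset1(data, counter, output_file, key_column=10, sample_cutoff=5000):
    """Write the rows whose key occurs at least sample_cutoff times and return them."""
    subset = [row for row in data if counter[row[key_column]] >= sample_cutoff]
    with open(output_file, 'w', newline='') as outfile:
        csv.writer(outfile).writerows(subset)
    return subset


def do_work(start_file, output_file):
    """Run the steps in order, handing each result to the next step."""
    data, counter = get_file(start_file)
    return samples_subset1(data, counter, output_file)
```

With this layout, do_work('thefile.csv', '/pythonwork/samples_subset1.csv') returns the filtered rows, so a later step can take them as an argument rather than reading a global b. Each function depends only on what is passed in, which also makes the steps easy to test one at a time.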