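In our system, 9 processes write to the same CSV output file at the same time, and the write rate is high: roughly 10 million new rows per day. We write the CSV file with the csv module of Python 2.7.
Recently I noticed some interleaved rows in the CSV file (see the example below).
For example: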
"name", "sex", "country", "email"
...# skip some lines
"qi", "Male", "China", "redice
...# skip some lines
"Jamp", "Male", "China", "jamp@site-digger.com"
...# skip some lines
@163.com"
The correct output should be:
"name", "sex", "country", "email"
...# skip some lines
"qi", "Male", "China", "redice@163.com"
...# skip some lines
"Jamp", "Male", "China", "jamp@site-digger.com"
...
How can I avoid this kind of conflict?
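One common way to keep rows from interleaving is to have every writer take an exclusive lock around each append. The sketch below is only an illustration under several assumptions: a POSIX system where `fcntl.flock` is available, all writers on the same local filesystem, and every writer going through the same helper (the lock is advisory, so a process that skips it can still corrupt the file). The function name `append_row_locked` is hypothetical, and the code is written in Python 3 style; under Python 2.7 the in-memory buffer would need to be a byte-oriented one such as `StringIO.StringIO`.

```python
import csv
import fcntl
import io


def append_row_locked(path, row):
    """Append one CSV row to `path` while holding an exclusive advisory lock.

    Hypothetical helper for illustration; not part of the csv module.
    """
    # Serialize the row in memory first, so the file only ever
    # receives complete lines.
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
    data = buf.getvalue().encode("utf-8")

    with open(path, "ab") as f:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(data)
            f.flush()  # push the bytes out before releasing the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Each of the 9 processes would call `append_row_locked("output.csv", [...])` instead of sharing a long-lived `csv.writer`. An alternative design, if the processes can communicate, is to funnel all rows through a single writer process so that only one process ever touches the file.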
I used TortoiseHg on Windows before, and its GUI was very friendly.
Now I have moved to Ubuntu (11.10), and I installed it with the following command:
apt-get install mercurial python-nautilus tortoisehg
After installation I can use the hg command, but I don't know how to launch the GUI (there are no related items in the right-click menu at all).
PS: I am using the root account.
Possible duplicate:
What does plus-equals (+=) do in Python?
I noticed a strange behavior:
l1 = ['1', '2', '3']
l2 = l1
item = l2.pop(0)
# the pop operation will affect l1
print l1
l2 = l2 + [item]
# why doesn't "l2 = l2 + [item]" affect l1, while "l2 += [item]" does?
print l2
print l1
The output is:
['2', '3']
['2', '3', '1']
['2', '3']
However, if I change l2 = l2 + [item] to l2 += [item], the output becomes:
['2', '3']
['2', '3', '1']
['2', '3', '1']
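The difference can be checked with an identity test. For lists, `+=` calls `list.__iadd__`, which extends the existing object in place, while `+` calls `list.__add__`, which builds a new list that the name is then rebound to. The variable names `l3`, `same_after_iadd`, and `same_after_add` below are introduced only for this illustration.

```python
l1 = ['2', '3']
l2 = l1                      # both names refer to the same list object

l2 += ['1']                  # in-place extend: the shared object grows
same_after_iadd = l2 is l1   # still the same object, so l1 sees '1' too

l3 = l1
l3 = l3 + ['x']              # builds a new list, then rebinds l3 to it
same_after_add = l3 is l1    # different objects, so l1 is unchanged
```

After this runs, `l1` is `['2', '3', '1']` and `l3` is `['2', '3', '1', 'x']`: only the `+=` was visible through `l1`.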
# test.py
import threading
import time
import random
from itertools import count

def fib(n):
    """fibonacci sequence
    """
    if n < 2:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

if __name__ == '__main__':
    counter = count(1)
    start_time = time.time()

    def thread_worker():
        while True:
            try:
                # To simulate downloading
                time.sleep(random.randint(5, 10))
                # To simulate doing some process, will take about 0.14 ~ 0.63 second
                fib(n=random.randint(28, 31))
            finally:
                finished_number = counter.next()
                print 'Has finished %d, the average speed is …