Python script to read from a CSV file

Hul*_*ulk 3 python csv

"Type","Name","Description","Designation","First-term assessment","Second-term assessment","Total"
"Subject","Nick","D1234","F4321",10,19,29
"Unit","HTML","D1234-1","F4321",18,,
"Topic","Tags","First Term","F4321",18,,
"Subtopic","Review of representation of HTML",,,,,

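All of the above are values from an Excel sheet that was converted to CSV; the result is what is shown above.

As you can see, the header contains seven columns, while the rows below it hold varying amounts of data.

I have the following Python script to process this file: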

from django.db import transaction
import sys
import csv
import StringIO

file = sys.argv[1]
no_cols_flag=0
flag=0
header_arr=[]

print file
f = open(file, 'r')

while (f.readline() != ""):
  for i in [line.split(',') for line in open(file)]: # split on the separator
    print "==========================================================="
    row_flag=0
    row_d=""
    for j in i: # for each token in the split string
      row_flag=1
      print j


      if j:
        no_cols_flag=no_cols_flag+1
        data=j.strip()
        print j

    break

How can I modify the script above so that it reports which column header each piece of data belongs to?

Thanks.

Tim*_*ker 11

You are importing the csv module but never using it. Why?

If you do

import csv
reader = csv.reader(open(file, "rb"), dialect="excel") # Python 2.x
# Python 3: reader = csv.reader(open(file, newline=""), dialect="excel")

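you get a reader object that contains everything you need: the first row holds the headers, and each subsequent row holds the data in the corresponding positions.

Even better might be (if I understand you correctly):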
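
A minimal sketch of that first approach under Python 3, assuming the first rows of the question's sample data. `SAMPLE` and `label_rows` are hypothetical names introduced for illustration, and `io.StringIO` stands in for the real file:

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for the real file.
SAMPLE = '''"Type","Name","Description","Designation","First-term assessment","Second-term assessment","Total"
"Subject","Nick","D1234","F4321",10,19,29
"Unit","HTML","D1234-1","F4321",18,,
'''


def label_rows(csvfile):
    """Yield one list of (header, value) pairs per data row."""
    reader = csv.reader(csvfile, dialect="excel")
    header = next(reader)  # the first row holds the column titles
    for row in reader:
        # zip pairs each value with the header in the same position
        yield list(zip(header, row))


labelled = list(label_rows(io.StringIO(SAMPLE)))
for pairs in labelled:
    for column, value in pairs:
        print(f"{column}: {value!r}")
    print("-" * 40)
```

Empty cells come back as empty strings, so every row still lines up with all seven headers.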

import csv
reader = csv.DictReader(open(file, "rb"), dialect="excel") # Python 2.x
# Python 3: reader = csv.DictReader(open(file, newline=""), dialect="excel")

DictReader can be iterated over and returns a sequence of dicts that use the column headers as keys and the data in each row as values, so

for row in reader:
    print(row)

will output

{'Name': 'Nick', 'Designation': 'F4321', 'Type': 'Subject', 'Total': '29', 'First-term assessment': '10', 'Second-term assessment': '19', 'Description': 'D1234'}
{'Name': 'HTML', 'Designation': 'F4321', 'Type': 'Unit', 'Total': '', 'First-term assessment': '18', 'Second-term assessment': '', 'Description': 'D1234-1'}
{'Name': 'Tags', 'Designation': 'F4321', 'Type': 'Topic', 'Total': '', 'First-term assessment': '18', 'Second-term assessment': '', 'Description': 'First Term'}
{'Name': 'Review of representation of HTML', 'Designation': '', 'Type': 'Subtopic', 'Total': '', 'First-term assessment': '', 'Second-term assessment': '', 'Description': ''}
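
Since many cells in the sample are empty, one possible follow-up is to keep only the columns that actually carry a value in each row. This is a sketch under Python 3, not part of the original answer; `SAMPLE` and `non_empty_fields` are hypothetical names, and `io.StringIO` again stands in for the real file:

```python
import csv
import io

# Sample data copied from the question.
SAMPLE = '''"Type","Name","Description","Designation","First-term assessment","Second-term assessment","Total"
"Subject","Nick","D1234","F4321",10,19,29
"Unit","HTML","D1234-1","F4321",18,,
"Topic","Tags","First Term","F4321",18,,
"Subtopic","Review of representation of HTML",,,,,
'''


def non_empty_fields(csvfile):
    """Return one dict per data row, keeping only columns with a value."""
    reader = csv.DictReader(csvfile, dialect="excel")
    return [
        {column: value for column, value in row.items() if value != ""}
        for row in reader
    ]


rows = non_empty_fields(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

All values are still strings at this point; converting the assessment columns to numbers would be a separate step.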

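  • In Python 2.x, *ALWAYS* open the file in binary mode ('rb' or 'wb', as appropriate). (3 upvotes)
  • @Tim: The 2.x docs http://docs.python.org/library/csv.html#csv.reader say: "If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference." That means Windows, so for platform independence you should always use 'rb'. The same holds for writing, even though the docs do not say so. CRLF terminates CSV records regardless of platform; CSV is essentially a BINARY format. If you do not supply 'wb' on Windows, you get CRCRLF. (3 upvotes)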
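
The line-ending point can be checked with a small Python 3 sketch, where `newline=""` plays the role that binary mode plays in 2.x. The `simulated` string below only imitates what text-mode newline translation on Windows would do to the output; it is an illustration, not a real file write:

```python
import csv
import io

# newline="" turns off newline translation, as recommended for csv files.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, dialect="excel")
writer.writerow(["Type", "Name"])
writer.writerow(["Subject", "Nick"])

raw = buffer.getvalue()
print(repr(raw))  # every record ends in \r\n, written by the csv module itself

# Imitation of Windows text-mode translation ("\n" becomes "\r\n"):
# the "\r\n" already written by csv turns into "\r\r\n".
simulated = raw.replace("\n", "\r\n")
print(repr(simulated))
```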