Nested for loop doesn't work in Python when reading the same CSV file

pau*_*111 1 python csv nested-loops python-3.x

I'm a Python beginner and have tried googling for a solution, but I couldn't find one that fits my case.

What I'm trying to do in Python is preprocess some data: look up keywords and pull every row that contains them out of a large CSV file.

For some reason, the nested loop runs just once and then never goes through the second loop.

The code below is part of my program. It looks up keywords in the CSV file and writes the matching rows to a text file.

import csv
import json

def main():
    #Calling file (Directory should be changed)
    data_file = 'dataset.json'
    #Loading data.json file
    with open(data_file, 'r') as fp:
        data = json.load(fp)

        #Make the list for keys
        key_list = list(data.keys())
        #print(key_list)
    preprocess_txt = open("test_11.txt", "w+", -1, "utf-8")
    support_fact = 0

    for i, k in enumerate(key_list):
        count = 1
        #read csv, and split on "," the line
        with open("my_csvfile.csv", 'r', encoding = 'utf-8') as csvfile:
            reader = csv.reader(csvfile)
            #The number of q_id is 2
            #This is the part where the nested for loop doesn't work!
            if len(data[k]['Qids']) == 2:
                print("Number 2")
                for m in range(len(data[k]['Qids'])):
                    print(len(data[k]['Qids']))
                    q_id = [data[k]['Qids'][m]]
                    print(q_id)
                    for row in reader: #---> This nested for loop doesn't work after the first pass!
                        if all([x in row for x in q_id]):
                            print("YES!!!")
                            preprocess_txt.write("%d %s %s %s\n" % (count, row[0], row[1], row[2]))
                            count += 1

Some details about the code above:

First, it extracts all the keys from data.json and puts them into a list (key_list).
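For readers unfamiliar with this step, here is a minimal sketch of what the key extraction does. The JSON content is a hypothetical stand-in for dataset.json: its shape is only inferred from the data[k]['Qids'] lookups in the question, and json.loads on a string replaces json.load on an open file so the snippet runs on its own.

```python
import json

# Hypothetical stand-in for dataset.json; the real structure is an assumption
# inferred from the data[k]['Qids'] accesses in the question.
raw = '{"doc1": {"Qids": ["Q1", "Q2"]}, "doc2": {"Qids": ["Q3"]}}'

data = json.loads(raw)         # json.load(fp) does the same for an open file
key_list = list(data.keys())   # top-level keys, in the order they appear

for k in key_list:
    print(k, data[k]["Qids"])
```

Since Python 3.7, dicts keep insertion order, so key_list follows the order of the keys in the JSON text.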

Second, I use all([x in row for x in q_id]) to check each row for the keyword (q_id).
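Because q_id is always built as a one-element list, the all(...) check reduces to a single `in` test, and `in` on a list row compares whole cells, not substrings. A quick illustration with a made-up row (the values below are hypothetical):

```python
row = ["Q1", "some text", "Q12"]   # one parsed CSV row (hypothetical values)

q_id = ["Q1"]                      # one-element list, as built in the question
print(all([x in row for x in q_id]))     # True: some cell equals "Q1"

# With a single element, all(...) is the same as one membership test
print(q_id[0] in row)                    # True

# `in` on a list matches whole cells exactly, not substrings
print("Q" in row)                        # False: no cell is exactly "Q"
print(any("Q" in cell for cell in row))  # True: substring search per cell
```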

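However, as noted in the comments in the code above, when data[k]['Qids'] has length 2, YES!!! is printed correctly on the first pass, but not on the second pass of for row in reader, even though the CSV file does contain the keyword.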

The printed output looks like this:

[Screenshot: output of the for loop]

What am I doing wrong? Or what should I add to the code to make it work?

Can someone help me?

Thanks for your time!

use*_*432 5

For example, say I have a CSV file like this:

foods.csv

beef,stew,apple,sauce
apple,pie,potato,salami
tomato,cherry,pie,bacon

The following code mimics the structure of your current code:

def main():
    import csv

    keywords = ["apple", "pie"]

    with open("foods.csv", "r") as file:
        reader = csv.reader(file)

        for keyword in keywords:
            for row in reader:
                if keyword in row:
                    print(f"{keyword} was in {row}")

        print("Done")

main()

The intended behavior: for each keyword in my keyword list, if that keyword appears in a row of the CSV file, print a line saying which row it appeared in.

However, this is the actual output:

apple was in ['beef', 'stew', 'apple', 'sauce']
apple was in ['apple', 'pie', 'potato', 'salami']
Done
>>> 

It found both rows containing the keyword apple, but it never found pie! So what gives?

The problem

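The file handle file (csvfile in your case) yields its contents only once; after that, they are consumed. Our reader object wraps the file handle and consumes its contents until they are exhausted. At that point there are no more rows left to read (the internal file pointer has advanced to the end of the file), so on the second iteration of the outer loop the inner for loop's body never runs.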
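The exhaustion is easy to observe directly. This sketch uses io.StringIO as an in-memory stand-in for foods.csv (an assumption made so it runs without a file on disk); a real file object opened with open() behaves the same way here.

```python
import csv
import io

# In-memory stand-in for foods.csv
handle = io.StringIO(
    "beef,stew,apple,sauce\n"
    "apple,pie,potato,salami\n"
    "tomato,cherry,pie,bacon\n"
)
reader = csv.reader(handle)

first_pass = [row for row in reader]   # consumes every row
second_pass = [row for row in reader]  # nothing left: the reader is exhausted

print(len(first_pass))   # 3
print(len(second_pass))  # 0
print(handle.tell() == len(handle.getvalue()))  # True: pointer is at the end
```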

The solution

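Either move the internal file pointer back to the beginning with seek on every iteration of the outer for loop, or read the file contents once into a list (or a similar collection) and iterate over that list instead: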

Updated code:

def main():
    import csv

    keywords = ["apple", "pie"]

    with open("foods.csv", "r") as file:
        contents = list(csv.reader(file))

        for keyword in keywords:
            for row in contents:
                if keyword in row:
                    print(f"{keyword} was in {row}")

        print("Done")

main()

New output:

apple was in ['beef', 'stew', 'apple', 'sauce']
apple was in ['apple', 'pie', 'potato', 'salami']
pie was in ['apple', 'pie', 'potato', 'salami']
pie was in ['tomato', 'cherry', 'pie', 'bacon']
Done
>>> 
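The seek alternative mentioned under "The solution" could look roughly like this. It is a sketch, not the answerer's code: find_keywords is a hypothetical helper, and io.StringIO stands in for the open file. A fresh csv.reader is created after each seek(0) so no reader state carries over between passes. The trade-off versus the list version: seek re-reads the file once per keyword, while the list version reads it once but keeps every row in memory.

```python
import csv
import io

def find_keywords(handle, keywords):
    """Return (keyword, row) pairs, rewinding the file before each pass."""
    hits = []
    for keyword in keywords:
        handle.seek(0)                  # rewind to the start of the file
        for row in csv.reader(handle):  # fresh reader over the rewound handle
            if keyword in row:
                hits.append((keyword, row))
    return hits

# In-memory stand-in for foods.csv; with a real file, pass the handle from open()
handle = io.StringIO(
    "beef,stew,apple,sauce\n"
    "apple,pie,potato,salami\n"
    "tomato,cherry,pie,bacon\n"
)
for keyword, row in find_keywords(handle, ["apple", "pie"]):
    print(f"{keyword} was in {row}")
```

This prints the same four lines as the list-based version. With a real file, the csv module documentation recommends opening it with newline="".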
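Applied to the code in the question, the list approach might look like the sketch below. extract_rows is a hypothetical helper, and the inputs are in-memory stand-ins for dataset.json, my_csvfile.csv and test_11.txt. Two assumptions to note: it drops the len(data[k]['Qids']) == 2 special case so any number of Qids is handled (put it back if that restriction matters), and, like the original, it assumes every matching row has at least three cells. With real files you would pass a handle from open("my_csvfile.csv", newline="", encoding="utf-8").

```python
import csv
import io

def extract_rows(data, csv_handle, out):
    """Write every CSV row that contains one of each key's Qids."""
    rows = list(csv.reader(csv_handle))  # read the CSV once and reuse it
    for k in data:
        count = 1                        # numbering restarts per key, as in the question
        for q_id in data[k]["Qids"]:
            for row in rows:             # a list can be iterated any number of times
                if q_id in row:
                    out.write("%d %s %s %s\n" % (count, row[0], row[1], row[2]))
                    count += 1

# Hypothetical inputs
data = {"doc1": {"Qids": ["Q1", "Q2"]}}
csv_handle = io.StringIO("Q1,a,b\nQ2,c,d\nQ3,e,f\n")
out = io.StringIO()
extract_rows(data, csv_handle, out)
print(out.getvalue(), end="")
```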