pau*_*111 1 python csv nested-loops python-3.x
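I am a beginner in Python and tried to find a solution by googling, but I could not find one that fits what I need.
What I am trying to do in Python is preprocess data: look up keywords and get every row that contains a keyword from a large csv file.
Somehow, the nested loop runs just once and then never runs on the second pass.
The code below is part of my program. It looks up keywords in the csv file and writes the matching rows to a text file.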
def main():
    #Calling file (Directory should be changed)
    data_file = 'dataset.json'
    #Loading data.json file
    with open(data_file, 'r') as fp:
        data = json.load(fp)
    #Make the list for keys
    key_list = list(data.keys())
    #print(key_list)
    preprocess_txt = open("test_11.txt", "w+", -1, "utf-8")
    support_fact = 0
    for i, k in enumerate(key_list):
        count = 1
        #read csv, and split on "," the line
        with open("my_csvfile.csv", 'r', encoding = 'utf-8') as csvfile:
            reader = csv.reader(csvfile)
            #The number of q_id is 2
            #This is the part that the nested for loop doesn't work!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            if len(data[k]['Qids']) == 2:
                print("Number 2")
                for m in range(len(data[k]['Qids'])):
                    print(len(data[k]['Qids']))
                    q_id = [data[k]['Qids'][m]]
                    print(q_id)
                    for row in reader: #--->This nested for loop doesn't work after going through one loop!!!!!
                        if all([x in row for x in q_id]):
                            print("YES!!!")
                            preprocess_txt.write("%d %s %s %s\n" % (count, row[0], row[1], row[2]))
                            count += 1
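Some details about the code above:
First, it extracts all the keys from the data.json file and puts them into a list (key_list).
Second, I use all([x in row for x in q_id]) to check each row for the keyword (q_id).
However, as I commented in the code above, when the length of data[k]['Qids'] is 2, YES!!! is printed correctly during the first loop but not during the second. That means the program does not enter the for row in reader loop the second time, even though the csv file contains the keyword.
The printed output looked like this: [screenshot not included]
What am I doing wrong? Or what should I add to the code to make it work?
Can anybody help me?
Thanks for looking!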
For illustration, suppose I have a CSV file that looks like this:
foods.csv
beef,stew,apple,sauce
apple,pie,potato,salami
tomato,cherry,pie,bacon
The following code is meant to mimic the structure of your current code:
def main():
    import csv
    keywords = ["apple", "pie"]
    with open("foods.csv", "r") as file:
        reader = csv.reader(file)
        for keyword in keywords:
            for row in reader:
                if keyword in row:
                    print(f"{keyword} was in {row}")
    print("Done")

main()
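The desired result is that, for every keyword in my list of keywords, if that keyword appears in a row of my CSV file, a string is printed to the screen indicating which row the keyword occurred in.
However, this is the actual output: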
apple was in ['beef', 'stew', 'apple', 'sauce']
apple was in ['apple', 'pie', 'potato', 'salami']
Done
>>>
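It found both instances of the keyword apple in the file, but it did not find pie! So, what gives?
The Problem
The file handle file (csvfile in your case) yields its contents once, after which they are consumed. The reader object wraps the file handle and reads from it until it is exhausted. At that point there are no more rows to read from the file (the internal file pointer has advanced to the end), so the inner for loop has nothing to iterate over on the second pass and its body never runs again.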
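The exhaustion can be seen in isolation with a small sketch. It uses io.StringIO as a stand-in for the real file (an assumption made only to keep the example self-contained) with the same three rows as foods.csv:

```python
import csv
import io

# In-memory stand-in for foods.csv (assumption: same three rows as above).
fake_file = io.StringIO(
    "beef,stew,apple,sauce\n"
    "apple,pie,potato,salami\n"
    "tomato,cherry,pie,bacon\n"
)

reader = csv.reader(fake_file)

first_pass = list(reader)   # consumes every row of the underlying file
second_pass = list(reader)  # the reader is exhausted, so nothing is left

print(len(first_pass))   # 3
print(len(second_pass))  # 0
```

The second pass is empty for the same reason the inner for loop does nothing on the second keyword.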
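The Solution
Either move the internal file pointer back to the beginning with seek after each iteration of the outer for loop, or read the file contents into a list or similar collection and then iterate over that list:
Updated code: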
def main():
    import csv
    keywords = ["apple", "pie"]
    with open("foods.csv", "r") as file:
        contents = list(csv.reader(file))
    for keyword in keywords:
        for row in contents:
            if keyword in row:
                print(f"{keyword} was in {row}")
    print("Done")

main()
New output:
apple was in ['beef', 'stew', 'apple', 'sauce']
apple was in ['apple', 'pie', 'potato', 'salami']
pie was in ['apple', 'pie', 'potato', 'salami']
pie was in ['tomato', 'cherry', 'pie', 'bacon']
Done
>>>
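The other option mentioned above, rewinding with seek, could look roughly like the sketch below. The helper name find_keywords is hypothetical, and io.StringIO again stands in for the real file so the example runs on its own. With a real file you would pass the handle returned by open instead.

```python
import csv
import io

def find_keywords(file, keywords):
    """Hypothetical helper: return (keyword, row) pairs, rewinding per keyword."""
    matches = []
    for keyword in keywords:
        file.seek(0)               # move the file pointer back to the start
        reader = csv.reader(file)  # fresh reader over the rewound file
        for row in reader:
            if keyword in row:
                matches.append((keyword, row))
    return matches

# In-memory stand-in for foods.csv (assumption: same three rows as above).
fake_file = io.StringIO(
    "beef,stew,apple,sauce\n"
    "apple,pie,potato,salami\n"
    "tomato,cherry,pie,bacon\n"
)

for keyword, row in find_keywords(fake_file, ["apple", "pie"]):
    print(f"{keyword} was in {row}")
```

Rewinding re-reads the file once per keyword, so it keeps memory use low at the cost of extra I/O. Reading the rows into a list once is usually simpler when the file fits in memory.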