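I have a large CSV file, and I want to get all the values stored in specific columns whose names I know.
Somehow I can't figure out how to do it, but I think I'm close :(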
import codecs
import csv
import json
import pprint
import re

FIELDS = ["name", "timeZone_label", "utcOffset", "homepage", "governmentType_label", "isPartOf_label", "areaCode", "populationTotal",
          "elevation", "maximumElevation", "minimumElevation", "populationDensity", "wgs84_pos#lat", "wgs84_pos#long",
          "areaLand", "areaMetro", "areaUrban"]

index = []
with open('/Users/stephan/Desktop/cities.csv', "r") as f:
    mycsv = csv.reader(f)
    results = []
    headers = None
    for row in mycsv:
        # record the position of every cell whose text is one of the wanted column names
        for i, col in enumerate(row):
            if col in FIELDS:
                index.append(i)
                print(row[i])
print(index)
I think my index list is correct; it gives me the right values (the column indices).
What do I have to add to my code to make it work?
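In the loop above, `col in FIELDS` is normally true only for cells of the header row, so `index` does end up with the right positions, but the data rows are never read through those positions. A minimal sketch of one way to finish the job, assuming the first row of the file is the header: read the header once with `next()`, look up the positions of the wanted names, then collect those positions from every remaining row. To keep it self-contained, it reads made-up rows from an in-memory `io.StringIO` instead of `/Users/stephan/Desktop/cities.csv` and uses a shortened `FIELDS` list; the helper name `extract_columns` is hypothetical.

```python
import csv
import io

# Shortened, illustrative subset of the FIELDS list from the question.
FIELDS = ["name", "populationTotal", "wgs84_pos#lat"]


def extract_columns(f, fields):
    """Return {column name: [values, ...]} for each requested column present in the header."""
    reader = csv.reader(f)
    header = next(reader)  # first row holds the column names (assumption)
    # Work out each wanted column's position once, from the header only.
    index = {name: i for i, name in enumerate(header) if name in fields}
    columns = {name: [] for name in index}
    for row in reader:  # every remaining row is data
        for name, i in index.items():
            columns[name].append(row[i])
    return columns


# Made-up rows standing in for cities.csv.
sample = io.StringIO(
    "name,areaCode,populationTotal,wgs84_pos#lat\n"
    "Alpha,123,1000,10.5\n"
    "Beta,456,2000,-3.2\n"
)
print(extract_columns(sample, FIELDS))
# → {'name': ['Alpha', 'Beta'], 'populationTotal': ['1000', '2000'], 'wgs84_pos#lat': ['10.5', '-3.2']}
```

With the real file, open it with `newline=''` (as the Python 3 `csv` docs recommend) and pass the file object in. `csv.DictReader` is an alternative that maps each row to the header names, so `[row["name"] for row in csv.DictReader(f)]` gives one column without tracking indices. Note that all values come back as strings.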
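I want to write a simple query that returns the user with the most followers, among users whose time zone is Brasilia (Brazil) and who have tweeted 100 or more times.
Here is my attempt: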
pipeline = [{"$match": {"user.statuses_count": {"$gt": 99}, "user.time_zone": "Brasilia"}},
            {"$group": {"_id": "$user.followers_count", "count": {"$sum": 1}}},
            {"$sort": {"count": -1}}]
I adapted it from a practice exercise.
This was given as an example of the document structure:
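The `$group` stage above uses `user.followers_count` as the grouping key, so it counts how many matching tweets share each follower count; sorting on that `count` returns the most common follower count, not the user with the most followers. A minimal sketch of one possible fix, assuming the same field names as the question: keep the `$match`, then sort the matching tweets by `user.followers_count` descending and keep the first one. The `$project` stage and the `user.screen_name` field are assumptions (the structure example below is cut off before the `user` object), and `top_user_local` is a hypothetical pure-Python mirror of the pipeline, included only so the logic can be checked without a MongoDB server.

```python
# Filter, sort individual tweets by their author's follower count, keep the top one.
pipeline = [
    {"$match": {"user.statuses_count": {"$gt": 99},
                "user.time_zone": "Brasilia"}},
    {"$sort": {"user.followers_count": -1}},
    {"$limit": 1},
    # user.screen_name is assumed to exist, as in the Twitter API user object.
    {"$project": {"_id": 0,
                  "screen_name": "$user.screen_name",
                  "followers": "$user.followers_count",
                  "tweets": "$user.statuses_count"}},
]

# With pymongo it would run roughly as (database/collection names are hypothetical):
#   from pymongo import MongoClient
#   db = MongoClient()["twitter"]
#   result = list(db.tweets.aggregate(pipeline))


def top_user_local(tweets):
    """Pure-Python mirror of the pipeline above, for checking the logic on in-memory data."""
    matched = [t for t in tweets
               if t["user"]["statuses_count"] > 99
               and t["user"]["time_zone"] == "Brasilia"]
    if not matched:
        return None
    best = max(matched, key=lambda t: t["user"]["followers_count"])
    return {"screen_name": best["user"]["screen_name"],
            "followers": best["user"]["followers_count"],
            "tweets": best["user"]["statuses_count"]}


# Made-up sample tweets, reduced to the user fields the query touches.
tweets = [
    {"user": {"screen_name": "a", "time_zone": "Brasilia", "statuses_count": 150, "followers_count": 40}},
    {"user": {"screen_name": "b", "time_zone": "Brasilia", "statuses_count": 50, "followers_count": 900}},
    {"user": {"screen_name": "c", "time_zone": "Brasilia", "statuses_count": 100, "followers_count": 75}},
    {"user": {"screen_name": "d", "time_zone": "London", "statuses_count": 500, "followers_count": 5000}},
]
print(top_user_local(tweets))  # → {'screen_name': 'c', 'followers': 75, 'tweets': 100}
```

Because each tweet carries a copy of its author's `user` object, one user may appear in several matching documents; with `$limit: 1` that does not change the answer, but if you later want the top N distinct users you would need a `$group` on a user identifier before sorting.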
{
  "_id" : ObjectId("5304e2e3cc9e684aa98bef97"),
  "text" : "First week of school is over :P",
  "in_reply_to_status_id" : null,
  "retweet_count" : null,
  "contributors" : null,
  "created_at" : "Thu Sep 02 18:11:25 +0000 2010",
  "geo" : null,
  "source" : "web",
  "coordinates" : null,
  "in_reply_to_screen_name" : null,
  "truncated" : false,
  "entities" : {
    "user_mentions" : [ ],
    "urls" : [ ],
    "hashtags" …