Stop a looping script from returning duplicate entries

Question


AEA*_*AEA 0 python xml loops duplicates python-2.7

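I have some code that currently prints data for each user from an XML file (obtained from a website). The XML is updated throughout the day as more users interact with it. My code currently loops and downloads this data every 5 minutes.

Each time the code runs, it generates a list of users and their statistics. In the first 5 minutes it prints users: a, b, c

In the second 5 minutes it prints users: a, b, c, d, e

In the third 5 minutes it prints users: a, b, c, d, e, f, g

What I need is for the code to print, in the first 5 minutes: a, b, c; in the second 5 minutes: d, e; in the third 5 minutes: f, g

Is there some way to recognise that certain users have already been printed? Each user has a unique user ID, which I assume could be used for matching?

I have attached a sample of my code, in case it helps.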

import mechanize
import urllib
import json
import re
import random
import datetime
from sched import scheduler
from time import time, sleep

######Code to loop the script and set up scheduling time

s = scheduler(time, sleep)
random.seed()

def run_periodically(start, end, interval, func):
    event_time = start
    while event_time < end:
        s.enterabs(event_time, 0, func, ())
        event_time += interval + random.randrange(-5, 45)
    s.run()


###### Code to get the data required from the URL desired
def getData():  
    post_url = "URL OF INTEREST"
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]

######These are the parameters you've got from checking with the aforementioned tools
    parameters = {'page' : '1',
              'rp' : '250',
              'sortname' : 'roi',
              'sortorder' : 'desc'
             }
#####Encode the parameters
    data = urllib.urlencode(parameters)
    trans_array = browser.open(post_url,data).read().decode('UTF-8')

    xmlload1 = json.loads(trans_array)
    pattern1 = re.compile('>&nbsp;&nbsp;(.*)<')
    pattern2 = re.compile('/control/profile/view/(.*)\' title=')
    pattern3 = re.compile('<span style=\'font-size:12px;\'>(.*)<\/span>')


#########################################################################
##### The request sent from here all the way down including comments#####
#########################################################################


##### Identify each row dynamically, removing the need to hard-code the number of rows in the XML file,
##### so the number of rows can change as the list grows (required for the looping function to work uninterrupted)

    for row in xmlload1['rows']:
        cell = row["cell"]

##### defining the Keys (key is the area from which data is pulled in the XML) for use in the pattern finding/regex

        user_delimiter = cell['username']
        selection_delimiter = cell['race_horse']


        # NOTE: strikeratecalc2 is not defined anywhere in this excerpt.
        if strikeratecalc2 < 12:
            continue

##### REMAINDER OF THE REGEX DELIMITATIONS
        username_delimiter_results = re.findall(pattern1, user_delimiter)[0]
        userid_delimiter_results = (re.findall(pattern2, user_delimiter)[0])
        user_selection = re.findall(pattern3, selection_delimiter)[0]



##### Printing the results of the code at hand

        print "user id = ",userid_delimiter_results
        print "username = ",username_delimiter_results
        print "user selection = ",user_selection
        print ""





getData()


run_periodically(time()+5, time()+1000000, 3000, getData)


Please be nice with comments, I have been coding for a cumulative 11 days now, so also excuse any major errors in the code I am using, although it is working so far.

Kind regards

AEA

Answer 1

knu*_*ole 5

I think you could simply store the unique IDs somewhere (a file or a database, for example; Redis is my favourite) and then check against them.
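As a minimal sketch of the file-based variant of this idea, assuming the IDs fit comfortably in memory: keep the IDs seen so far in a Python set, persist that set as JSON between runs, and report only the IDs that are not yet in it. The helper names `load_seen`, `save_seen`, and `filter_new` are hypothetical and do not come from the original script.

```python
import json
import os


def load_seen(path):
    """Return the set of IDs stored at path, or an empty set if the file is missing."""
    if not os.path.exists(path):
        return set()
    with open(path) as handle:
        return set(json.load(handle))


def save_seen(path, seen):
    """Write the set of seen IDs to path as a sorted JSON list."""
    with open(path, "w") as handle:
        json.dump(sorted(seen), handle)


def filter_new(user_ids, seen):
    """Return the IDs not yet in seen, in input order, and add them to seen."""
    fresh = []
    for user_id in user_ids:
        if user_id not in seen:
            seen.add(user_id)
            fresh.append(user_id)
    return fresh
```

With an empty set to start, passing the three successive downloads from the question (`a, b, c`, then `a` to `e`, then `a` to `g`) through `filter_new` yields `a, b, c`, then `d, e`, then `f, g`. Calling `save_seen` after each download keeps the set across restarts of the script.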

To use Redis as the store, you can do this:

# redis
import redis
pwd = 'l33t'
r = redis.StrictRedis(host='localhost', port=6379, db=1, password=pwd)  

# set id's
r.sadd('user_ids', unique_id) # this is a set, with no duplicates

# check for existing id's
r.sismember('user_ids', unique_id) # returns 1 or 0


See http://redis.io/commands#set and https://github.com/andymccurdy/redis-py. You need both Redis and redis-py; installing them takes about two minutes.
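The add and the membership check can be folded into a single call, because the Redis `SADD` command returns the number of members actually added: 1 for a new ID, 0 for one that is already in the set. The sketch below assumes a redis-py client such as `r` from the snippet above; the helper names `is_new_user` and `print_new_users` are illustrative only. The helpers need nothing more than an object with an `sadd` method, so they can be tried with a stand-in before a server is available.

```python
def is_new_user(client, user_id, key="user_ids"):
    """Record user_id in the set at key and report whether it was new.

    SADD returns the number of members added, so 1 means the ID had
    not been stored before and 0 means it was already present.
    """
    return client.sadd(key, user_id) == 1


def print_new_users(client, user_ids):
    """Print and return only the IDs that have not been recorded before."""
    fresh = [uid for uid in user_ids if is_new_user(client, uid)]
    for uid in fresh:
        print("user id = %s" % uid)
    return fresh
```

Inside the question's `getData()` loop, the three `print` statements could then be guarded by a check such as `is_new_user(r, userid_delimiter_results)`, so each user is printed only the first time they appear.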

Archived:	12 years, 5 months ago
Views:	113
Last activity:	12 years, 5 months ago