tweepy Streaming API: full text

Var*_*gar 4 twitter text-mining tweepy

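I am using the tweepy Streaming API to fetch tweets that contain a particular hashtag. The problem is that I cannot get the full text of a tweet from the Streaming API: only the first 140 characters are available, and everything after that is truncated.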

Here is the code:

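import tweepy
from datetime import datetime
from time import sleep

# CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN and ACCESS_TOKEN_SECRET are your own app credentials
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)


def analyze_status(text):
    # True if the tweet is a retweet, i.e. its text starts with "RT"
    return text.startswith('RT')


class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # Skip retweets; append everything else to a UTF-8 text file
        if not analyze_status(status.text):
            with open('fetched_tweets.txt', 'a', encoding='utf-8') as tf:
                tf.write(status.text + '\n\n')

            print(status.text)

    def on_error(self, status_code):
        # status_code is an int, so format it instead of concatenating
        print("Error Code : {}".format(status_code))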

def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
             if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
               time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    # (tweepy 3.x stores a requests.Response in api.last_response)
    headers = api.last_response.headers
    remaining = int(headers['x-rate-limit-remaining'])
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(headers['x-rate-limit-limit'])
        reset = int(headers['x-rate-limit-reset'])
        # Convert the epoch timestamp to a local datetime
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))

        if wait:
            # Determine the delay (never negative) and sleep
            delay = max((reset - datetime.now()).total_seconds(), 0) + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The caller must handle it manually.
            return False

    # We have not reached the rate limit
    return True

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener,
                         tweet_mode='extended')

# `async` is a reserved word from Python 3.7; tweepy 3.7 renamed the argument to is_async
myStream.filter(track=['#bitcoin'], is_async=True)
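The delay calculation inside `test_rate_limit` can be pulled out into a pure function so it can be tested without a live connection. This is a minimal sketch: `seconds_until_reset` is a hypothetical helper, and it assumes the response headers are available as a mapping (with tweepy 3.x, which is built on requests, that would be `api.last_response.headers`) and that `x-rate-limit-reset` is a Unix epoch timestamp in seconds. Working directly in epoch seconds sidesteps any local-time versus UTC confusion.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None,
                        buffer: float = 0.1) -> float:
    """Return how many seconds to sleep before the next request.

    Hypothetical helper: 0.0 while requests remain, otherwise the time left
    until the epoch timestamp in 'x-rate-limit-reset' plus a safety buffer.
    """
    # Assumption: a missing header is treated as "not rate limited".
    remaining = int(headers.get("x-rate-limit-remaining", "1"))
    if remaining > 0:
        return 0.0  # Still under the limit, no need to wait.
    reset = int(headers["x-rate-limit-reset"])  # Unix epoch seconds
    if now is None:
        now = time.time()
    # Never return a negative delay if the reset time has already passed.
    return max(reset - now, 0.0) + buffer


# Usage: pass a fixed 'now' so the result is deterministic.
limited = {"x-rate-limit-remaining": "0", "x-rate-limit-reset": "1000"}
print(seconds_until_reset(limited, now=990.0))
```

Passing `now` explicitly keeps the function deterministic; in the listener you would omit it and call `time.sleep()` on the result.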

Does anyone have a solution?

And*_*per 7

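tweet_mode=extended will have no effect in this code, because the Streaming API does not support that parameter. If a Tweet contains longer text, the JSON response will include an additional object named extended_tweet, which in turn contains a field named full_text.

In that case you need something like print(status.extended_tweet['full_text']) to extract the longer text (tweepy exposes extended_tweet as a plain dict, so index it rather than using attribute access).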
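To make that structure concrete, here is a minimal sketch that works on the raw JSON of a streamed Tweet instead of a tweepy object. The sample payload is invented for illustration and trimmed to the three fields involved (`text`, `truncated`, `extended_tweet`); real payloads carry many more. `full_text_from_stream_json` is a hypothetical helper name.

```python
import json

# Invented, heavily trimmed example of a streamed Tweet longer than 140 characters.
RAW_TWEET = json.dumps({
    "text": "The first part of a long Tweet, cut off at 140 characters\u2026",
    "truncated": True,
    "extended_tweet": {
        "full_text": "The first part of a long Tweet, followed by the rest of it #bitcoin"
    },
})


def full_text_from_stream_json(raw: str) -> str:
    """Return the untruncated text of a streamed Tweet given its raw JSON."""
    tweet = json.loads(raw)
    extended = tweet.get("extended_tweet")
    if extended is not None:
        # Longer Tweets carry the complete text in extended_tweet.full_text.
        return extended["full_text"]
    # Short Tweets have no extended_tweet object; 'text' is already complete.
    return tweet["text"]


print(full_text_from_stream_json(RAW_TWEET))
```

Once decoded, `extended_tweet` is just a nested dict, which is why `['full_text']` is the way to reach the field.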


(anonymous user) 7

The Twitter stream provides a boolean for this: status.truncated is True when the message is longer than 140 characters, and only then is the extended_tweet object available:

        if not status.truncated:
            text = status.text
        else:
            text = status.extended_tweet['full_text']
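Applied to the question's listener, this check belongs in `on_status`, before the text is filtered and written to the file. The sketch below keeps tweepy out of the picture so it runs on its own: `types.SimpleNamespace` stands in for a tweepy 3.x `Status` (attribute access at the top level, a plain dict for `extended_tweet`), the sample statuses are invented, and `tweet_text` and `should_save` are hypothetical helper names. Inside the real listener you would call `tweet_text(status)` wherever the question uses `status.text`.

```python
from types import SimpleNamespace


def tweet_text(status) -> str:
    """Return the full text of a streamed status (tweepy 3.x-style object)."""
    # 'truncated' is True only when the 140-character 'text' was cut short;
    # only then does the payload include an 'extended_tweet' object.
    if getattr(status, "truncated", False) and hasattr(status, "extended_tweet"):
        return status.extended_tweet["full_text"]
    return status.text


def should_save(status) -> bool:
    """Mirror the question's filter: skip retweets, whose text starts with 'RT'."""
    return not status.text.startswith("RT")


# Stand-ins for tweepy Status objects (invented sample data).
short = SimpleNamespace(text="Short tweet #bitcoin", truncated=False)
long_ = SimpleNamespace(
    text="A long tweet that was cut off\u2026",
    truncated=True,
    extended_tweet={"full_text": "A long tweet that was cut off, now shown in full #bitcoin"},
)
retweet = SimpleNamespace(text="RT @someone: hello", truncated=False)

for status in (short, long_, retweet):
    if should_save(status):
        print(tweet_text(status))
```

The `hasattr` guard means a status flagged as truncated but lacking `extended_tweet` falls back to `text` instead of raising.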

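This only applies when you are streaming tweets. When you use the REST API methods to collect older tweets, you can use the following instead: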

tweets = api.user_timeline(screen_name='whoever', count=5, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)

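With tweet_mode='extended', the full_text field contains the complete text of every tweet, whether or not it would otherwise have been truncated.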
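Because REST responses requested with `tweet_mode='extended'` put the complete text in a top-level `full_text` field, while streamed Tweets nest it under `extended_tweet`, code that consumes both sources has to look in each place. A minimal sketch, assuming Tweets are available as decoded JSON dicts (for example tweepy 3.x's `status._json`); `best_text` is a hypothetical name and the sample dicts are invented:

```python
from typing import Any, Dict


def best_text(tweet: Dict[str, Any]) -> str:
    """Return the most complete text available in a decoded Tweet dict."""
    # REST API responses requested with tweet_mode='extended'.
    if "full_text" in tweet:
        return tweet["full_text"]
    # Streaming API: longer Tweets carry an extended_tweet object.
    extended = tweet.get("extended_tweet")
    if extended and "full_text" in extended:
        return extended["full_text"]
    # Compatibility mode: 'text', possibly truncated at 140 characters.
    return tweet["text"]


rest_tweet = {"full_text": "Complete text from user_timeline"}
stream_tweet = {"text": "Cut off\u2026", "truncated": True,
                "extended_tweet": {"full_text": "Complete text from the stream"}}
compat_tweet = {"text": "Short compatibility-mode text", "truncated": False}

for t in (rest_tweet, stream_tweet, compat_tweet):
    print(best_text(t))
```

The checks run from most to least complete source, so the plain `text` field is only used when nothing better is present.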