How to extract the location of tweets containing a specific keyword using the Twitter API in Python

Question

Rah*_*jan 5 python tweepy python-3.x pandas

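I am trying to extract all tweets that contain a specific keyword, together with their geographic location.

For example, I want to download all English-language tweets from "France" and "Singapore" that contain the keyword "iphone".

My code: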

import tweepy
import csv
import pandas as pd
import sys

# API credentials here
consumer_key = 'INSERT CONSUMER KEY HERE'
consumer_secret = 'INSERT CONSUMER SECRET HERE'
access_token = 'INSERT ACCESS TOKEN HERE'
access_token_secret = 'INSERT ACCESS TOKEN SECRET HERE'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True,wait_on_rate_limit_notify=True)

# Search word/hashtag value
HashValue = ""

# search start date value. the search will start from this date to the current date.
StartDate = ""

# getting the search word/hashtag and date range from user
HashValue = input("Enter the hashtag you want the tweets to be downloaded for: ")
StartDate = input("Enter the start date in this format yyyy-mm-dd: ")

# Open/Create a file to append data
csvFile = open(HashValue+'.csv', 'a')

#Use csv Writer
csvWriter = csv.writer(csvFile)

for tweet in tweepy.Cursor(api.search,q=HashValue,count=20,lang="en",since=StartDate, tweet_mode='extended').items():
    print (tweet.created_at, tweet.full_text)
    csvWriter.writerow([tweet.created_at, tweet.full_text.encode('utf-8')])

print ("Scraping finished and saved to "+HashValue+".csv")
#sys.exit()
How can I achieve this?

Answer 1

Res*_*her 1

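Hi Rahul,

As I understand it, you want to get the geo data from the tweets your search returns, rather than filter the search by geocode.

Here is a code sample with the relevant fields you are interested in. These may or may not be populated, depending on the privacy settings of the person tweeting.

Note that there is no "since" parameter on the search API: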

https://tweepy.readthedocs.io/en/latest/api.html#help-methods

https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets

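The standard Twitter API search goes back 7 days. The premium and enterprise APIs offer 30-day search as well as full-archive search, but you will pay $$$.

Unfortunately, tweepy has not documented its models yet: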
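If a start date is still wanted, one option is to put it into the query string itself. The sketch below is only an illustration: `build_query` and `SEARCH_WINDOW_DAYS` are hypothetical names, and it assumes the standard search still accepts the `since:YYYY-MM-DD` query operator. It also moves the start date forward when it falls outside the 7-day window mentioned above.

```python
from datetime import date, timedelta

# Assumption: the standard search only reaches back about a week.
SEARCH_WINDOW_DAYS = 7


def build_query(keyword, start, today=None):
    """Return a query string such as 'iphone since:2020-01-05'.

    The start date is moved forward if it lies outside the search window.
    """
    today = today or date.today()
    earliest = today - timedelta(days=SEARCH_WINDOW_DAYS)
    effective = max(start, earliest)
    return f"{keyword} since:{effective.isoformat()}"


# A start date 11 days back is clamped to 7 days back.
print(build_query("iphone", date(2020, 1, 1), today=date(2020, 1, 12)))
```

The resulting string would be passed as `q=` to the cursor in place of the bare keyword.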

https://github.com/tweepy/tweepy/issues/720

So if you want to inspect the tweet object, you can use the pprint package and run:

pprint(tweet.__dict__)
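The full dump is long, so it can help to print only the location-related attributes. This is a rough sketch rather than anything tweepy provides: `location_fields` is a hypothetical helper, the attribute names are the ones used in the sample code in this answer, and a `SimpleNamespace` stands in for a real tweet object so the snippet runs without credentials.

```python
from pprint import pprint
from types import SimpleNamespace

# Attribute names taken from the sample code in this answer.
LOCATION_ATTRS = ("geo", "coordinates", "place")


def location_fields(tweet):
    """Collect the location-related attributes of a tweet-like object."""
    fields = {name: getattr(tweet, name, None) for name in LOCATION_ATTRS}
    user = getattr(tweet, "user", None)
    fields["user_location"] = getattr(user, "location", None)
    fields["user_geo_enabled"] = getattr(user, "geo_enabled", None)
    return fields


# Stand-in for a real tweet object (made-up data for illustration).
fake_tweet = SimpleNamespace(
    geo=None,
    coordinates=None,
    place=None,
    user=SimpleNamespace(location="Greece", geo_enabled=True),
)
pprint(location_fields(fake_tweet))
```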

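One difference I noticed is that the "text" field in the JSON becomes "full_text" on the object (this happens when tweet_mode='extended' is used, as in the code below).

If the tweet you find is a quoted tweet, there is also information about the original tweet in there, with the same fields as far as I can see.

Anyway, here is the code. While testing I added a maximum tweet count to the loop over the cursor, to avoid hitting any API limits.

Let me know if you want the csv code, but it looks like you can already handle that.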

import tweepy

# API credentials here
consumer_key = 'your-info'
consumer_secret = 'your-info'
access_token = 'your-info'
access_token_secret = 'your-info'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
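# Written for tweepy 3.x. In tweepy 4.x, wait_on_rate_limit_notify was removed
# and api.search was renamed to api.search_tweets.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)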

searchString = "iPhone"

cursor = tweepy.Cursor(api.search, q=searchString, count=20, lang="en", tweet_mode='extended')

# Stop after maxCount tweets while testing
maxCount = 1
count = 0
for tweet in cursor.items():
    print()
    print("Tweet Information")
    print("================================")
    print("Text: ", tweet.full_text)
    print("Geo: ", tweet.geo)
    print("Coordinates: ", tweet.coordinates)
    print("Place: ", tweet.place)
    print()

    print("User Information")
    print("================================")
    print("Location: ", tweet.user.location)
    print("Geo Enabled? ", tweet.user.geo_enabled)

    count += 1
    if count >= maxCount:
        break

This will output something like:

Tweet Information
================================
Text:  NowPlaying : Hashfinger - Leaving
https://derp.com

#iPhone free app https://derp.com
#peripouwebradio
Geo:  None
Coordinates:  None
Place:  None

User Information
================================
Location:  Greece
Geo Enabled?  True
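To get from these fields to the France and Singapore tweets from the question, one possible approach is to keep only tweets whose place carries a matching country code and write those to CSV. This is a sketch under assumptions, not a tested solution: `country_of`, `write_rows` and `fake` are hypothetical helpers, it assumes the place object exposes a `country_code` attribute, and it uses stand-in objects instead of real search results. Tweets without a place (as in the sample output above) are skipped.

```python
import csv
import io
from types import SimpleNamespace

WANTED_COUNTRIES = {"FR", "SG"}  # France and Singapore, from the question


def country_of(tweet):
    """Return the country code from tweet.place, or None if there is no place."""
    place = getattr(tweet, "place", None)
    return getattr(place, "country_code", None)


def write_rows(tweets, out):
    """Write tweets from the wanted countries as CSV rows; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["created_at", "country", "user_location", "text"])
    written = 0
    for tweet in tweets:
        code = country_of(tweet)
        if code not in WANTED_COUNTRIES:
            continue  # no place attached, or a different country
        writer.writerow([tweet.created_at, code, tweet.user.location, tweet.full_text])
        written += 1
    return written


def fake(text, code, location):
    """Build a stand-in tweet object (made-up data for illustration)."""
    place = SimpleNamespace(country_code=code) if code else None
    return SimpleNamespace(
        created_at="2020-01-10 12:00:00",
        full_text=text,
        place=place,
        user=SimpleNamespace(location=location),
    )


sample = [
    fake("New iPhone!", "FR", "Paris"),
    fake("iPhone deal", None, "Greece"),
    fake("iPhone queue", "SG", "Singapore"),
]
buffer = io.StringIO()
print(write_rows(sample, buffer), "rows written")
print(buffer.getvalue())
```

With real results, `out` would be a file opened with `open(name, "w", newline="", encoding="utf-8")`, which also makes the `.encode('utf-8')` call on the text unnecessary. Expect most tweets to be dropped, since few carry a place.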
