Trouble with Geotagging Tweets with Users

In my initial mapping project, Boylesque in the Twitter World, I used TAGS to scrape tweets mentioning boylesque, then matched each tweet with its author and the author’s location (something TAGS can’t manage yet) via Twitter’s REST API. This method poses a number of problems, which I describe in this blog post. Please note that the post is not complete; if you think of other problems, I invite you to comment below.

Detailed Method Description

Tweet —> scraping (TAGS) —> CSV file —> database (local script) —> if no tweet geotag: query REST API (100 requests/time) for user location (local script) —> insert into database (local script) —> export tweets matched with locations from database to CSV (local script) —> import CSV into CartoDB —> map
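To make the "if no tweet geotag" step more concrete, here is a minimal Python sketch of the fallback logic. It is not the script I actually ran. The field names (`user_id`, `geo`), the `location_source` label, and the `lookup_users` function are all assumptions made for illustration: `lookup_users` is a hypothetical stand-in for whatever code calls the REST API, and the batch size of 100 is my reading of "100 requests/time" in the chain above.

```python
from typing import Callable, Dict, Iterator, List, Optional

Tweet = Dict[str, Optional[object]]


def batched(ids: List[int], size: int = 100) -> Iterator[List[int]]:
    """Yield successive chunks of at most `size` user IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def resolve_locations(
    tweets: List[Tweet],
    lookup_users: Callable[[List[int]], Dict[int, str]],
) -> List[Tweet]:
    """Attach a location to every tweet.

    A tweet's own geotag is used when present. Otherwise the location
    returned by `lookup_users` (hypothetical; maps user ID -> location
    string) is used, and the row is labeled "profile" so that it can be
    told apart from a real tweet geotag later on.
    """
    # Collect each user only once, and only for tweets without a geotag.
    missing = sorted({t["user_id"] for t in tweets if t.get("geo") is None})

    profile_locations: Dict[int, str] = {}
    for chunk in batched(missing):
        profile_locations.update(lookup_users(chunk))

    resolved = []
    for t in tweets:
        if t.get("geo") is not None:
            resolved.append({**t, "location": t["geo"], "location_source": "tweet"})
        else:
            # location stays None if the user has no usable profile location
            resolved.append({
                **t,
                "location": profile_locations.get(t["user_id"]),
                "location_source": "profile",
            })
    return resolved
```

Keeping a `location_source` column in the exported CSV would at least make it possible to show on the map which points come from a tweet’s geotag and which only reflect the user’s stated location.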

Problem: Locations Do Not Necessarily Match Tweet Location

The method described above doesn’t account for tweets that a user posts while traveling, unless the tweet itself is geotagged. That’s a real problem, because only 1% of tweets are geotagged, according to “Only 30% of Messages on Twitter Are From the U.S.”1 All other tweets get tagged as if they had been posted in the user’s hometown. This means that the map currently shows where the users who tweet about boylesque are located, not where the tweets themselves were posted. Help: Is there a way to get around this problem? If you think of anything, please comment below.

Notes and Links

  1. “Only 30% of messages on Twitter are from the U.S.: U.S., Japan, Brazil top 3 Twitter nations,” Press Release, Semiocast, published March 31, 2010, http://semiocast.com/downloads/Semiocast_Only_30_percent_of_messages_on_Twitter_are_from_the_U.S._20100331.pdf
