Digital Humanities in Theatre and Performance Studies

In November, I attended ASTR (American Society for Theatre Research), an incredibly inspiring conference in the field of theatre and performance studies. The theme of the conference this year was “Debating the Stakes in Theatre and Performance Scholarship,” and panels ranged from debating food in performance (the steaks in theatre and performance…) to the politics of performance.

This past year or so I’ve been accepting and coming out in all my nerdiness—and hesitantly been entering into conversations in the field we call “digital humanities.” Whether it’s a field, a set of methods spanning different disciplines, or even its own discipline is still quite up in the air—see Svensson 2010 for an interesting conversation on that, which includes HASTAC!

At the conference, there were some really interesting strides toward moving the discipline of Theatre and Performance Studies, especially in a North American context, in the direction of digital methods, projects, and theories. Specifically, in this post I'd like to share some of my experiences of participating in one working group, attending another, a panel presentation, and a meeting that laid some groundwork for a summer institute.

My conference attendance began with the first working group, called "The Stakes of Digital Scholarship of Theatre and Performance." It was organized around lightning talks by each of the twenty-odd participants on projects spanning many different questions and examples — from Twitter theatre to the Transborder Immigrant Tool. The latter was presented by Ashley Ferro-Murray, a former HASTAC Scholar who, among many other fantastic projects, has created a choreographic interpretation of the tool.

The following day, back to back, were my own workshop—on digital methods in theatre and performance research—and a panel on big data in theatre history.

My own working group, “Digital Methods: Collaboration, Evaluation, and Access in Digital Theatre Scholarship,” focused specifically on questions around collaboration and standards for evaluation of digital projects in research on theatre and performance. We began our 2-hour session with an hour-long demo session of our projects. I presented the beginnings of my dissertation project on boylesque and male-identified striptease dancers. Following our demos, we had a productive conversation about the importance of standards when it comes to developing a digital humanities-inflected research agenda in our field.

Immediately after our working group session, I went to the panel presentation on “Theatre History and the Stakes of Big Data.” (I storified it here.) There were certainly threads in the panel where one could make the argument that “DH + theatre/performance = quantitative research.” Derek Miller’s very impressive Visualizing Broadway project is a case in point, illustrating the value of visualizing large bodies of statistics over years.

But both Debra Caplan’s use of big data to study and visualize the Yiddish Vilna Theatre Troupe using D3.js, and Jeffrey Ravel’s distant reading of receipts, attendance, and plays from the Comédie-Française in the Comédie-Française Registers Project, illustrate how we can use these methods to tell a more intricate story with a combination of qualitative and quantitative data.

Finally, I participated in a meeting to lay some groundwork for a summer institute that would include the two major organizations for theatre researchers in the U.S. coming together in a grant application. The meeting was led by the incredible David Saltz — I was star struck and had the awesome opportunity to steal some moments with him after the meeting to talk to him about our mutual love for HyperCard. We’re continuing our work and are submitting our grant application in the next couple of months.

What’s next? We will further develop the ideas generated at ASTR at the HASTAC 2016 Conference roundtable conversation on “An Archive and Repertoire of Digital Humanities and Media Projects in the Performing Arts.” I also hope this will lead to further opportunities. I will also be at DHSI and would love to see conversations around performance studies and DH happening there. Will any of you be there?

Methodology for Mapping Project

In this post, I will describe the process of bringing tweets from over the past year into a single heat map with interactive (clickable) tweets displaying more information. Perhaps most importantly, in what follows I will also readily discuss the flaws and problems of the methodology, in the hope of gaining insights into how to change it and make this part of my project more interesting.

1. TAGS

TAGS is described by its creator Martin Hawksey as “a free Google Sheet template which lets you setup and run automated collection of search results from Twitter.” Under the hood, it is a script written in JavaScript that collects tweets by requesting the latest results, every hour, from the Twitter REST API. I used it to collect tweets over the course of a year mentioning the following search terms:

  • boylesque
  • male strip [1]
  • male striptease [1]
  • male stripclub [1]
  • Male Erotic Dancing [1]
  • male revue [1]
  • #malestriptease [1]

All of those tweets were collected in (two separate) Google Docs spreadsheets. As of October 28, I have collected 22,683 tweets (9,511 + 13,172).

Note: some bugs in the script create double entries of the same tweets in the spreadsheet. They’re easy to identify, as every tweet has an ID assigned by Twitter, which makes it easy to sort them out. More about that below.

Note: Some tweets are cross-posted from Instagram, but I haven’t implemented any APIs beyond the Twitter API at this point. Should I choose to implement the Instagram API in the future, for instance, there will therefore be some overlap between the data sets.

2. Export the tweets and import them into a MySQL database

The tweet spreadsheet was exported as a .csv file and imported into a MySQL database that I set up myself on servers hosted by DreamHost. I chose DreamHost because it allows remote access, so I can use a desktop client to administer the databases.
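For concreteness, here is a minimal sketch of what that import could look like in PHP, using MySQL’s LOAD DATA LOCAL INFILE. The host, credentials, file name, and column names are placeholders that loosely follow the TAGS export rather than my exact setup.

    <?php
    // Minimal sketch of the CSV import, assuming a table named `tweets` whose
    // columns follow the TAGS export. Host, credentials, database, file name,
    // and column names are placeholders, not my actual setup.
    $db = mysqli_init();
    $db->options(MYSQLI_OPT_LOCAL_INFILE, true);   // needed for LOAD DATA LOCAL
    $db->real_connect('mysql.example.com', 'user', 'password', 'tweet_archive');

    // LOAD DATA is much faster than inserting row by row; IGNORE makes the
    // import skip rows that violate the UNIQUE key described in step 3.
    $sql = "LOAD DATA LOCAL INFILE 'tweets.csv' IGNORE
            INTO TABLE tweets
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'
            IGNORE 1 LINES
            (id_str, from_user_id_str, text, created_at, user_location)";

    if (!$db->query($sql)) {
        echo 'Import error: ' . $db->error . "\n";
    }
    $db->close();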

Note: Steps 1 and 2 could be combined into one if I wrote my own cron script that ran every hour and requested the same information as TAGS. I chose to use Hawksey’s TAGS because of how quickly it can be set up: rather than my having to write the code myself, the script is already set up and ready to go in a Google Sheet.

Note: I could have used phpMyAdmin to administer the database but prefer to use a desktop client as it’s faster to handle.

3. Clean up some of the data

I needed to remove some duplicate entries that existed in my data because of a programming error in Hawksey’s script. I set up the MySQL database with a UNIQUE key on the tweet ID column, which means that the MySQL import will automatically disregard any entry whose tweet ID is already present. (Note: strictly speaking, it doesn’t disregard the entry but produces an error; in this case, however, the error is in fact desired.)
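Sketched out, the schema idea looks roughly like this; the table and column names are illustrative, and the INSERT IGNORE at the end simply shows how a row with an already-existing tweet ID gets skipped once the UNIQUE key is in place.

    <?php
    // Rough sketch of the schema idea: the UNIQUE key on the tweet ID column is
    // what makes MySQL reject a second copy of the same tweet. Table and column
    // names are illustrative, loosely following the TAGS export.
    $db = new mysqli('mysql.example.com', 'user', 'password', 'tweet_archive');

    $db->query("CREATE TABLE IF NOT EXISTS tweets (
        id_str           VARCHAR(32)  NOT NULL,
        from_user_id_str VARCHAR(32),
        text             TEXT,
        created_at       VARCHAR(64),
        user_location    VARCHAR(255),
        UNIQUE KEY uniq_tweet (id_str)
    )");

    // With the key in place, INSERT IGNORE (or LOAD DATA ... IGNORE) turns the
    // duplicate-key error into a silent skip, so re-imports drop duplicates.
    $db->query("INSERT IGNORE INTO tweets (id_str, text)
                VALUES ('1234567890', 'dummy tweet, skipped if the ID already exists')");
    $db->close();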

In this step, I also did some necessary manual cleanup of the data, going over the ~8,000 unique tweets I ended up with after the UNIQUE-key procedure to see whether any tweets had been unintentionally cut off or contained characters encoded in a different format, for instance. (Note: I made sure that both the CSV file and the database were UTF-16 encoded in order to be able to include a few tweets written in Japanese, Arabic, Hebrew, etc.)

Note: Hawksey’s script contains an option to wipe duplicate entries but that didn’t work so I needed a solution for this.

4. Pull user data from the database of tweets

One of the columns in my data contains the user ID of the tweeter behind each tweet. I wrote a PHP script to pull the unique user IDs, and their respective affiliated tweet IDs, from the MySQL database and add them to a separate table, to reduce the number of queries necessary in step 5 below.
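A rough sketch of that script, under the same illustrative table and column names as above:

    <?php
    // Sketch of the "pull user IDs" step: copy each distinct user ID from the
    // tweets table into a users table, so that step 5 only has to look up each
    // user once. Table and column names are illustrative.
    $db = new mysqli('mysql.example.com', 'user', 'password', 'tweet_archive');

    $db->query("CREATE TABLE IF NOT EXISTS users (
        user_id  VARCHAR(32) NOT NULL,
        location VARCHAR(255) DEFAULT NULL,
        UNIQUE KEY uniq_user (user_id)
    )");

    // INSERT IGNORE keeps the users table unique even if the script runs again.
    $db->query("INSERT IGNORE INTO users (user_id)
                SELECT DISTINCT from_user_id_str FROM tweets");

    echo $db->affected_rows . " new users added\n";
    $db->close();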

5. Request the user info from the Twitter API

For the unique user IDs, I submit requests to the Twitter API in batches of 100 at a time, since I am interested in pulling the location of each user who has tweeted.

Note: this is a step I have so far had to run manually, because I haven’t written a cron script to perform this part of the process; there needs to be some time between the requests to the Twitter REST API.
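For the curious, the lookup step could look roughly like the sketch below. It uses the abraham/twitteroauth library (not necessarily the client I ended up with); the users/lookup endpoint accepts up to 100 user IDs per request, and the keys, table names, and pause length are all placeholders.

    <?php
    // Sketch of the lookup step using the abraham/twitteroauth library.
    // users/lookup accepts up to 100 user IDs per request; keys, table names,
    // and the pause length are placeholders.
    require 'vendor/autoload.php';
    use Abraham\TwitterOAuth\TwitterOAuth;

    $db = new mysqli('mysql.example.com', 'user', 'password', 'tweet_archive');
    $twitter = new TwitterOAuth('CONSUMER_KEY', 'CONSUMER_SECRET',
                                'ACCESS_TOKEN', 'ACCESS_SECRET');

    // All users whose location has not been fetched yet.
    $ids = [];
    $result = $db->query("SELECT user_id FROM users WHERE location IS NULL");
    while ($row = $result->fetch_assoc()) {
        $ids[] = $row['user_id'];
    }

    $stmt = $db->prepare("UPDATE users SET location = ? WHERE user_id = ?");

    foreach (array_chunk($ids, 100) as $batch) {
        $profiles = $twitter->get('users/lookup', ['user_id' => implode(',', $batch)]);
        foreach ($profiles as $profile) {
            $stmt->bind_param('ss', $profile->location, $profile->id_str);
            $stmt->execute();
        }
        sleep(60);  // crude pause between batches to respect the rate limit
    }
    $db->close();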

Note: This step is where the ethical and the methodological aspects of my method become somewhat questionable, which I’d like to address here:

  • Ethically because there is no consent from the users to become part of my database or to share this information with me specifically. Yet, one might argue that they have agreed to the terms of use of Twitter which stipulates that this information is public. A counterargument could be that my database freezes in time user information that otherwise can change from being public to being private at any point in time. I could attempt to anonymize the data to the point where I create one map for the location of the tweeters, which is not interactive — that is, that does not show the location, username, and text of the tweet itself.
  • Methodologically, here’s a related problem as well: There’s not really any way to anonymize data that’s publicly available. You can easily search for the text quoted, for instance. (This becomes a problem in studies such as Kim Price-Glynn’s where she doesn’t want to write the real name of the strip bar where she does her studies; yet, in her last chapter she analyzes and quotes online conversations about the club, which makes it easy for someone to search the web, or archive.org, for the literal quotes, and find the name of the bar that way.)
  • Methodologically as well, it is problematic because the map doesn’t actually show the location of the tweet per se, but rather the (self-professed) location of the user. That means that there will be many users layered on top of each other in a location such as New York, and few in New Orleans, despite the fact that many New Yorkers are indeed tweeting from New Orleans while traveling. (That is, the method doesn’t capture tweets from other locations than the user’s professed “home base.”) Moreover, some users list multiple locations on their accounts: New York, San Francisco, and New Orleans, for example. The method currently has no way of mapping such accounts. On the prototype, I have used the first city mentioned, while acknowledging that this is a flaw.

6. Pull all unique locations from user information

From all of the users’ locations, I have written a PHP script to, once again, pull all the locations into a list of unique locations, each mapped onto a location ID (which will prove necessary in the later step that creates relationships between the database tables).
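A sketch of that script, again with illustrative table and column names:

    <?php
    // Sketch of the unique-locations step: every distinct user location goes
    // into its own table with an auto-incremented location ID, which the later
    // relationships step can point back to. Names are illustrative.
    $db = new mysqli('mysql.example.com', 'user', 'password', 'tweet_archive');

    $db->query("CREATE TABLE IF NOT EXISTS locations (
        location_id INT AUTO_INCREMENT PRIMARY KEY,
        name        VARCHAR(255) NOT NULL,
        latitude    DECIMAL(10,6) DEFAULT NULL,
        longitude   DECIMAL(10,6) DEFAULT NULL,
        UNIQUE KEY uniq_name (name)
    )");

    $db->query("INSERT IGNORE INTO locations (name)
                SELECT DISTINCT location FROM users
                WHERE location IS NOT NULL AND location <> ''");
    $db->close();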

7. Manual clean-up of the locations

In an attempt to create a consolidated list of locations, I needed to manually clean up misspellings (since this is information users type in themselves, New York can be spelled in a number of ways) and abbreviations of city names (changing NYC, and at times NY, to New York).

Note: I haven’t figured out a way to do this automatically yet. Theoretically, I could write a script that pulls from a list of corrections I’d like to make and performs them automatically, but that list would still have to be managed manually.
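As a sketch of that idea, a hand-maintained correction list could be applied to the raw user locations before the unique list is rebuilt; the entries below are just examples.

    <?php
    // Sketch of a semi-automated cleanup: a hand-maintained correction list
    // (variant => canonical name) applied to the raw user locations before the
    // unique locations list is rebuilt. The entries are only examples.
    $corrections = [
        'NYC'           => 'New York',
        'NY'            => 'New York',
        'new york city' => 'New York',
        'N.O.'          => 'New Orleans',
    ];

    $db   = new mysqli('mysql.example.com', 'user', 'password', 'tweet_archive');
    $stmt = $db->prepare("UPDATE users SET location = ? WHERE location = ?");

    foreach ($corrections as $variant => $canonical) {
        $stmt->bind_param('ss', $canonical, $variant);
        $stmt->execute();
    }
    $db->close();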

8. Geolocate each of the locations

Batch geocoding of all the locations is done through the free and open Batch Geocoding website.

Note: There should be other ways of performing this step but I haven’t figured that out yet. I might use Google’s geocoding API. The problem with their API (and many others) is that it’s limited to 2,500 requests per day. I may need to apply for research funds for this part of the project.
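If I do go the Google route, the step might look something like the following sketch, where the API key and table names are placeholders and the short pause keeps the script under the per-second rate limit:

    <?php
    // Sketch of what geocoding through Google's Geocoding API could look like
    // (as an alternative to the batch geocoding website). API key, table, and
    // column names are placeholders; note the daily request cap mentioned above.
    $db  = new mysqli('mysql.example.com', 'user', 'password', 'tweet_archive');
    $key = 'YOUR_API_KEY';

    $result = $db->query("SELECT location_id, name FROM locations WHERE latitude IS NULL");
    while ($row = $result->fetch_assoc()) {
        $url  = 'https://maps.googleapis.com/maps/api/geocode/json?address='
              . urlencode($row['name']) . '&key=' . $key;
        $json = json_decode(file_get_contents($url), true);

        if ($json['status'] === 'OK') {
            $coords = $json['results'][0]['geometry']['location'];
            $stmt = $db->prepare("UPDATE locations SET latitude = ?, longitude = ?
                                  WHERE location_id = ?");
            $stmt->bind_param('ddi', $coords['lat'], $coords['lng'], $row['location_id']);
            $stmt->execute();
        }
        usleep(200000);  // short pause to stay under the per-second rate limit
    }
    $db->close();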

9. Assign weights to locations

The locations are not very precise, which means that lots of tweets will be layered on top of each other. One way to deal with this for now is to assign weights to each location, in order to create a heatmap for the respective tweets.
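One simple way to derive such weights would be to count how many tweets resolve to each location via the users table and store that count on the locations table. A sketch, assuming the illustrative schema from the earlier steps plus an added weight column:

    <?php
    // Sketch of one way to compute weights: count how many tweets resolve to
    // each location (tweets -> users -> locations) and store that count as the
    // location's weight. Assumes the illustrative schema from earlier steps.
    $db = new mysqli('mysql.example.com', 'user', 'password', 'tweet_archive');

    $db->query("ALTER TABLE locations ADD COLUMN weight INT DEFAULT 0");  // run once

    $db->query("UPDATE locations l
                JOIN (
                    SELECT u.location AS name, COUNT(*) AS n
                    FROM tweets t
                    JOIN users u ON u.user_id = t.from_user_id_str
                    GROUP BY u.location
                ) counts ON counts.name = l.name
                SET l.weight = counts.n");
    $db->close();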

Problem here: that means that the interactive aspect of the map is lost. I don’t believe CartoDB has a way to solve this yet, which means that to maintain the interactive aspect of the map, I might need to migrate away from the CartoDB platform.

10. Create relationships between all the tables in the database
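Under the illustrative schema used in the sketches above, the relationships could be made explicit along these lines (purely a sketch, not my actual statements):

    <?php
    // Purely illustrative sketch of how the relationships could be made
    // explicit under the schema used in the earlier sketches: each user points
    // to a location_id, and each tweet points to a user (InnoDB tables assumed).
    $db = new mysqli('mysql.example.com', 'user', 'password', 'tweet_archive');

    // Link users to their (cleaned-up) location.
    $db->query("ALTER TABLE users ADD COLUMN location_id INT NULL");
    $db->query("UPDATE users u
                JOIN locations l ON l.name = u.location
                SET u.location_id = l.location_id");

    // Foreign keys make the relationships explicit in the database itself.
    $db->query("ALTER TABLE users
                ADD FOREIGN KEY (location_id) REFERENCES locations(location_id)");
    $db->query("ALTER TABLE tweets
                ADD FOREIGN KEY (from_user_id_str) REFERENCES users(user_id)");
    $db->close();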

11. Export the data to CartoDB

12. Create the maps in CartoDB

A note on collaboration

One goal of the project is to offer the whole package back to the community of DH scholars by making it available via GitHub. This would probably entail the need for more research funding, unfortunately, as I will need to make my scripts a little less specific to my own research. The Graduate Center, CUNY offers Digital Fellowships which may cover this part of the project in the near future.

Notes

[1] Since July 17 — before then, the project was going to focus explicitly and only on boylesque as a genre.

Trouble with Geotagging Tweets with Users

In my initial mapping project Boylesque in the Twitter World, I used TAGS to scrape tweets mentioning boylesque, and then matched those tweets with their respective users and each user’s location (something TAGS can’t manage yet) via Twitter’s REST API. The method poses a number of problems, which I will describe in this blog post. Please note that the post is not exhaustive; I invite you to comment below if you think of other problems.

Detailed Method Description

Tweet —> scraping (TAGS) —> CSV file —> database (local script) —> if no tweet geotag: query REST API (in batches of 100) for user location (local script) —> insert into database (local script) —> export tweets matched with locations from database to CSV (local script) —> import CSV into CartoDB —> map
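As a sketch of the branch in the middle of that pipeline: use the tweet’s own geotag when there is one, and fall back to the user’s profile location otherwise. The field names follow the Twitter API, and the lookup callback is a placeholder for the REST API step described in the previous post.

    <?php
    // Sketch of the geotag branch in the pipeline above: prefer the tweet's own
    // coordinates when it is geotagged, and only fall back to the user's profile
    // location otherwise. Field names follow the Twitter API; $userLocationLookup
    // is a placeholder for the REST API lookup described in the previous post.
    function resolveCoordinates($tweet, callable $userLocationLookup)
    {
        // Geotagged tweets carry a GeoJSON point: [longitude, latitude].
        if (!empty($tweet->coordinates)) {
            return [
                'lat' => $tweet->coordinates->coordinates[1],
                'lng' => $tweet->coordinates->coordinates[0],
            ];
        }
        // Fallback: the (self-professed) location on the user's profile.
        return $userLocationLookup($tweet->user->id_str);
    }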

Problem: Locations Do Not Necessarily Match Tweet Location

The method described above doesn’t account for tweets that a user posts while traveling, unless the tweet is already geotagged. (That’s a problem because only 1% of tweets are geotagged, according to “Only 30% of Messages on Twitter Are From the U.S.”1) Such tweets get tagged as if they were posted in the user’s hometown. This means that the map currently shows where the users who tweet about boylesque are located, rather than where the tweets were sent from. Help: Is there a way to get around this problem? If you think of anything, please comment below.