Digital Humanities in Theatre and Performance Studies

In November, I attended the annual conference of ASTR (the American Society for Theatre Research), an incredibly inspiring gathering in the field of theatre and performance studies. The theme of the conference this year was “Debating the Stakes in Theatre and Performance Scholarship,” and panels ranged from debating food in performance (the steaks in theatre and performance…) to the politics of performance.

This past year or so I’ve been embracing and coming out in all my nerdiness, and have hesitantly been entering into conversations in the field we call “digital humanities.” Whether it’s a field, a set of methods spanning different disciplines, or even its own discipline is still quite up in the air—see Svensson 2010 for an interesting conversation on that, which includes HASTAC!

At the conference, there were some really interesting strides toward moving the discipline of Theatre and Performance Studies, especially in its North American context, in the direction of digital methods, projects, and theories. In this post, I’d like to share some of my experiences: my participation in one working group, my attendance at another, a panel presentation, and a meeting that laid some groundwork for a summer institute.

My conference attendance began with the first working group, “The Stakes of Digital Scholarship of Theatre and Performance.” It was organized around lightning talks by the twenty-some participants on projects spanning many different questions and examples — from Twitter theatre to the Transborder Immigrant Tool. The latter was presented by Ashley Ferro-Murray, a former HASTAC Scholar who, among many other fantastic projects, has created a choreographic interpretation of the tool.

The following day, back to back, were my own working group—on digital methods in theatre and performance research—and a panel on big data in theatre history.

My own working group, “Digital Methods: Collaboration, Evaluation, and Access in Digital Theatre Scholarship,” focused specifically on questions of collaboration and standards for evaluating digital projects in research on theatre and performance. We began our two-hour session with an hour of demos of our projects. I presented the beginnings of my dissertation project on boylesque and male-identified striptease dancers. Following the demos, we had a productive conversation about the importance of standards when it comes to developing a digital humanities-inflected research agenda in our field.

Immediately after our working group session, I went to the panel presentation on “Theatre History and the Stakes of Big Data.” (I storified it here.) There were certainly threads in the panel where one could make the argument that “DH + theatre/performance = quantitative research.” Derek Miller’s very impressive Visualizing Broadway project is a case in point, illustrating the value of visualizing large bodies of statistics over many years.

But both Debra Caplan’s use of big data to study and visualize the Yiddish theatre’s Vilna Troupe using D3.js, and Jeffrey Ravel’s distant reading of receipts, attendance records, and plays from the Comédie-Française in the Comédie-Française Registers Project, illustrate how we can use these methods to tell a more intricate story with a combination of qualitative and quantitative data.

Finally, I participated in a meeting to lay some groundwork for a summer institute that would bring the two major organizations for theatre researchers in the U.S. together in a grant application. The meeting was led by the incredible David Saltz — I was star-struck and had the awesome opportunity to steal some moments with him after the meeting to talk about our mutual love for HyperCard. We’re continuing our work and are submitting our grant application in the next couple of months.

What’s next? We will further develop the ideas generated at ASTR at the HASTAC 2016 Conference roundtable conversation on “An Archive and Repertoire of Digital Humanities and Media Projects in the Performing Arts.” I also hope this will lead to further opportunities. I will also be at DHSI and would love to see conversations around performance studies and DH happening there. Will any of you be there?

Methodology for Mapping Project

In this post, I will describe the process of bringing tweets collected over the past year into a single heat map with interactive (clickable) tweets that display more information. Perhaps most importantly, in what follows I will also openly discuss the flaws and problems of the methodology, in the hope of gaining insights into how to revise it and make this part of my project more interesting.

1. TAGS

TAGS is described by its creator Martin Hawksey as “a free Google Sheet template which lets you setup and run automated collection of search results from Twitter.” Under the hood, it is a Google Apps Script (JavaScript) that collects tweets, requesting the latest results every hour from the Twitter REST API. I used it to collect, over the course of a year, tweets mentioning the following search terms:

  • boylesque
  • male strip [1]
  • male striptease [1]
  • male stripclub [1]
  • Male Erotic Dancing [1]
  • male revue [1]
  • #malestriptease [1]

All of those tweets were collected in two separate Google Sheets. As of October 28, I have collected 22,683 tweets (9,511 + 13,172).
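To give a sense of what TAGS automates, here is a minimal sketch (in PHP rather than Hawksey’s Apps Script) of the kind of request it issues against the Twitter REST API’s search/tweets endpoint. The bearer token is a placeholder, and the field handling is simplified compared to what TAGS actually writes to the spreadsheet.

```php
<?php
// Minimal sketch of the kind of search request TAGS automates (not Hawksey's code).
// Assumes application-only auth with a placeholder bearer token.
$bearerToken = 'YOUR_BEARER_TOKEN';
$query       = urlencode('boylesque');   // one of the search terms listed above
$url         = "https://api.twitter.com/1.1/search/tweets.json?q=$query&count=100&result_type=recent";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, ["Authorization: Bearer $bearerToken"]);
$response = curl_exec($ch);
curl_close($ch);

$results = json_decode($response, true);
foreach ($results['statuses'] as $tweet) {
    // Each status carries the fields that end up as spreadsheet columns:
    // id_str, text, created_at, the user's screen name, and so on.
    echo $tweet['id_str'] . "\t" . $tweet['text'] . PHP_EOL;
}
```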

Note: a bug in the script creates duplicate entries of the same tweets in the spreadsheet. They’re easy to identify, since every tweet has an ID assigned by Twitter, which makes it straightforward to sort them out. More about that below.

Note: Some tweets are cross-posts from Instagram. I haven’t implemented any APIs beyond the Twitter API at this point, which means there would be some overlap should I choose to add the Instagram API, for instance.

2. Export the tweets and import them into a MySQL database

The tweet spreadsheet was exported as a .csv file and imported into a MySQL database that I set up on servers hosted by DreamHost. I chose DreamHost because it allows remote access, so I can use a desktop client to administer the databases.
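As a rough illustration (not my exact schema; the host, credentials, table, and column names below are all placeholders), the import boils down to a CREATE TABLE and a LOAD DATA statement run over the remote connection:

```php
<?php
// Sketch of importing the exported CSV into MySQL over a remote connection.
// Hostname, credentials, and the table/column layout are placeholders.
$db = mysqli_init();
$db->options(MYSQLI_OPT_LOCAL_INFILE, true);   // LOCAL INFILE must be allowed on client and server
$db->real_connect('mysql.example.dreamhost.com', 'db_user', 'db_password', 'boylesque_tweets');

$db->query("
    CREATE TABLE IF NOT EXISTS tweets (
        id_str       BIGINT UNSIGNED NOT NULL,
        from_user    VARCHAR(255),
        from_user_id BIGINT UNSIGNED,
        text         TEXT,
        created_at   VARCHAR(64)
    )
");

// Read the CSV from the local machine, skipping the header row.
// (In practice, the table's character set should match the CSV export encoding.)
$db->query("
    LOAD DATA LOCAL INFILE 'tags_export.csv'
    INTO TABLE tweets
    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
    (id_str, from_user, from_user_id, text, created_at)
");
```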

Note: Steps 1 and 2 could be combined into one if I wrote my own cron script that would run every hour, requesting the same information as TAGS. I chose Hawksey’s TAGS because of how quickly it can be set up: rather than my having to write the code myself, the script is already set up and ready to go in a Google Sheet.

Note: I could have used phpMyAdmin to administer the database, but I prefer a desktop client as it’s faster to work with.

3. Clean up some of the data

I needed to remove some duplicate entries that had ended up in my data because of the bug in Hawksey’s script mentioned above. I set up the MySQL table with a UNIQUE key on the tweet ID column, which means that the import will automatically disregard any duplicate values. (Note: strictly speaking, it doesn’t disregard them but produces an error; in this case, however, the error is in fact desired.)
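A sketch of that deduplication, with the same placeholder schema as above: copying into a fresh table that carries the UNIQUE key lets INSERT IGNORE silently drop the duplicate rows (they are reported as warnings rather than fatal errors).

```php
<?php
// Sketch: deduplicate by tweet ID using a UNIQUE key (placeholder table names).
$db = new mysqli('mysql.example.dreamhost.com', 'db_user', 'db_password', 'boylesque_tweets');

// A fresh copy of the table, with uniqueness enforced on the tweet ID.
$db->query("CREATE TABLE tweets_unique LIKE tweets");
$db->query("ALTER TABLE tweets_unique ADD UNIQUE KEY uniq_tweet_id (id_str)");

// INSERT IGNORE skips any row that would violate the UNIQUE key,
// which is exactly the 'desired error' described above.
$db->query("INSERT IGNORE INTO tweets_unique SELECT * FROM tweets");

echo $db->affected_rows . " unique tweets kept\n";
```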

In this step, I also did some necessary manual cleanup, going over the ~8,000 unique tweets I ended up with after the UNIQUE-key procedure to check whether any tweets had been unintentionally cut off or contained characters encoded in a different format, for instance. (Note: I made sure that both the CSV file and the database were UTF-16 encoded so that I could include a few tweets written in Japanese, Arabic, Hebrew, etc.)

Note: Hawksey’s script contains an option to wipe duplicate entries, but that didn’t work for me, so I needed another solution.

4. Pull user data from the database of tweets

One of the columns in my data contains the user ID of the tweeter behind each tweet. I wrote a PHP script to pull the unique user IDs, along with their respective tweet IDs, from the MySQL database and add them to a separate table, in order to reduce the number of queries necessary in step 5 below.
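The gist of that script, again with placeholder table and column names, is a couple of queries along these lines (the tweet-to-user link itself stays in the tweets table via the from_user_id column):

```php
<?php
// Sketch: collect the distinct user IDs from the deduplicated tweets into their own table.
$db = new mysqli('mysql.example.dreamhost.com', 'db_user', 'db_password', 'boylesque_tweets');

$db->query("
    CREATE TABLE IF NOT EXISTS users (
        user_id  BIGINT UNSIGNED NOT NULL,
        location VARCHAR(255) DEFAULT NULL,   -- filled in during step 5
        PRIMARY KEY (user_id)
    )
");

// One row per distinct tweeter; INSERT IGNORE keeps the script safe to rerun.
$db->query("
    INSERT IGNORE INTO users (user_id)
    SELECT DISTINCT from_user_id
    FROM tweets_unique
    WHERE from_user_id IS NOT NULL
");
```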

5. Request the user info from the Twitter API

For the unique user IDs, I submit requests to the Twitter API in batches of 100 at a time, since I am interested in pulling the profile location of each user who has tweeted.
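A sketch of one such batch, assuming the REST API’s users/lookup endpoint (which accepts up to 100 comma-separated user IDs per request) and a placeholder bearer token:

```php
<?php
// Sketch: look up profile data, including the location field, for one batch of user IDs.
$bearerToken = 'YOUR_BEARER_TOKEN';
$userIds     = [1234567890, 2345678901, 3456789012];   // up to 100 IDs per request

$url = 'https://api.twitter.com/1.1/users/lookup.json?user_id=' . implode(',', $userIds);

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, ["Authorization: Bearer $bearerToken"]);
$users = json_decode(curl_exec($ch), true);
curl_close($ch);

foreach ($users as $user) {
    // 'location' is the free-text profile field I am after; it would be written
    // back to the users table created in step 4.
    echo $user['id_str'] . "\t" . $user['location'] . PHP_EOL;
}

// Pause before the next batch so the requests stay within the endpoint's rate limit.
sleep(60);
```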

Note: this is a step I currently have to do manually, because I haven’t yet written a cron script to perform this part of the process; the requests to the Twitter REST API need to be spaced out in time.

Note: This step is where the ethical and the methodological aspects of my method become somewhat questionable, which I’d like to address here:

  • Ethically, because there is no consent from the users to become part of my database or to share this information with me specifically. One might argue that they have agreed to Twitter’s terms of use, which stipulate that this information is public. A counterargument is that my database freezes in time user information that could otherwise change from public to private at any point. I could attempt to anonymize the data by creating a single, non-interactive map of the tweeters’ locations, that is, one that does not show the location, username, and text of the tweet itself.
  • Methodologically, there is a related problem: there’s not really any way to anonymize data that’s publicly available, since you can easily search for the quoted text, for instance. (This becomes a problem in studies such as Kim Price-Glynn’s, where she does not want to name the strip bar where she conducts her research; yet in her last chapter she analyzes and quotes online conversations about the club, which makes it easy for someone to search the web, or archive.org, for the literal quotes and find the name of the bar that way.)
  • Also methodologically, it is problematic because the map does not show the location of the tweet per se, but rather the (self-professed) location of the user. That means that many users will be layered on top of each other in a location such as New York, and few in New Orleans, despite the fact that many New Yorkers are tweeting from New Orleans while traveling. (That is, the method doesn’t capture tweets sent from locations other than the user’s professed “home base.”) Moreover, some users list multiple locations on their accounts (New York, San Francisco, and New Orleans, for example), and the method currently has no way of mapping such accounts. In the prototype, I have used the first city mentioned, while acknowledging that this is a flaw.

6. Pull all unique locations from user information

From all of the users’ locations, I wrote another PHP script to pull the locations into a list of unique locations, each mapped onto a location ID (which will prove necessary in the step where I create relationships between the database tables).
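In essence, the script runs something like the following (placeholder names again), with an AUTO_INCREMENT column serving as the location ID:

```php
<?php
// Sketch: build a table of unique location strings, each with its own ID.
$db = new mysqli('mysql.example.dreamhost.com', 'db_user', 'db_password', 'boylesque_tweets');

$db->query("
    CREATE TABLE IF NOT EXISTS locations (
        location_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        name        VARCHAR(255) NOT NULL,
        lat         DECIMAL(9,6) DEFAULT NULL,   -- filled in after geocoding
        lng         DECIMAL(9,6) DEFAULT NULL,
        PRIMARY KEY (location_id),
        UNIQUE KEY uniq_name (name)
    )
");

// One row per distinct, non-empty profile location.
$db->query("
    INSERT IGNORE INTO locations (name)
    SELECT DISTINCT location
    FROM users
    WHERE location IS NOT NULL AND location <> ''
");
```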

7. Manual clean-up of the locations

In an attempt to create a consolidated list of locations, I needed to manually clean up misspellings (since this is free-text information that users type in themselves, New York can be spelled in any number of ways) and abbreviations of city names (changing NYC, and at times NY, to New York).

Note: I haven’t figured out a way to do this automatically yet. Theoretically, I could write a script that would pull from a list of corrections I’d like to make and apply them automatically, but the list itself would still have to be maintained manually.
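Such a script could be as simple as the sketch below, where the corrections array stands in for the manually maintained list (the entries are only examples):

```php
<?php
// Sketch: apply a manually maintained list of spelling and abbreviation corrections.
$db = new mysqli('mysql.example.dreamhost.com', 'db_user', 'db_password', 'boylesque_tweets');

// Curated map of variants => canonical names; maintained by hand.
$corrections = [
    'NYC'           => 'New York',
    'NY'            => 'New York',
    'new york city' => 'New York',
    'N.O.'          => 'New Orleans',
];

$stmt = $db->prepare("UPDATE users SET location = ? WHERE location = ?");
foreach ($corrections as $variant => $canonical) {
    $stmt->bind_param('ss', $canonical, $variant);
    $stmt->execute();
}
```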

8. Geolocate each of the locations

Batch geocoding of all the locations is done through the free and open Batch Geocoding website.

Note: There are probably other ways of performing this step, but I haven’t settled on one yet. I might use Google’s Geocoding API. The problem with that API (and many others) is that it’s limited to 2,500 requests per day, so I may need to apply for research funds for this part of the project.
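For reference, a single request against Google’s Geocoding API would look roughly like this (the API key is a placeholder, and the daily limit mentioned above still applies):

```php
<?php
// Sketch: geocode one consolidated location string with the Google Geocoding API.
$apiKey   = 'YOUR_API_KEY';
$location = 'New Orleans, LA';

$url = 'https://maps.googleapis.com/maps/api/geocode/json'
     . '?address=' . urlencode($location)
     . '&key=' . $apiKey;

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = json_decode(curl_exec($ch), true);
curl_close($ch);

if ($response['status'] === 'OK') {
    $coords = $response['results'][0]['geometry']['location'];
    // These coordinates would be written back to the locations table from step 6.
    echo $location . ' => ' . $coords['lat'] . ', ' . $coords['lng'] . PHP_EOL;
}
```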

9. Assign weights to locations

The locations are not very precise, which means that lots of tweets end up layered on top of each other. One way to deal with this for now is to assign a weight to each location in order to create a heatmap of the respective tweets.

The problem here is that the interactive aspect of the map is lost. I don’t believe CartoDB has a way to solve this yet, which means that in order to maintain interactivity I might need to migrate away from the CartoDB platform.
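One straightforward way to compute such a weight is to count tweets per consolidated location; here is a sketch using the placeholder schema from the earlier steps:

```php
<?php
// Sketch: weight each location by the number of tweets whose author reports that location.
$db = new mysqli('mysql.example.dreamhost.com', 'db_user', 'db_password', 'boylesque_tweets');

$result = $db->query("
    SELECT l.location_id, l.name, COUNT(t.id_str) AS weight
    FROM locations l
    JOIN users u         ON u.location     = l.name
    JOIN tweets_unique t ON t.from_user_id = u.user_id
    GROUP BY l.location_id, l.name
    ORDER BY weight DESC
");

while ($row = $result->fetch_assoc()) {
    echo $row['name'] . "\t" . $row['weight'] . PHP_EOL;
}
```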

10. Create relationships between all the tables in the database
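For illustration only, with the placeholder schema sketched in the previous steps, the relationships amount to linking each user to a consolidated location and each tweet to its user:

```php
<?php
// Sketch: tie the placeholder tables together with explicit keys.
// Assumes InnoDB tables and that every from_user_id already exists in users.
$db = new mysqli('mysql.example.dreamhost.com', 'db_user', 'db_password', 'boylesque_tweets');

// Give each user a reference to its consolidated location...
$db->query("ALTER TABLE users ADD COLUMN location_id INT UNSIGNED DEFAULT NULL");
$db->query("
    UPDATE users u
    JOIN locations l ON l.name = u.location
    SET u.location_id = l.location_id
");

// ...and declare the relationships as foreign keys.
$db->query("ALTER TABLE users ADD FOREIGN KEY (location_id) REFERENCES locations (location_id)");
$db->query("ALTER TABLE tweets_unique ADD FOREIGN KEY (from_user_id) REFERENCES users (user_id)");
```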

11. Export the data to CartoDB

12. Create the maps in CartoDB

A note on collaboration

As a goal of the project, I want to offer the whole package back to the community of DH scholars by making it available via GitHub. Unfortunately, this would probably require additional research funding, as I will need to make my scripts a little less specific to my own research. The Graduate Center, CUNY offers Digital Fellowships, which may cover this part of the project in the near future.

Notes

[1] Since July 17 — before then, the project was going to focus explicitly and only on boylesque as a genre.

Later This Month: Participation in Media Res #2

Later this month—on November 17th, to be precise—I’ve been asked to present a lightning talk on The Roots and Routes of Boylesque, likely very much inspired by my presentation at ASTR later this week. I’m so excited!

The event, Media Res #2, will be a follow-up to the inaugural Media Res event earlier this spring. It takes place at NYU’s Bobst Library on Tuesday, November 17th at 5pm, and showcases a range of graduate student work in the Digital Humanities in an attempt to foster community across NYC universities.

A Digital (N)Ethnographic Journey through the Roots and Routes of Boylesque

I just submitted my first project proposal ever to ASTR (the American Society for Theatre Research) for this year’s conference in Portland, OR (corrected June 24; I originally wrote Seattle, WA). The conference is organized into working groups around particular topics, each with its own process and structure for presentations. This year, I wanted to participate, and so I submitted a proposal to one of two working groups on digital humanities: Collaboration, Evaluation, and Access in Digital Theatre Scholarship. The working session will be structured around an hour-long discussion of 10-12-page papers circulated before the conference, addressing questions of collaboration, evaluation, and access. That will be followed by another hour of digital poster presentations—“a hands-on interactive session during which participants demonstrate (via their own laptops) a particular research methodology that makes use of digital tools, and engage in discussion with attendees,” as the website states.

My proposal, in its entirety, can be read here:

Self-identified male bodies in burlesque have a history that goes further back than is normally acknowledged, and they are an under-theorized absence in the many accounts of burlesque history. My dissertation addresses the history and political aspects of boylesque—a fairly new genre growing out of the neo-burlesque movement. I contextualize the genre, and the use of the term boylesque, within the larger history of male striptease in New York, the US, and globally.

My dissertation is a born-digital project in which (n)ethnographic research into social media conversations around the genre governs the formulation of my initial research questions. I am mapping these conversations by scraping Twitter and geotagging the conversations over time in CartoDB. The dissertation is being constructed in the open-source platform Scalar, which can integrate the maps and other media into the text.

Methodologically, I want to use technology to develop new ways of engaging with the historical and contemporary subjects of my dissertation, ways that can involve them in the longer process of both participating in and creating a research project, and that make the project accessible to audiences outside the traditional boundaries of academia. In all, the dissertation project is developed in and with the public, and is thus situated at the intersection of Public and Digital Humanities.

For convenience, I have posted the entire call for papers here:

With the proliferation of digital projects in theatre and performance studies, new questions arise about technologically infused research methodologies and the availability of digital tools. How can we properly recognize work by technical consultants and designers? How should institutions evaluate not only the findings of digital projects but also code or other artifacts of digital research? How might digital research restrict access by scholars and students without proper resources (financial, technical, human)? This working session aims both to provide a platform for sharing current digital scholarship and to permit reflection on the political implications of digital research.

The three issues of collaboration, evaluation, and access, while certainly not new to scholars, are particularly poignant in digital work. The problems arise no matter what the content of the material; thus, this session invites the participation of scholars working in any aspect of theatre and performance studies, representing a diverse array of topics and time periods. Digital scholarship challenges a traditional humanist ethos of solitary thought. We will consider how digital scholarship demands that we recognize the wider polity involved in all forms of critical work. Evaluating digital projects raises similar challenges. The multiplicity of skills required for digital scholarship and the necessary division of labor may require assessment more like that for creative work than for a traditional monograph. In both cases, theatre and performance studies’ long commitment to practice-based research may provide a useful model for thinking about collaboration in and the evaluation of digital scholarship. Finally, digital humanities’ utopian vision of open access (free online texts, digital archives) disguises other problems of accessibility: what kind of financial and infrastructural resources are required to support digital work? And how can we ensure that the digital humanities do not reproduce the normatively white, male structure that dominates the tech world in Silicon Valley? Digital humanities invites us to consider these political questions anew.