The latest in our social media harvesting experiments for the Social Feed Manager project involves analysis, discovery, and visualization of social media content. An analytics service may help satisfy two needs:
1. For the collection creator, being able to evaluate the content that is being collected so as to adjust the collection criteria. For example, for Twitter a collection creator may discover additional hashtags to collect. Since a collection creator may be collecting a rapidly evolving event, this requires near real-time analysis.
2. For the researcher, being able to analyze the content. Though many researchers will need to export the social media content for use with other tools, having some sort of analytics service available may meet the needs of some researchers and may lower the barrier to performing social media research.
We also wanted to test the extensibility of the SFM architecture to make sure that additional services can be readily added.
The ELK (Elasticsearch, Logstash, Kibana) stack was selected for this experiment. It was selected primarily on the intuition that it was a good fit, rather than an analysis of its features or a comparison against other options. For those not familiar with this stack, Kibana is the discovery and visualization interface, Elasticsearch is the data store, and Logstash loads Elasticsearch with data. We’ll refer to our own implementation as SFM-ELK.
In SFM infrastructure, harvesters, such as the Twitter harvester, invoke the APIs of social media platforms and record the results in WARC files. Harvesters publish warc_created messages to a message queue whenever a WARC file is created. This provides the critical hook for SFM-ELK to perform loading -- a message consumer application listens for warc_created messages. When it receives a warc_created message, it:
1. Invokes the appropriate WARC iterator (e.g., TwitterRestWarcIter) to read the WARC file and output the social media records as line-oriented JSON.
2. Pipes this to jq, which filters the JSON. Most types of social media records contain extraneous metadata that does not need to be indexed in Elasticsearch. Logstash supports various mechanisms for filtering and transforming loaded data, but jq proved better suited to JSON data.
3. Pipes this into Logstash, which loads it into Elasticsearch.
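The loading steps above can be sketched roughly as follows. This is a minimal illustration, not the actual SFM-ELK code: the message layout (`warc.path`, `harvest.type`), the iterator command name, the jq filter, and the Logstash config filename are all assumptions made for the example, and the Logstash config is assumed to read from stdin.

```python
import json
import subprocess
import sys

# Hypothetical mapping from harvest type to the command that runs a WARC iterator.
# The real SFM iterators (e.g., TwitterRestWarcIter) may be invoked differently.
ITERATOR_COMMANDS = {
    "twitter_search": ["twitter_rest_warc_iter.py"],
}

# Illustrative jq filter that keeps only a few tweet fields (an assumption).
JQ_FILTER = "{id: .id_str, text: .text, user: .user.screen_name, created_at: .created_at}"


def build_pipeline(message_body, logstash_conf="twitter.conf"):
    """Turn a warc_created message (JSON text) into the three commands to chain.

    The message layout used here is assumed for the example.
    """
    message = json.loads(message_body)
    harvest_type = message["harvest"]["type"]
    warc_path = message["warc"]["path"]
    return [
        ITERATOR_COMMANDS[harvest_type] + [warc_path],  # 1. WARC -> line-oriented JSON
        ["jq", "-c", JQ_FILTER],                        # 2. drop extraneous metadata
        ["logstash", "-f", logstash_conf],              # 3. load into Elasticsearch
    ]


def run_pipeline(commands):
    """Chain commands with pipes, like `a | b | c` in a shell.

    Returns whatever the last command writes to stdout, as bytes.
    """
    processes = []
    upstream = None
    for command in commands:
        process = subprocess.Popen(command, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            # Close the parent's copy so an earlier process sees a broken pipe
            # if a later one exits early.
            upstream.close()
        upstream = process.stdout
        processes.append(process)
    output, _ = processes[-1].communicate()
    for process in processes[:-1]:
        process.wait()
    return output
```

A message consumer would call `run_pipeline(build_pipeline(body))` for each warc_created message it receives. Keeping the command construction separate from the process chaining makes it straightforward to add a mapping entry for another harvester.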
Once properly loaded into Elasticsearch, the data is available for discovery and visualization using Kibana. Note that additional data is loaded as new WARC files are created.
For the purposes of this experiment, data harvested from Twitter’s search API using the search terms "gwu" and "gelman" was used.
While understanding the full power and flexibility of Kibana involves a significant learning curve, some of the functionality is readily usable. For example, to discover the tweets mentioning GWU’s President Knapp, enter “knapp” in the search box on the Discover screen:
or to find tweets posted by @gelmanlibrary:
Kibana allows you to easily adjust the timeframe of any discovery or visualization:
To demonstrate the sort of visualizations that might be useful for a collection creator or researcher, we created a Twitter dashboard:
Here’s each of those visualizations in a more readable size:
Note that the dashboard is periodically refreshed as new data is added.
As should be evident, this experiment barely scratches the surface of the capabilities of the ELK stack, or more generally, the potential of adding an analytics service to Social Feed Manager. The code for SFM-ELK is available at https://github.com/gwu-libraries/sfm-elk. Instructions are provided to bring up a Docker environment so that you can give it a try yourself. Keep in mind that this is only a proof-of-concept and it is not currently in scope of SFM development.
If any of this is of interest to you or your organization, collaborators are welcome.
P.S. It was just announced that Washington University in St. Louis, the Maryland Institute for Technology in the Humanities (MITH) at the University of Maryland, and the University of California, Riverside were awarded a Mellon grant for a project titled "Documenting the Now: Supporting Scholarly Use and Preservation of Social Media Content." Since there’s a clear need to support researchers' and archivists' needs for good analytical tools, we look forward to their work. Follow the project at @documentnow.
Prepare yourself for academic and professional success by learning the communication skills you need. The GW Libraries offer a wide range of free workshops, which are open to all GW students, staff, and alumni.
Spring topics include:
- Geographic Information System (GIS) Data Basics
- 3-D Modeling with Tinkercad
- Principles of Graphic Design
- Developing Your Professional Self
- Building a WordPress Portfolio
- Developing Engaging Presentations
- and More!
Check our website for a complete list of upcoming workshops and events.
The GW Libraries are thrilled to host a display of photographs and poems from the Blue Wings Project on bulletin boards throughout Gelman. Blue Wings Project brings together writers and artists of all disciplines to explore and make cross-national connections. The project is led by the Corcoran School New Media Photojournalism (NMPJ) Master of Arts program in collaboration with the Afghan Women's Writing Project (AWWP). Originally launched as a classroom-based project in Spring 2015, Blue Wings has expanded to include the entire university community. BFA Photojournalism and New Media Photojournalism graduate students were invited to read and respond to the writings of AWWP authors, all of whom are women residing in Afghanistan. The result is an exciting launch of virtual conversations between the writers in Afghanistan and photographers at the Corcoran. #bluewings
Afghan Women's Writing Project
The Afghan Women's Writing Project (AWWP) was founded in 2009 to support the human rights of an individual to tell her story. AWWP provides a platform for Afghan women to develop their voices and discover their power in the world without the filter of the media or other influences. AWWP works with women in Afghanistan and helps them to write in English and Dari. Students send their writings to the workshop, and these are later published in an online magazine. AWWP has also published two collections of poetry and prose, available online: The Sky is a Nest of Swallows (2015) and Washing the Dust from Our Hearts (2014).
New Media Photojournalism
The New Media Photojournalism program at the Corcoran School of the Arts and Design is the first of its kind, created to help visual journalists study and excel within the changing world of photojournalism.
Are you a graduate student working on a literature review for a thesis or dissertation? Get serious about your scholarship by attending these 30-minute workshops to learn tips that will save you time and sanity. Our "boot camps" on Martin Luther King's Birthday and President's Day offer several popular workshops together - attend one or all.
All sessions will take place in Gelman Library, Room 301-302. Please bring your own computer. Kids off school? Quiet and happily occupied offspring are welcome.
Monday, January 18 (MLK's Birthday) & Monday, February 15 (President's Day):
9:00-9:30: The Basics: Mapping your Research
9:30-10:00: Searching Beyond Gelman
10:00-10:30: Citation Management
10:45-11:15: Citation Chasing
11:15-11:45: Staying Current in One's Field
The Basics: Mapping your Research
What is a literature review, and what information do I need to begin one? Learn tips on how to begin your search, discover keywords, and narrow topics. Save time and frustration by discovering how to find the right databases and resources for your topic using GW Libraries’ tools.
Searching Beyond Gelman
How do you know what research is out there? How can you know what you don't know? Be sure with a comprehensive search of all published book literature using WorldCat. This workshop is best for disciplines that write books, especially the humanities and social sciences.
Citation Management
Once you've done all that research, how do you keep track of it? Step away from the notecards and learn about online citation tools like RefWorks, Zotero, and Mendeley. Librarians will help you find the tool that is right for you and get you started using it.
Citation Chasing
How do you build on someone else's research? How do you find the research they used? Learn to chase down those citations like a pro in this short workshop.
Staying Current in One's Field
A successful graduate student participates in the research conversation of her/his field. If you need help getting started, this workshop will help you find out how to stay current. You'll learn how to set up journal table of contents alerts, search alerts, and identify key journals in your field.
If you can't make it to all of the sessions or need more information, be sure to check out the research guide "What Graduate Students Need to Know."
Explore the journey of Picasso, Diaghilev, Kertesz, Stravinsky, & others who forged artistic collaborations and established Paris as the center of Modernist thought in the early 20th century.
Visiting museums, touring iconic architectural sites and viewing contemporary performance spaces, we will measure today's art against the past.
June 1-14, 2016
Paris: Modernism and the Arts, Then and Now —TRDA 4595w
No language requirements
3 credits, WID, Elliott School and Cultural Studies Course Humanities GCR