The George Washington University
Thursday, March 24
9:30am to 11:30am
Global Resources Center (Gelman 708)
Please RSVP: go.gwu.edu/GRCCoffee
Please join us in the Global Resources Center (GRC) for an international student coffee hour co-hosted with the International Services Office (ISO). Take a tour of the GRC, chat with a specialist about your research and global interests, and enjoy a snack with your ISO friends!
The GRC focuses on the political, socio-economic, historical, and cultural aspects of countries and regions around the globe from the 20th century onward, with the following specialized resource centers: Russia, Eurasia, Central & Eastern Europe; China Documentation Center; Taiwan Resource Center; Japan Resource Center; Korea Resources; Middle East & North Africa.
Gelman Library: Open 7 a.m. - 9 p.m.
Eckles Library: Open 9 a.m. - 5 p.m.
Virginia Science & Technology Library: GW students, faculty, staff and alumni may access library study space 24/7 with a valid GWorld card.
New work also reveals the heightened importance of archived social media datasets that make it possible for researchers to re-use data. In order for this data to be useful, it must be curated and preserved with sufficient metadata to explain the conditions of its original capture and any subsequent actions taken to refine the data. For instance, a researcher may remove a particular hashtag or account as a study progresses, changing the resulting dataset. Archivists face a new mandate to develop tools and practices that support these conditions for re-use and reproducibility.
The Social Feed Manager team has heard this loud and clear! The need to keep track of changes to collection criteria (seeds, harvesting options, credentials, etc.) is reflected in our user stories for the new Social Feed Manager and initial support should be included in our next release (version 0.5.0). You can follow progress by watching the ticket. (Keep in mind that we are still pre-version 1.0, so SFM is in active development.)
We haven’t worked on the UI yet, but this should give you an idea of how this feature works. First, I created a new seed set. (This is an action that might be performed by a researcher or an archivist.) In SFM, a seed set is a list of seeds for a harvest, where a seed might be a Twitter handle or a Flickr user. Then I changed the schedule of the harvest. Since the list is in reverse chronological order, the schedule change appears first and the creation of the seed set appears second.
Notice that whenever a change is made, the following is recorded:
- Each field and value that is changed. In this example, the schedule was changed.
- Who made this change. In this example, “justin” made the change.
- When the change was made.
- An optional note describing the reason for the change.
Again, the UI work is still to be done, but you can imagine an (understandable) version of these changes appearing when a user is reviewing a seed set.
Note that this change history is also tied into how we keep track of harvests -- SFM records the exact state of the collection criteria used to perform the harvest.
For those wondering, this is implemented with django-simple-history.
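To make concrete what gets recorded per change, here is a minimal, pure-Python illustration of the record described above (field/value changed, who, when, and an optional note). This is a toy sketch for illustration only; SFM itself gets this behavior from django-simple-history rather than anything like this:

```python
from datetime import datetime, timezone

class SeedSet:
    """Toy seed set that records a history entry for every field change.
    (Illustration only -- SFM relies on django-simple-history for this.)"""

    def __init__(self, name, schedule, user, note=""):
        self.history = []
        self._record({"name": name, "schedule": schedule}, user, note)
        self.name = name
        self.schedule = schedule

    def update(self, user, note="", **changes):
        """Apply field changes and record who made them, when, and why."""
        self._record(changes, user, note)
        for field, value in changes.items():
            setattr(self, field, value)

    def _record(self, changes, user, note):
        # Newest entries first, matching the reverse chronological listing.
        self.history.insert(0, {
            "changes": dict(changes),
            "user": user,
            "timestamp": datetime.now(timezone.utc),
            "note": note,
        })

seed_set = SeedSet("GW tweets", "daily", user="justin", note="Initial creation")
seed_set.update(user="justin", note="Collect more often",
                schedule="every 30 minutes")
```

Each entry in `seed_set.history` carries exactly the four pieces of information listed above, so a UI (or a dataset export) can replay the collection criteria as they stood at any point in time.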
If you have thoughts on this feature, comments are welcome. In particular, we’re interested in ideas about how to make this information available and useful to researchers, especially in dataset exports. I can be reached at @justin_littman, or you can reach the whole team at sfm-dev.
On Feb 1, GW’s Expert Finder launched. Expert Finder is an implementation of VIVO, a researcher discovery platform. The project is a collaboration between the Division of Information Technology and GW Libraries. As one of the software developers on the project, I want to take this opportunity to discuss some noteworthy aspects of our implementation. In particular, we have made some choices that emphasized rapidly deploying VIVO using a minimal amount of resources.
In any VIVO implementation, data is aggregated from a number of existing sources, including campus information systems, researcher information systems, and human resource systems. At GW, the majority of our data comes from Banner, our campus information system, and Lyterati, a faculty management system.
With this sort of data, various quality issues are to be expected. At GW, our data is no different. Here are some examples:
- My position is listed as “Uv Librarian FT” (a full-time librarian, I think) in “UN LIB TECH/RESEARCH SRVCS” (Gelman Library).
- My alma mater, Amherst College, has entries for: Amherst College; Amherst College, Amherst, Massachusetts; Amherst College, Amherst, MA; and Amherst College, Amherst MA. You can imagine how many variations there are for GW!
- The full citation for a journal article is “The Contraceptive Mandate and Religious Liberty. Pew Forum on Religion in Publick Life. 2013.” Notice the misspelling and lack of co-authors, issue, volume, pages, and a DOI.
VIVO implementers employ a variety of strategies to mitigate data quality issues. These strategies include:
- Fixing the existing data source so that it produces cleaner data.
- Creating a new data source. For example, many VIVO implementers also implement Symplectic Elements to provide a reliable source for faculty publications.
- Having researchers correct their own data, either in the source system or directly in VIVO.
- Cleansing and normalizing the data, whether manually by a person or automatically by software.
Beyond data cleansing and normalization, many VIVO implementers also perform data enhancement: collecting additional data to improve upon the existing data. For example, some institutions get complete citations for journal articles by looking them up in Crossref, or disambiguate article authorship using Harvard’s Disambiguation Engine.
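As a sketch of what a Crossref-based enhancement involves, here is a minimal Python example that extracts citation fields from a Crossref `/works` record. The sample record is abridged and purely illustrative; a live lookup would fetch the record from api.crossref.org:

```python
def citation_from_crossref(message):
    """Pull the fields needed for a full citation out of a Crossref
    /works record (the 'message' object of the API response)."""
    authors = ["{} {}".format(a.get("given", ""), a.get("family", "")).strip()
               for a in message.get("author", [])]
    return {
        "title": (message.get("title") or [""])[0],
        "authors": authors,
        "journal": (message.get("container-title") or [""])[0],
        "volume": message.get("volume"),
        "issue": message.get("issue"),
        "pages": message.get("page"),
        "doi": message.get("DOI"),
    }

# Abridged, made-up example of a Crossref response; a live lookup would be:
#   requests.get("https://api.crossref.org/works/" + doi).json()["message"]
sample = {
    "title": ["An example article"],
    "author": [{"given": "Jane", "family": "Doe"}],
    "container-title": ["Journal of Examples"],
    "volume": "12", "issue": "3", "page": "45-67",
    "DOI": "10.1000/example",
}
citation = citation_from_crossref(sample)
```

Even this small sketch shows why enhancement pays off: the co-authors, volume, issue, pages, and DOI missing from a record like the one quoted earlier can all be filled in from a single lookup.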
For GW’s implementation, we have also employed some of these strategies to mitigate data quality issues. In particular, we have:
- Worked with our partners at Entigence to make changes to Lyterati that encourage cleaner data entry by faculty.
- Placed a heavy emphasis on researchers correcting their own data in the source system.
- Created an interface for non-faculty staff to enter some of their data.
Notably absent from this list are data cleansing, normalization, and enhancement. Other than some minor fixes, like the format of phone numbers, the data is loaded exactly as received from the source systems. This is deliberate: the project charter for Expert Finder specifically excludes “cleanup of data.” Forgoing cleansing, normalization, and enhancement trades some data quality for a significant savings in resources, both for the initial implementation and on an ongoing basis.
To partially compensate for not cleaning or normalizing the data, we suppress some data in the VIVO interface. In particular, we remove links and lists where the less clean data would be revealed or would prevent VIVO from working properly. For example, publications are listed on a researcher’s detail page, but they are not linked to their own detail pages. Also, a user can get to the list of organizations, but the lists of people, research, and events have been removed. This approach does a reasonable job of presenting the data for a researcher, but obviously at the expense of reduced functionality for discovery.
Also worth drawing attention to is our approach for non-faculty staff. GW faculty publications, education, and funding are collected in Lyterati, but there is no existing system for non-faculty staff (like us librarians). Rather than create a new system for this, we are asking non-faculty staff to create and populate ORCID records. Using orcid2vivo, we then retrieve this information via ORCID’s API and load it into VIVO. The data from ORCID tends to be very high quality. For an example of a researcher detail page with data from ORCID, see https://expert.gwu.edu/display/justinlittman.
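The retrieval step can be sketched against ORCID’s public API (orcid2vivo handles this for us in production). The endpoint version shown is one of ORCID’s public versions, and the iD is ORCID’s own documented example, not a GW staff member:

```python
ORCID_PUBLIC_API = "https://pub.orcid.org/v2.0"

def orcid_request(orcid_id, section="record"):
    """Build the URL and headers for a public ORCID API call.
    (Illustrative; orcid2vivo wraps this kind of call for us.)"""
    url = "{}/{}/{}".format(ORCID_PUBLIC_API, orcid_id, section)
    headers = {"Accept": "application/json"}  # ask for JSON rather than XML
    return url, headers

# ORCID's documented example iD (Josiah Carberry)
url, headers = orcid_request("0000-0002-1825-0097")
# A live call would then be: requests.get(url, headers=headers).json()
```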
These approaches have allowed GW to roll out Expert Finder in a relatively short timeframe (about a year) with minimal resources (fractions of a project manager, a business analyst, three software developers, and a sysadmin). We look forward to feedback from the GW community, as well as the opportunity to assess Expert Finder based on actual usage. This will allow us to make adjustments as necessary so that Expert Finder can showcase the expertise at GW and enable collaboration.
Thursday, March 24
2:00pm to 3:30pm
Gelman Library, Room 219
Under the combined effects of diminished university revenues and predatory inflation by publishers, academic libraries everywhere are facing serious financial constraints. Join us for a presentation and discussion of the GW Libraries' collection strategy, and learn more about how the Libraries use data in managing collections across the disciplines.
The weather may not feel like spring, but the calendar says it is Spring Break! For those of you spending your break in the warm embrace of Gelman Library, please note our change in hours. The library will not be available for 24-hour study during Spring Break (March 12-19).
Friday, March 11
Close at 10pm
Saturday & Sunday, March 12 & 13
Monday-Friday, March 14-17
Saturday, March 19
Sunday, March 20
Open at 9am and 24-hour study resumes
Friday, February 26, 2016
11 a.m. to 1 p.m.
How can the GW Libraries help you succeed academically? What could Gelman do to help you be a better researcher, student, or teacher? Join the conversation with the GW Libraries Student Liaison, librarians, archivists, and special guest @GWPeterK. Use the hashtag #GelmanTownHall or, on the day of, use this portal to see the entire town hall thread!
Gelman Library will close on Saturday, February 20 at 10:30 p.m. and reopen on Sunday, February 21 at 9 a.m. No building access will be available during this time. This overnight closure is required to safely complete necessary construction activities for the National Churchill Library and Center on Gelman’s 1st floor. Work is being performed at night to minimize disruption to our library users.
Help the GW Libraries discover and decode history by transcribing appointments from Sir Winston Churchill’s World War II engagement diary. This crowdsourcing project will make Churchill’s wartime activities widely available for the first time to students and scholars around the world. Participants will gain new insight into the day-to-day process of national leadership, learn about Churchill and WWII, and provide a valuable service to historians around the world. Follow #ChurchillsDay on twitter where we'll share some interesting results of this project as they become available.
This collection of handwritten cards details Winston Churchill’s appointments during World War II, including such historic events as Victory in Europe (VE) Day and the British prime minister’s regular meetings with the King of England and President Franklin Roosevelt. The collection of 30 cards will be featured in the new National Churchill Library and Center on Gelman's 1st floor.
Learn more about the project and participate at crowdcrafting.org/project/churchill/.
The National Churchill Library and Center, slated to open in 2016, is part of a philanthropic partnership with the George Washington University and the Chicago-based Churchill Centre. Housed on the first floor of Gelman Library, this will be the first major research facility in the nation’s capital dedicated to the study of Winston Churchill.
Due to icy conditions, the Lit Review How To and Data Bootcamp workshops scheduled for President's Day have been CANCELED. We apologize for any inconvenience.
You can find much of the information from these sessions in the research guide, "What Graduate Students Need to Know."
Data Bootcamp sessions will be offered again on Thursday, March 10. You can also find much of the information to be covered in research guides: "Data Management," "Maps, Cartographic Data, and GIS Information," and "Uploading your ETD."
Are you a graduate student who needs help collecting, managing, and visualizing research data? Data Bootcamps bring together several 30-minute workshops filled with practical solutions to save you hours of needless work. All sessions are first-come, first-served, with the GIS session limited to 20 participants. Attend one session or all.
If you can't make it to all of the sessions or need more information be sure to check out the research guides: "Data Management," "Maps, Cartographic Data, and GIS Information," and "Uploading your ETD."
Kids off school? Quiet and happily occupied offspring are welcome.
What is Data?
Research data is data that is collected, observed, or created for purposes of analysis to produce original research results, but what does that really mean for your own work? Data librarian Mandy Gooch will define research data & data-related terms and discuss common data formats. You'll explore use agreements and restrictions, and identify library and campus services and resources related to data.
Data management refers to activities that support the long-term preservation, access, and use of data. In this short workshop Data Librarian Mandy Gooch will discuss best practices for data management and the tools, people, and resources the GW Libraries provide to help you.
Geographic Information Systems (GIS) Data Basics
Learn how you can integrate geographic information systems (GIS) into your research and discover the resources available at the GW Libraries and beyond. This workshop will cover the basics of data discovery and display using ArcGIS software. Let it spark your cartographic imagination!
If you’ve searched the GW Libraries catalog lately, you might have noticed that we’ve increased the number of e-books available to GW users. This is all part of our efforts to make the material you need available when you need it. Almost all of our e-books can be read online in a web browser as well as downloaded and read on a computer or device. Just click the “Online” button to begin, and log in as you would to any other GW Libraries resource.
Prefer a print book? Look for the option to “Request a Print Copy,” which is located underneath the Online button. Click this link, fill out and submit a brief form, and GW Libraries will use library funds to purchase a print copy for the collection. (On the form, you can request that the print book be placed on hold for you when it arrives.)
Visit our website for more information on using GW Libraries e-books.
The GW Libraries are proud to announce a new service to support digital scholarship at GW: Programming & Software Development Consultation Services. Assistance is available from professional software developers to GW students, faculty, and staff who are working on an academic or scholarly inquiry which requires coding. Ask questions and get hands-on assistance with:
Coding, software development, scripting, and programming
Code review and debugging
Working with data markup and encoding (e.g., XML, JSON, CSV, RDF)
Retrieving data from websites and APIs
Data cleansing and manipulation
Databases (e.g., table design, querying, optimizing, loading)
Use our convenient Research Calendar to schedule an appointment with anyone labeled "coding/programming help." You may also email email@example.com for additional appointment times. Appointments are available both in-person and via WebEx. Learn more about these consultation services and see a list of programming languages, databases, and other areas of special expertise at go.gwu.edu/coding.
A statement from University Librarian and Vice Provost Geneva Henry:
As you may have already read in GW Today, GW’s Interim Provost, Forrest Maltzman, has announced a realignment of his office, consolidating academic technologies, the eDesign shop, and the university teaching and learning center under my leadership.
Pulling these units together is an excellent opportunity to seamlessly meet the instructional needs of our faculty. This deeper collaboration between previously separate areas will benefit all of our students. We look forward to the many possibilities this realignment has to further quality teaching and academic excellence at GW.
I began my career as a programmer and IT architect working with organizations like NASA and IBM’s Higher Education Industry group, but I found my passion at the intersection of technology and information. I’ve spent the past 15 years exploring and building many of the tools used for digital scholarship and look forward to this new opportunity to expand the tools available at GW, both in the classroom and in the libraries.
I am especially excited to return to working with online education, an area in which I played a leadership role at Rice University, where we were pioneers in open education in the early 2000s. Online education is built around systems that IT architects design, such as servers and websites that can scale to support streaming audio and video. But fundamental to every successful course is the instruction and course plan of the faculty member. During my years with the Connexions project and in collaboration with the OpenCourseWare project at MIT, I saw how the quality of online materials and the ability to reliably deliver them worldwide enhance the teaching and learning experience for all of our students.
I care deeply about providing the information, technology, and pedagogical resources needed for excellence in research and instruction here at GW. I look forward to continuing our partnership with faculty, students and staff to make sure our students have the best possible experience at GW.
In 1963, NEA teamed up with Hollywood to create Mr. Novak. The show was about an idealistic young high school teacher, played by James Franciscus, facing problems many teachers would recognize. As producer E.
With winter now making its appearance, we look for other sources of warmth in these sometimes-dreary months. Artists’ books from the Art & Design Collection from the Corcoran showcase color and color imagery in the pages of their work. Some stories are told completely through color; others, though they might use muted palettes, create a sensation with words that paint images of colorful scenes. These bright pages serve as a complement to the exhibition Color Bloc: Paintings by Elizabeth Osborne, on view in the Luther W. Brady Art Gallery through February 26, 2016.
The exhibit displays only a sampling of artists’ books from the Art & Design Collection from the Corcoran. These and many others can be viewed in the Special Collections Research Center on the 7th floor of Gelman Library.
Coloring Pages runs through March 25, 2016, in the 2nd floor display cases of the School of Media and Public Affairs during regular building hours.
Friday, February 12
12:30 - 3 p.m.
Please RSVP at go.gwu.edu/GWdoesDH
Everyone is invited to a showcase of Digital Humanities (DH) projects underway across the University. The program will include brief presentations followed by discussion and a reception. Find out about innovative endeavors happening in Classics, the Elliott School, the Corcoran School of the Arts and Design, Philosophy, Statistics, Health Sciences, the DC Africana Archives Project, and more. Presented by the GW Digital Humanities Institute and GW Libraries, with opening remarks by Associate Professor of History Diane Cline, Director of Cross Disciplinary Collaboration and the XD@GW Faculty Cooperative.
Gelman Library will close on Saturday, January 30, from 1 a.m. - 9 a.m. and 24-hour building access will be unavailable during this time. This closure is required to safely X-ray the building as part of construction activities for the National Churchill Library and Center on Gelman’s 1st floor. The building must be completely vacant to ensure complete protection from radiation exposure. Surrounding streets and sidewalks will not be affected.
Due to adverse weather, the GW Libraries (Gelman, Eckles, and the Virginia Science and Technology Campus Libraries) will close at 3 p.m. on Friday, January 22.
Gelman Library will remain closed all of Saturday, January 23, and will reopen on Sunday, January 24 from noon - 8:00 p.m.
Eckles Library will reopen from 10 a.m.- 6 p.m. on Saturday, January 23 and from 10 a.m. - 10 p.m. on Sunday, January 24.
Power outages are predicted with this storm and may impact library hours. Please check library.gwu.edu for updated information before attempting to visit a library this weekend.
The latest in our social media harvesting experiments for the Social Feed Manager project involves analysis, discovery, and visualization of social media content. An analytics service may help satisfy two needs:
1. For the collection creator, being able to evaluate the content being collected so as to adjust the collection criteria. For example, a Twitter collection creator may discover additional hashtags to collect. Since a collection creator may be collecting a rapidly evolving event, this requires near real-time analysis.
2. For the researcher, being able to analyze the content. Though many researchers will need to export the social media content for use with other tools, having some sort of analytics service available may meet the needs of some researchers and may lower the barrier to performing social media research.
We also wanted to test the extensibility of the SFM architecture to make sure that additional services can be readily added.
The ELK (Elasticsearch, Logstash, Kibana) stack was selected for this experiment, primarily on the intuition that it was a good fit rather than through an analysis of its features or a comparison against other options. For those not familiar with this stack, Kibana is the discovery and visualization interface, Elasticsearch is the data store, and Logstash loads Elasticsearch with data. We’ll refer to our own implementation as SFM-ELK.
In SFM infrastructure, harvesters, such as the Twitter harvester, invoke the APIs of social media platforms and record the results in WARC files. Harvesters publish warc_created messages to a message queue whenever a WARC file is created. This provides the critical hook for SFM-ELK to perform loading -- a message consumer application listens for warc_created messages. When it receives a warc_created message, it:
1. Invokes the appropriate WARC iterator (e.g., TwitterRestWarcIter) to read the WARC file and output the social media records as line-oriented JSON.
2. Pipes this to jq, which filters the JSON. Most types of social media records contain extraneous metadata that does not need to be indexed in Elasticsearch. Logstash supports various mechanisms for filtering and transforming loaded data, but jq proved better for JSON data.
3. Pipes this into Logstash, which loads it into Elasticsearch.
Once properly loaded into Elasticsearch, the data is available for discovery and visualization using Kibana. Note that additional data is loaded as new WARC files are created.
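The filtering in step 2 is a jq program in SFM-ELK, but the kind of trimming involved can be sketched in Python. The field names come from Twitter’s REST API tweet objects; the exact set of retained fields here is illustrative:

```python
def trim_tweet(tweet):
    """Keep only the fields worth indexing in Elasticsearch, discarding
    the bulky metadata Twitter attaches to every status."""
    user = tweet.get("user", {})
    return {
        "id_str": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "text": tweet.get("text"),
        "screen_name": user.get("screen_name"),
        "retweet_count": tweet.get("retweet_count"),
        "hashtags": [h["text"] for h in
                     tweet.get("entities", {}).get("hashtags", [])],
    }

# A made-up tweet in the shape Twitter's REST API returns.
raw = {
    "id_str": "123", "created_at": "Mon Feb 01 12:00:00 +0000 2016",
    "text": "Studying at #gelman", "retweet_count": 2,
    "user": {"screen_name": "gwstudent", "followers_count": 10},
    "entities": {"hashtags": [{"text": "gelman", "indices": [12, 19]}]},
    "contributors": None, "geo": None,   # examples of fields we drop
}
slim = trim_tweet(raw)
```

Trimming the records this way keeps the Elasticsearch index small and the Kibana interface focused on the fields collection creators and researchers actually search and visualize.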
For the purposes of this experiment, data harvested from Twitter’s search API using the search terms "gwu" and "gelman" was used.
While understanding the full power and flexibility of Kibana involves a significant learning curve, some of the functionality is readily usable. For example, to discover the tweets mentioning GWU’s President Knapp, enter “knapp” in the search box on the Discover screen:
or to find tweets posted by @gelmanlibrary:
Kibana allows you to easily adjust the timeframe of any discovery or visualization:
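For those curious about what the search box does under the hood, Kibana translates it into an Elasticsearch query_string query. A search like “knapp” corresponds to a request body along these lines (the index name in the comment is illustrative):

```python
# Elasticsearch request body equivalent to typing "knapp" in the
# Kibana search box. A live search would POST this to something like
# http://localhost:9200/sfm/_search (index name illustrative).
query = {
    "query": {
        "query_string": {"query": "knapp"}
    }
}
```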
To demonstrate the sort of visualizations that might be useful for a collection creator or researcher, we created a Twitter dashboard:
Here’s each of those visualizations in a more readable size:
Note that the dashboard is periodically refreshed as new data is added.
As should be evident, this experiment barely scratches the surface of the capabilities of the ELK stack, or more generally, the potential of adding an analytics service to Social Feed Manager. The code for SFM-ELK is available at https://github.com/gwu-libraries/sfm-elk. Instructions are provided to bring up a Docker environment so that you can give it a try yourself. Keep in mind that this is only a proof-of-concept and it is not currently in scope of SFM development.
If any of this is of interest to you or your organization, collaborators are welcome.
P.S. It was just announced that Washington University in St. Louis, the Maryland Institute for Technology in the Humanities (MITH) at the University of Maryland, and the University of California, Riverside were awarded a Mellon grant for a project titled "Documenting the Now: Supporting Scholarly Use and Preservation of Social Media Content." Since there’s a clear need to support researchers' and archivists' needs for good analytical tools, we look forward to their work. Follow the project at @documentnow.