New work also reveals the heightened importance of archived social media datasets that make it possible for researchers to re-use data. In order for this data to be useful, it must be curated and preserved with sufficient metadata to explain the conditions of its original capture and any subsequent actions taken to refine the data. For instance, a researcher may remove a particular hashtag or account as a study progresses, changing the resulting dataset. Archivists face a new mandate to develop tools and practices that support these conditions for re-use and reproducibility.
The Social Feed Manager team has heard this loud and clear! The need to keep track of changes to collection criteria (seeds, harvesting options, credentials, etc.) is reflected in our user stories for the new Social Feed Manager and initial support should be included in our next release (version 0.5.0). You can follow progress by watching the ticket. (Keep in mind that we are still pre-version 1.0, so SFM is in active development.)
We haven’t worked on the UI yet, but this should give you an idea of how the feature works. First, I created a new seed set. (This is an action that might be performed by a researcher or an archivist.) In SFM, a seed set is a list of seeds for a harvest, where a seed might be a Twitter handle or a Flickr user. Second, I changed the schedule of the harvest. Because the list is in reverse chronological order, the schedule change is the first entry below and the creation of the seed set is the second.
Notice that whenever a change is made, the following is recorded:
- Each field and value that is changed. In this example, the schedule was changed.
- Who made this change. In this example, “justin” made the change.
- When the change was made.
- An optional note describing the reason for the change.
Again, the UI work is still to be done, but you can imagine an (understandable) version of these changes appearing when a user is reviewing a seed set.
Note that this change history is also tied into how we keep track of harvests -- SFM records the exact state of the collection criteria used to perform the harvest.
For those wondering, this is implemented with django-simple-history.
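To make the idea of a change record concrete, here is a minimal sketch in plain Python. It does not use django-simple-history and is not SFM's code; the `SeedSet` and `ChangeRecord` classes, their field names, and the `update` method are all hypothetical, chosen only to mirror the four items listed above (changed fields, who, when, and an optional note).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ChangeRecord:
    """One entry in a change history (hypothetical structure)."""
    changes: Dict[str, Tuple[Any, Any]]  # field name -> (old value, new value)
    changed_by: str
    changed_at: datetime
    note: Optional[str] = None


@dataclass
class SeedSet:
    """Hypothetical stand-in for a seed set; not SFM's actual model."""
    name: str
    schedule_minutes: int
    history: List[ChangeRecord] = field(default_factory=list)

    def update(self, user: str, note: Optional[str] = None, **new_values: Any) -> None:
        # Record only the fields whose values actually differ.
        changes = {
            key: (getattr(self, key), value)
            for key, value in new_values.items()
            if getattr(self, key) != value
        }
        if not changes:
            return
        for key, (_, value) in changes.items():
            setattr(self, key, value)
        # Newest entry first, matching a reverse chronological listing.
        record = ChangeRecord(changes, user, datetime.now(timezone.utc), note)
        self.history.insert(0, record)


seed_set = SeedSet(name="example seed set", schedule_minutes=60)
seed_set.update("justin", note="Harvest less often", schedule_minutes=1440)
```

After the call to `update`, `seed_set.history[0]` holds the old and new schedule values, the user name, a timestamp, and the note. This sketch records updates only; an entry for the creation of the seed set would need to be added separately.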
If you have thoughts on this feature, comments are welcome. In particular, we’re interested in ideas about how to make this information available and useful to researchers, especially in dataset exports. You can reach me at @justin_littman or the whole team at sfm-dev.
Colin Reagle, assistant professor, Mechanical Engineering, Volgenau School of Engineering, presents the 2016 Fenwick Fellow Lecture on Wednesday, March 30 in the Main Reading Room, Fenwick Library at 2 p.m. Professor Reagle will present the findings of his research project The Role of Renewables in George Mason’s Future Energy Portfolio in which he examined the hurdles the university faces toward reaching the 2025 Virginia renewable energy mandate’s statewide goal of 15 percent, and then exceeding the state’s minimum beyond 2025. In addition, with this study, he plans to provide a “roadmap for other regional institutions that consume power on a large scale to diversify their energy portfolios in a responsible manner.” Please join us at this Fenwick Library Grand Opening Week Event!
For more information about the Fenwick Fellow lecture and/or the fellowship program, please contact Diana Tippett, 703-993-2223, firstname.lastname@example.org.
During Fenwick Library’s Grand Opening Celebration Week, the University Libraries are initiating a new program: the Mason Author Series. Featuring the work of Mason scholars, the inaugural author is Giorgio Ascoli, University Professor, Molecular Neuroscience Department and founding Director of the Center for Neural Informatics, Krasnow Institute for Advanced Study. Professor Ascoli will discuss his book, Trees of the Brain, Roots of the Mind. The Mason Author Series is sponsored by the George Mason University Bookstore and coordinated by the Libraries’ Mason Publishing Group. Join us on March 29, 2:30 p.m. in the Fenwick Library Main Reading Room for the first Mason Author Series event. Light refreshments will be served.
Mason 4-VA, in collaboration with the University Libraries and Mason Online, invites Mason faculty to submit a proposal for innovative redesign of a course that integrates digital (and accessible) materials. That is, you replace expensive textbooks either with digital works that you create, or with existing digital content that is in the public domain, licensed under Creative Commons, or available in databases to which the University Libraries subscribe. In doing so, you reduce the cost of instruction to students and improve learning outcomes.
Courses of particular interest are those that:
- have high enrollment,
- are required for majors,
- count in the Mason Core, or
- carry high textbook costs.
This initiative is a Mason 4-VA pilot project. Any Mason full-time instructional faculty who teach high demand, heavily populated courses are eligible to apply, as are adjunct faculty who are part of a team proposal. Competitive grants will be awarded ranging from $1500-$5000, depending on the nature of the work and the level of team collaboration. Proposals due: March 18, 2016. Award notification: April 4, 2016.
Library faculty are poised to assist you with locating quality OER content, as well as answering questions related to copyright and Creative Commons licensing of your own materials. Mason Publishing Group, a department of the University Libraries, is available to aid faculty in developing OER textbooks or workbooks as a part of this pilot project. Let us know how we may help you! Contact your subject librarian, Claudia Holland (email@example.com), or John Warren (firstname.lastname@example.org), Head, Mason Publishing.
Join George Mason University Libraries in celebrating the new Fenwick Library addition. The highlight of the week’s events will be the Grand Opening Celebration on Thursday, March 31 at 4:30 p.m. in the Fenwick Library Atrium. You are cordially invited to attend! Join us and #CelebrateFenwick2016!
March 25-31 Exhibit: Mason Collections of Distinction
Special Collections Research Center and Fenwick Gallery
Floors 1 and 2, Fenwick Library
March 29, 2:30 p.m.
Mason Author Series
Main Reading Room
This new series features the work of Mason scholars. Giorgio Ascoli, University Professor, Molecular Neuroscience Department and founding Director of the Center for Neural Informatics, Krasnow Institute for Advanced Study, is the inaugural author. Professor Ascoli will discuss his book, Trees of the Brain, Roots of the Mind. The Mason Author Series is sponsored by the George Mason University Bookstore. Light refreshments will be served.
March 30, 2 p.m.
2016 Fenwick Fellow Lecture
Main Reading Room
Colin Reagle, Assistant Professor, Mechanical Engineering, presents findings from his Fenwick Fellow research project, The Role of Renewables in George Mason’s Future Energy Portfolio. Light refreshments will be served.
March 31, 4:30 p.m.
Grand Opening Celebration
Fenwick Library Atrium
Join University President Ángel Cabrera, Rector Tom Davis, and Members of the George Mason University Board of Visitors at the Grand Opening Celebration of the new Fenwick Library. The Keynote Speaker is Brian Lamb, Founder and Chairman, Cable-Satellite Public Affairs Networks (C-SPAN). Light refreshments will be served.
On Feb 1, GW’s Expert Finder launched. Expert Finder is an implementation of VIVO, a researcher discovery platform. The project is a collaboration between the Division of Information Technology and GW Libraries. As one of the software developers on the project, I want to take this opportunity to discuss some noteworthy aspects of our implementation. In particular, we have made some choices that emphasized rapidly deploying VIVO using a minimal amount of resources.
In any VIVO implementation, data is aggregated from a number of existing sources, including campus information systems, researcher information systems, and human resource systems. At GW, the majority of our data comes from Banner, our campus information system, and Lyterati, a faculty management system.
With this sort of data, various quality issues are to be expected. At GW, our data is no different. Here are some examples:
- My position is listed as “Uv Librarian FT” (a full-time librarian, I think) in “UN LIB TECH/RESEARCH SRVCS” (Gelman Library).
- My alma mater, Amherst College, has entries for: Amherst College; Amherst College, Amherst, Massachusetts; Amherst College, Amherst, MA; and Amherst College, Amherst MA. You can imagine how many variations there are for GW!
- The full citation for a journal article is “The Contraceptive Mandate and Religious Liberty. Pew Forum on Religion in Publick Life. 2013.” Notice the misspelling and lack of co-authors, issue, volume, pages, and a DOI.
VIVO implementers employ a variety of strategies to mitigate data quality issues. These strategies include:
- Fixing the existing data source so that it produces cleaner data.
- Creating a new data source. For example, many VIVO implementers also implement Symplectic Elements to provide a reliable source for faculty publications.
- Having researchers correct their own data either in the source system or directly in VIVO.
- Data cleansing and normalization, both manually performed by a person and automated by software.
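As an illustration of what automated normalization can look like, here is a minimal sketch that groups the Amherst College variants mentioned earlier under one key. The function names and the rule (keep the text before the first comma, ignore case and extra whitespace) are assumptions made for this example, not a method used by any particular VIVO implementation.

```python
import re
from typing import Dict, List


def normalization_key(name: str) -> str:
    """Reduce a free-text institution name to a comparison key (simplistic)."""
    # Keep only the part before the first comma, e.g. drop "Amherst, MA".
    head = name.split(",", 1)[0]
    # Collapse runs of whitespace and ignore case.
    return re.sub(r"\s+", " ", head).strip().lower()


def group_variants(names: List[str]) -> Dict[str, List[str]]:
    """Group names that share the same normalization key."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(normalization_key(name), []).append(name)
    return groups


variants = [
    "Amherst College",
    "Amherst College, Amherst, Massachusetts",
    "Amherst College, Amherst, MA",
    "Amherst College, Amherst MA",
]
groups = group_variants(variants)
```

Here all four variants end up under the single key `amherst college`. A rule this blunt would also wrongly merge names that legitimately differ after the comma (for example, campuses of a multi-campus university), which is one reason normalization usually needs manual review as well.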
Beyond data cleansing and normalization, many VIVO implementers also perform data enhancement. Data enhancement involves collecting additional data to improve upon the existing data. So, for example, some institutions get a complete citation for journal articles by looking them up in Crossref or disambiguate article authorship using Harvard’s Disambiguation Engine.
For GW’s implementation, we have also employed some of these strategies to mitigate data quality issues. In particular, we have:
- Worked with our partners at Entigence to make changes to Lyterati that encourage cleaner data entry by faculty.
- Placed a heavy emphasis on researchers correcting their own data in the source system.
- Created an interface for non-faculty staff to enter some of their data.
Notably absent from this list are data cleansing, normalization, and enhancement. Other than some minor fixes like the format of phone numbers, the data is loaded exactly as received from the source systems. This is deliberate, as the project charter for Expert Finder specifically excludes “cleanup of data.” The strategy of not performing data cleansing, normalization, or enhancement trades some data quality for a significant savings in resources (both for initial implementation and on an ongoing basis).
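For a sense of how small a "minor fix" such as phone number formatting can be, here is a hedged sketch. The function name, the assumption of ten-digit US numbers, and the `NNN-NNN-NNNN` output format are all hypothetical; the actual format used in Expert Finder is not described here.

```python
import re
from typing import Optional


def format_us_phone(raw: str) -> Optional[str]:
    """Return a phone number as NNN-NNN-NNNN, or None if it cannot be formatted."""
    # Strip everything except digits.
    digits = re.sub(r"\D", "", raw)
    # Drop a leading US country code if present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    # Leave anything that is not a ten-digit number for manual handling.
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
```

Returning `None` rather than guessing keeps the fix conservative: values that do not fit the expected pattern can be loaded as received instead of being altered.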
We partially compensate for not cleaning or normalizing the data by suppressing some data in the VIVO interface. In particular, we remove links and lists where the less clean data would be revealed or would prevent VIVO from working properly. So, for example, publications are listed on a researcher’s detail page, but they are not linked to the publication’s detail page. Also, a user can get to the list of organizations, but the lists of people, research, and events have been removed. This approach does a reasonable job of presenting the data for a researcher, but obviously at the expense of reduced functionality for discovery.
Also worth drawing attention to is our approach for non-faculty staff. Publication, education, and funding data for GW faculty is collected in Lyterati, but there is no existing system for non-faculty staff (like us librarians). Rather than create a new system for this, we are asking non-faculty staff to create and populate ORCID records. Using orcid2vivo, we then retrieve this information through ORCID’s API and load it into VIVO. The data from ORCID tends to be very high quality. For an example of a researcher detail page with data from ORCID, see https://expert.gwu.edu/display/justinlittman.
These approaches have allowed GW to roll out Expert Finder in a relatively short timeframe (about a year) with a minimal amount of resources (part of a project manager and business analyst and a fraction of 3 software developers and a sys admin). We look forward to feedback from the GW community, as well as the opportunity to assess Expert Finder based on actual usage. This will allow us to make adjustments as necessary so that Expert Finder can showcase the expertise at GW and enable collaboration.