On Feb 1, GW’s Expert Finder launched. Expert Finder is an implementation of VIVO, a researcher discovery platform. The project is a collaboration between the Division of Information Technology and GW Libraries. As one of the software developers on the project, I want to take this opportunity to discuss some noteworthy aspects of our implementation. In particular, we made choices that emphasize rapid deployment of VIVO with a minimal amount of resources.
In any VIVO implementation, data is aggregated from a number of existing sources, including campus information systems, researcher information systems, and human resource systems. At GW, the majority of our data comes from Banner, our campus information system, and Lyterati, a faculty management system.
With this sort of data, various quality issues are to be expected. At GW, our data is no different. Here are some examples:
- My position is listed as “Uv Librarian FT” (a full-time librarian, I think) in “UN LIB TECH/RESEARCH SRVCS” (Gelman Library).
- My alma mater, Amherst College, has entries for: Amherst College; Amherst College, Amherst, Massachusetts; Amherst College, Amherst, MA; and Amherst College, Amherst MA. You can imagine how many variations there are for GW!
- The full citation for a journal article is “The Contraceptive Mandate and Religious Liberty. Pew Forum on Religion in Publick Life. 2013.” Notice the misspelling and the missing co-authors, issue, volume, pages, and DOI.
VIVO implementers employ a variety of strategies to mitigate data quality issues. These strategies include:
- Fixing the existing data source so that it produces cleaner data.
- Creating a new data source. For example, many VIVO implementers also implement Symplectic Elements to provide a reliable source for faculty publications.
- Having researchers correct their own data, either in the source system or directly in VIVO.
- Cleansing and normalizing the data, whether manually by a person or automatically by software.
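To make the last strategy concrete, here is a minimal, hypothetical sketch of string-based normalization applied to the alma mater variants mentioned earlier. The rule (keep only the text before the first comma) is an assumption for illustration, not GW's actual logic; it would mangle names that legitimately contain commas, which is why real pipelines tend to match against an authority file instead.

```python
def normalize_institution(name: str) -> str:
    """Keep only the text before the first comma.

    Collapses variants like "Amherst College, Amherst, MA" to
    "Amherst College". Illustrative only: a name such as
    "University of California, Berkeley" would be truncated, so a
    production pipeline needs an authority file, not string rules.
    """
    return name.split(",", 1)[0].strip()

variants = [
    "Amherst College",
    "Amherst College, Amherst, Massachusetts",
    "Amherst College, Amherst, MA",
    "Amherst College, Amherst MA",
]
canonical = {normalize_institution(v) for v in variants}
print(canonical)  # {'Amherst College'}
```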
Beyond data cleansing and normalization, many VIVO implementers also perform data enhancement: collecting additional data to improve upon the existing data. For example, some institutions retrieve complete citations for journal articles by looking them up in Crossref, or disambiguate article authorship using Harvard’s Disambiguation Engine.
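A Crossref lookup of this kind can be done against Crossref's public REST API, which accepts a free-text bibliographic query. The sketch below builds the query URL and extracts the top-ranked match from a works response; the field selection is my own illustration, not any particular institution's enhancement pipeline.

```python
import urllib.parse

CROSSREF_WORKS = "https://api.crossref.org/works"

def crossref_query_url(citation: str, rows: int = 1) -> str:
    """Build a Crossref bibliographic query URL from a rough citation string."""
    params = urllib.parse.urlencode(
        {"query.bibliographic": citation, "rows": rows}
    )
    return f"{CROSSREF_WORKS}?{params}"

def best_match(response: dict) -> dict:
    """Pull title, DOI, and journal from the top-ranked work in a response."""
    work = response["message"]["items"][0]
    return {
        "title": work.get("title", [""])[0],
        "doi": work.get("DOI"),
        "journal": work.get("container-title", [""])[0],
    }
```

Fetching `crossref_query_url("The Contraceptive Mandate and Religious Liberty")` with any HTTP client and passing the parsed JSON to `best_match` would yield the candidate citation to merge into the record.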
For GW’s implementation, we have also employed some of these strategies to mitigate data quality issues. In particular, we have:
- Worked with our partners at Entigence to make changes to Lyterati that encourage cleaner data entry by faculty.
- Placed a heavy emphasis on researchers correcting their own data in the source system.
- Created an interface for non-faculty staff to enter some of their data.
Notably absent from this list are data cleansing, normalization, and enhancement. Other than some minor fixes, like the format of phone numbers, the data is loaded exactly as received from the source systems. This is deliberate, as the project charter for Expert Finder specifically excludes “cleanup of data.” Forgoing data cleansing, normalization, and enhancement trades some data quality for a significant savings in resources, both for the initial implementation and on an ongoing basis.
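As an example of the light-touch, load-time fixes that are allowed, here is a hypothetical phone number normalizer. The target format and pass-through rule are my assumptions for illustration, not Expert Finder's actual code.

```python
import re

def format_phone(raw: str) -> str:
    """Normalize a US phone number to 202-555-0123 style.

    Strips punctuation and a leading country code "1"; anything that
    doesn't reduce to exactly 10 digits passes through unchanged.
    (Illustrative only -- not Expert Finder's actual rule.)
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return raw
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
```

The deliberate pass-through on malformed input matches the project's overall stance: reformat what is unambiguous, and leave everything else exactly as the source system provided it.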
To partially compensate for not cleaning or normalizing the data, we suppress some data in the VIVO interface. In particular, we remove links and lists where the less clean data would be revealed or would prevent VIVO from working properly. For example, publications are listed on a researcher’s detail page, but they are not linked to their own detail pages. Similarly, a user can get to the list of organizations, but the lists of people, research, and events have been removed. This approach does a reasonable job of presenting a researcher’s data, but obviously at the expense of reduced functionality for discovery.
Also worth drawing attention to is our approach for non-faculty staff. GW faculty publication, education, and funding data are collected in Lyterati, but there is no existing system for non-faculty staff (like us librarians). Rather than create a new system, we are asking non-faculty staff to create and populate ORCID records. Using orcid2vivo, we then retrieve this information from ORCID’s API and load it into VIVO. The data from ORCID tends to be very high quality. For an example of a researcher detail page with data from ORCID, see https://expert.gwu.edu/display/justinlittman.
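The retrieval step works against ORCID's public API. orcid2vivo does considerably more (mapping the record into VIVO's ontology and loading the resulting RDF); this sketch shows only fetching a record as JSON from the v3.0 public endpoint. The ORCID iD shown in the usage comment is a placeholder, not a real researcher's.

```python
import json
import urllib.request

ORCID_PUBLIC_API = "https://pub.orcid.org/v3.0"

def record_url(orcid_id: str) -> str:
    """Build the public API URL for a researcher's full record."""
    return f"{ORCID_PUBLIC_API}/{orcid_id}/record"

def fetch_orcid_record(orcid_id: str) -> dict:
    """Retrieve a public ORCID record as JSON.

    ORCID returns XML by default; the Accept header requests JSON.
    """
    req = urllib.request.Request(
        record_url(orcid_id),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires network; placeholder iD):
# record = fetch_orcid_record("0000-0001-2345-6789")
```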
These approaches have allowed GW to roll out Expert Finder in a relatively short timeframe (about a year) with a minimal amount of resources (a portion of a project manager’s and a business analyst’s time, plus a fraction of three software developers and a systems administrator). We look forward to feedback from the GW community, as well as the opportunity to assess Expert Finder based on actual usage. This will allow us to make adjustments as necessary so that Expert Finder can showcase the expertise at GW and enable collaboration.