WRLC Libraries

Study Modernism in Paris this Summer

The George Washington University - Wed, 01/06/2016 - 14:46
January 6, 2016

Explore the journey of Picasso, Diaghilev, Kertesz, Stravinsky, & others who forged artistic collaborations and established Paris as the center of Modernist thought in the early 20th century.  

Visiting museums, touring iconic architectural sites and viewing contemporary performance spaces, we will measure today's art against the past.

Learn more at www.gwuparis.com or contact Professor Mary Buckley or Librarian Bill Gillis.

June 1-14, 2016
Paris: Modernism and the Arts, Then and Now —TRDA 4595w
No language requirements   
3 credits, WID, Elliott School and Cultural Studies Course Humanities GCR

The Sound of the Library at Work

The George Washington University - Sun, 01/03/2016 - 19:46
January 5, 2016
Laura Wrubel

At the Access Conference in Toronto in September 2015, I attended an all-day hackfest on data sonification, led by William Denton of York University and Katie Legere of Queen’s University. Data sonification is the translation of data into sound, much as data visualization transforms data into a graph or image. You can read about the workshop and see some examples of data sonification at Music, Code and Data: Hackfest and Happening at Access 2015.

In brief, it was a fast, fun, and practical introduction to both data sonification and the freely available Sonic Pi synthesizer software. Everyone who attended made some kind of music using a data file they brought with them or from provided sample files. The fact that we were able to get so far in a day speaks to both the skill of our hackfest leaders and the ease with which you can make sound with Sonic Pi.

I’d like to describe a few experiments, one from this workshop and another more recently, that have me excited about data sonification.

Music from the circulation desk

I brought to the hackfest a CSV file with the number of circulation transactions each day for a year (July through June), created from our Voyager system’s circulation transaction logs. The values ranged from roughly 100 to 1000. With such a broad range, I chunked up the values into a smaller set from 10 to 100, corresponding to Sonic Pi’s numbering of notes as on a piano keyboard. However, the pitches were so wildly dispersed that, while it was easy to hear outliers, the arc of activity through the semester was hard to perceive. The notes just didn’t make sense to my ears.

To provide a more listenable (and, I’m hoping, more meaningful) line, I assigned the chunks of values to specific notes in the C major scale, across two octaves. This is a much smaller range than the first version, and the notes feel more coherent, being in the same key.
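As a rough illustration of this kind of mapping, here is a minimal Python sketch. It is not the Sonic Pi code used at the hackfest. The starting note (middle C, MIDI note 60), the equal-width chunks, and the function names are all assumptions made for the example.

```python
# Semitone offsets of the major scale within one octave.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]


def major_scale(root=60, octaves=2):
    """Return MIDI note numbers for a major scale spanning `octaves` octaves.

    root=60 is middle C (an assumed starting point for this sketch).
    """
    notes = [root + 12 * octave + step
             for octave in range(octaves)
             for step in MAJOR_STEPS]
    notes.append(root + 12 * octaves)  # close the scale on the top tonic
    return notes


def value_to_note(value, low, high, notes):
    """Map a data value in [low, high] to one of `notes` using equal-width chunks."""
    fraction = (value - low) / (high - low)
    index = int(fraction * len(notes))
    # Clamp so that value == high (or outliers) still land on a valid note.
    index = max(0, min(index, len(notes) - 1))
    return notes[index]


scale = major_scale()
# Hypothetical daily circulation counts, roughly in the 100-1000 range.
daily_counts = [100, 550, 1000]
melody = [value_to_note(count, 100, 1000, scale) for count in daily_counts]
```

Because every output pitch belongs to one key, outliers are still audible as high or low notes, but the line as a whole stays coherent.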

Thinking there may be patterns in the volume of activity within the week, I added underlying drum beats to emphasize the first day of each week, Sunday. Finally, lighter beats accompany each note during the semester, underscoring the quiet in library activity during semester breaks.

You can listen to it here:

Github beats

More recently, I worked with my colleague Dan Chudnov to make visible to the library staff the activity of our team, the Scholarly Technology Group. The steady work of creating and maintaining software to help our user community often simply looks like us working away on our computers with our heads down. Dan created a visualization of our team’s work as expressed in commits to our projects’ repositories on Github. We also wondered what our team’s work might sound like.

I focused on one software project, an interface to our catalog data and other APIs for discovery; we call it “Launchpad” internally. It’s something quite a number of us have worked on over the past three years. I started with a file which listed one Github commit per line, including the name of the file changed and the person making the change. I then assigned a pitch to each person on the team, all within the same key, giving the initial project manager (Dan) and current project manager (Michael Cummings) the tonic note to provide some centering. To add a sense of time passing, I added drum beats, with a sound sample for the two major rollouts of Launchpad to our user community.
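The pitch assignment described above could be sketched in Python along these lines. This is only an illustration under assumptions: the contributor names other than the two project managers are hypothetical, the scale is C major expressed as MIDI note numbers, and new contributors simply take the next unused scale degree in order of first commit.

```python
def assign_pitches(commit_authors, tonic_holders, scale):
    """Give each author a pitch from `scale`, in order of first appearance.

    Authors listed in `tonic_holders` share the tonic (scale[0]); everyone
    else takes the next scale degree, wrapping around if the team outgrows
    the scale.
    """
    tonic, others = scale[0], scale[1:]
    pitches = {}
    next_degree = 0
    for author in commit_authors:
        if author in pitches:
            continue  # already has a pitch
        if author in tonic_holders:
            pitches[author] = tonic
        else:
            pitches[author] = others[next_degree % len(others)]
            next_degree += 1
    return pitches


C_MAJOR = [60, 62, 64, 65, 67, 69, 71]  # MIDI note numbers, one octave
# One entry per commit, oldest first; "alice" and "bob" are made-up names.
commits = ["dan", "alice", "dan", "bob", "michael"]
pitches = assign_pitches(commits, {"dan", "michael"}, C_MAJOR)
melody = [pitches[author] for author in commits]  # one note per commit
```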

You can listen to it and watch the supplementary logging within Sonic Pi on YouTube (best viewed fullscreen):

You can hear how the project started with three core developers, who worked intensively on the project through its first roll-out. Over time, participation broadened to a larger group, and each new person’s entry to the project is audible. It bears acknowledging that we’re hearing only a slice of the activity in the creation of Launchpad; this project had considerable contributions from others who represented end users, participated in testing, wrote documentation, conducted usability testing, and performed analyses to inform feature development.

A few observations

In each of these experiments, the first few iterations were not pleasant to listen to. I struggled to make sense of the noise, lacking anything to latch onto, the audible equivalent of X and Y axes. A little knowledge of music goes a long way in providing some structure that our ears are trained to recognize: rhythm, key, tempo.

As in creating a visualization, aesthetic choices in sonification can interfere with accuracy. Even in these small experiments, I wrestled with choices that mute, in a sense, aspects of the data and could mislead a listener trying to understand the data. For example, in the Github sonification, I chose to represent the activity across time uniformly, one beat per commit. Obviously, the work was not evenly distributed across three years; the pace and changes in work intensity don’t come across in this sonification. 

When it comes to coding, Sonic Pi was an easy entry point to making music from data, particularly when you don’t have a live orchestra at hand. The tips in William Denton’s blog post about reading csv files helped get me started, along with Sonic Pi’s built-in tutorial. Beyond Sonic Pi, there are many other software and tools to support data sonification; I’d be interested to hear what others have tried and found useful.

I'd also like to explore the growing cross-disciplinary literature on data sonification. Other examples of applying data sonification to library data include Denton's STAPLR experiment and Legere's research on using sonification to inform real-time library management decisions. In the end, these pieces were fun to create and made my colleagues’ work apparent in a new way.  There’s something satisfying about hearing your work turned into music.


Gelman Library to House Winston Churchill’s World War II Engagement Diary

The George Washington University - Wed, 12/16/2015 - 10:06
December 16, 2015
Construction of the National Churchill Library and Center to Begin this Month

A collection of handwritten cards detailing Winston Churchill’s appointments during World War II, including such historic events as Victory in Europe (VE) Day and the British prime minister’s regular meetings with the King of England and President Franklin Roosevelt, will have a new home at the George Washington University. The “engagement diary” will be featured in the new National Churchill Library and Center to be located at GW.

Steve Forbes, chairman of Forbes Media and a Churchill enthusiast, donated the collection of 30 cards to the Chicago-based Churchill Centre. The collection was then given to GW’s Estelle and Melvin Gelman Library for use in the National Churchill Library and Center, which begins construction in December.

“The engagement diary is an important historical resource, and I am pleased that they will now be seen by a broad audience,” said Mr. Forbes. “I join Churchillians everywhere in applauding The Churchill Centre’s initiative to partner with GW to create a permanent home for Churchill scholarship, studies and education in the heart of our nation’s capital.”

Privately held since the end of World War II, the cards are a source for the history of Mr. Churchill’s wartime leadership, recording the extraordinary extent of his activities and the frequency and range of his wartime journeys. Between September 1939 and June 1945, Mr. Churchill’s private secretaries kept the handwritten “engagement diary” on two-sided cards measuring 12 by 13 inches. The library has created high-resolution digital images of the cards and will launch a crowdsourcing project, open to the public, to provide full text transcription and annotation for the cards, all of which will be available to the public on a dedicated website. 

“We are delighted to receive this fantastic record that gives us a window into part of Winston Churchill’s life during World War II,” said Geneva Henry, university librarian and vice provost for libraries. “The gift coincides with the construction of the National Churchill Library and Center, the first permanent U.S. home in our nation’s capital for the study of Winston Churchill.” 

The National Churchill Library and Center, which is expected to open in 2016, will educate new generations about Mr. Churchill and will serve as a classroom and meeting space for public programs and lectures highlighting the historical significance of Mr. Churchill, his contemporaries and more recent world leaders. 

“We are honored that Steve Forbes has entrusted us with these historic documents, and we are glad that they will be a part of the National Churchill Library and Center at GW,” said Lee Pollock, executive director of the Churchill Centre. “For the first time, the original record of Churchill’s wartime activities will be made freely and widely available to scholars and students around the world.”

The library will work with academic programs across the university to develop programming.  


About the National Churchill Library and Center

The National Churchill Library and Center is part of a philanthropic partnership with the George Washington University and the Chicago-based Churchill Centre. Housed on the first floor of the Estelle and Melvin Gelman Library, this will be the first major research facility in the nation’s capital dedicated to the study of Winston Churchill.

MEDIA CONTACTS:
Kurie Fitzgerald: kfitzgerald@gwu.edu, 202-994-6461
Emily Grebenstein: egrebenstein@gwu.edu, 202-994-3087

Harvesting the Twitter Streaming API to WARC files

The George Washington University - Tue, 12/15/2015 - 08:54
December 15, 2015

The Twitter Streaming API is very powerful, allowing the harvesting of tweets not readily available from the other APIs. However, recall from our previous post that the Twitter Streaming API does not behave like the REST APIs that are typical of social media platforms (see Twitter’s description of the differences). A single HTTP response is potentially huge and may be collected over the course of hours, days, or weeks. This is a poor fit both for the normal web harvesting model, in which a single HTTP response is recorded as a single WARC response record in a single WARC file, and for most web archiving tools, which store HTTP responses in memory and don’t write them to the WARC file until the response is completed.

This post describes an approach we’ve developed for harvesting the Twitter Streaming API and recording in WARC files. We will also show how the tweets can be extracted from the WARC files for use by a researcher.

The Twitter Streaming API is not the only form of streaming content on the Web, and the authors of the WARC specification had the forethought to support record segmentation. In record segmentation, a single HTTP response is split into multiple WARC records, potentially in multiple WARC files. The first record is a WARC response record; subsequent records are WARC continuation records. The header of the final continuation record also contains the total number of bytes of the entire HTTP response.
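To make the header relationships concrete, here is a minimal Python sketch of record segmentation and reassembly. It is not the warcprox implementation: the function names are hypothetical, the segment size is arbitrary, and mandatory fields such as WARC-Date and the digests are left out for brevity. Only the segment-related field names come from the WARC specification.

```python
import uuid


def new_record_id():
    """Return a fresh WARC-style record identifier."""
    return "<urn:uuid:%s>" % uuid.uuid4()


def segment_response(payload, segment_size, target_uri):
    """Split one HTTP response into (headers, block) pairs, one per WARC record."""
    blocks = [payload[i:i + segment_size]
              for i in range(0, len(payload), segment_size)]
    if len(blocks) < 2:
        # Fits in a single ordinary response record: no segment headers needed.
        headers = {"WARC-Type": "response",
                   "WARC-Record-ID": new_record_id(),
                   "WARC-Target-URI": target_uri,
                   "Content-Length": str(len(payload))}
        return [(headers, payload)]
    origin_id = new_record_id()
    records = []
    for number, block in enumerate(blocks, start=1):
        first, last = number == 1, number == len(blocks)
        headers = {
            "WARC-Type": "response" if first else "continuation",
            "WARC-Record-ID": origin_id if first else new_record_id(),
            "WARC-Target-URI": target_uri,
            "WARC-Segment-Number": str(number),
            "Content-Length": str(len(block)),
        }
        if not first:
            # Continuations point back at the first record of the series.
            headers["WARC-Segment-Origin-ID"] = origin_id
        if last:
            # Only the final continuation carries the overall length.
            headers["WARC-Segment-Total-Length"] = str(len(payload))
        records.append((headers, block))
    return records


def reassemble(records):
    """Stitch segments back together in segment-number order."""
    ordered = sorted(records,
                     key=lambda r: int(r[0].get("WARC-Segment-Number", "1")))
    payload = b"".join(block for _, block in ordered)
    total = ordered[-1][0].get("WARC-Segment-Total-Length")
    if total is not None and int(total) != len(payload):
        raise ValueError("segments are missing or truncated")
    return payload
```

A reader that finds WARC-Segment-Number: 1 on a response record knows to look for continuation records whose WARC-Segment-Origin-ID matches that record's ID.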

While WARC record segmentation is theoretically a good solution for the Twitter Streaming API, record segmentation is not widely supported in most web archiving tools. Our first step was to modify Internet Archive’s warcprox to support record segmentation. (Our pull request is #15. The crux of the change is between lines 210 and 245 in warcprox.py.) Recall from the earlier post that warcprox is an HTTP proxy that records the HTTP transaction in a WARC.

The following shows snippets from a WARC file created by the modified warcprox from the Twitter filter API retrieved by twarc tracking “obama”. It consists of a WARC response record, a request record, a continuation record, and a final continuation record.

WARC/1.0
WARC-Type: response
WARC-Record-ID: <urn:uuid:9aff4bf7-d64a-411c-9ef8-cd82778e036e>
WARC-Date: 2015-12-02T16:59:07Z
WARC-Target-URI: https://stream.twitter.com/1.1/statuses/filter.json
WARC-IP-Address:
Content-Type: application/http;msgtype=response
WARC-Segment-Number: 1
Content-Length: 1149
WARC-Block-Digest: sha1:7c8de1bd439cf62c67f9f4b0c48e6f3ae39eb4ef
WARC-Payload-Digest: sha1:cc1b7bf9a2945ddf8ae7c35d5f05513d0d8b691b

HTTP/1.1 200 OK
connection: close
content-Encoding: gzip
content-type: application/json
date: Wed, 02 Dec 2015 16:59:07 GMT
server: tsa
transfer-encoding: chunked
x-connection-hash: 8439cf557d0f807635797377d9e7d0b6

[gzip-compressed, chunked binary payload]

WARC/1.0
WARC-Type: request
WARC-Record-ID: <urn:uuid:3a6ce873-13a9-401a-bfd9-3ddc321aab96>
WARC-Date: 2015-12-02T16:59:07Z
WARC-Target-URI: https://stream.twitter.com/1.1/statuses/filter.json
WARC-Concurrent-To: <urn:uuid:9aff4bf7-d64a-411c-9ef8-cd82778e036e>
WARC-Block-Digest: sha1:fa301cb54fd6c38adac4a43bacf36d38198ec8e0
Content-Type: application/http;msgtype=request
Content-Length: 566

POST /1.1/statuses/filter.json HTTP/1.1
content-length: 30
accept-encoding: deflate, gzip
host: stream.twitter.com
accept: */*
user-agent: python-requests/2.8.1
content-type: application/x-www-form-urlencoded
authorization: OAuth oauth_nonce="149931870481283598461449075546", oauth_timestamp="1449075546", oauth_version="1.0", oauth_signature_method="HMAC-SHA1", oauth_consumer_key="EHdoTe7ksBgflP5nUalEfhaeo", oauth_token="481186914-c2yZjbk1np0Z5MWEFYYQKSQNFBXd8T9r4k90YkJl", oauth_signature="m0hHjrPnU7aTtOhjmk8om3Vv7Ok%3D"

track=obama&stall_warning=True

WARC/1.0
WARC-Type: continuation
WARC-Record-ID: <urn:uuid:c18791da-24e0-42a7-91df-82dfdae6697e>
WARC-Date: 2015-12-02T16:59:07Z
WARC-Target-URI: https://stream.twitter.com/1.1/statuses/filter.json
WARC-IP-Address:
Content-Type: application/http;msgtype=response
WARC-Segment-Number: 2
WARC-Segment-Origin-ID: <urn:uuid:9aff4bf7-d64a-411c-9ef8-cd82778e036e>
Content-Length: 1220
WARC-Block-Digest: sha1:82794503724ba3bb06fee69302614a3f5ef00c39

[gzip-compressed, chunked binary payload]

WARC/1.0
WARC-Type: continuation
WARC-Record-ID: <urn:uuid:d7bfe010-7831-45a8-8361-715692ea014b>
WARC-Date: 2015-12-02T16:59:09Z
WARC-Target-URI: https://stream.twitter.com/1.1/statuses/filter.json
WARC-IP-Address:
Content-Type: application/http;msgtype=response
WARC-Segment-Number: 3
WARC-Segment-Origin-ID: <urn:uuid:9aff4bf7-d64a-411c-9ef8-cd82778e036e>
WARC-Segment-Total-Length: 924
WARC-Truncated: unspecified
Content-Length: 307
WARC-Block-Digest: sha1:57b73cdaab8025cc04a83f3ae6eff2dd6e2bfa15

[gzip-compressed, chunked binary payload]

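As should be obvious, this data is not readily usable by most researchers. In particular, there are four barriers to use:

The data is stored in a WARC file.
The HTTP response is split across multiple WARC records by record segmentation.
The HTTP payload uses chunked transfer encoding.
The HTTP payload uses gzip content encoding.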

In order to be confident in this approach, we feel it is prudent to make sure that we can access the tweets given these various barriers and the lack of support for record segmentation in web archiving tools. To this end, we developed TwitterStreamWarcIter and the parent class BaseWarcIter.  TwitterStreamWarcIter outputs the tweets from a WARC file, one per line. This is the same output as twarc or cat-ing a line-oriented json file and can be piped to other tools such as jq:

$ python twitter_stream_warc_iter.py test_1-20151202200525007-00000-30033-GLSS-F0G5RP-8000.warc.gz
{"contributors": null, "truncated": false, "text": "RT @Litorodbujan: Obama quiere visitar Espa\u00f1a!\nAhora s\u00ed somos un pa\u00eds serio; con Rajoy no se repetir\u00e1 esto.   #RajoyconPiqueras https://t.c\u2026", "is_quote_status": false, "in_reply_to_status_id": null, "id": 672144412936445952, "favorite_count": 0, "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>", "retweeted": false, "coordinates": null, "timestamp_ms": "1449086690540", "entities": {"user_mentions": [{"id": 320317854, "indices": [3, 16], "id_str": "320317854", "screen_name": "Litorodbujan", "nam ....

or suitable for human-consumption with the --pretty flag:

$ python twitter_stream_warc_iter.py test_1-20151202200525007-00000-30033-GLSS-F0G5RP-8000.warc.gz --pretty
{
    "contributors": null,
    "truncated": false,
    "text": "RT @Litorodbujan: Obama quiere visitar Espa\u00f1a!\nAhora s\u00ed somos un pa\u00eds serio; con Rajoy no se repetir\u00e1 esto.   #RajoyconPiqueras https://t.c\u2026",
    "is_quote_status": false,
    "in_reply_to_status_id": null,
    "id": 672144412936445952,
    "favorite_count": 0,
    "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>",
    "retweeted": false,
    "coordinates": null,
    "timestamp_ms": "1449086690540",
    "entities": {
....

This approach addresses the WARC barrier by using Internet Archive’s WARC library to read the WARC file. The IA WARC library is extended to handle record segmentation by stitching the payload back together. (See CompositeFilePart. It still doesn’t handle continuations that are in other WARC files, but solving that problem is just software development.) And lastly, the content encoding and transfer encoding barriers are remedied by loading the payload into a urllib3 HTTPResponse which handles the decoding of the content encoding and transfer encoding, as well as providing a familiar, pythonic interface to the response.
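For readers who want to see what the two decoding steps involve, here is a minimal Python sketch that uses only the standard library in place of urllib3's HTTPResponse. It assumes the HTTP body has already been stitched together from the WARC records and that the HTTP headers have been stripped off. The function names are hypothetical and are not taken from TwitterStreamWarcIter.

```python
import io
import json
import zlib


def dechunk(raw):
    """Remove HTTP/1.1 chunked transfer-encoding framing from a body."""
    out = bytearray()
    stream = io.BytesIO(raw)
    while True:
        size_line = stream.readline()
        if not size_line:
            break  # stream ended without the terminating zero-size chunk
        size_line = size_line.strip()
        if not size_line:
            continue
        # The chunk size is hexadecimal; ignore any chunk extensions after ';'.
        size = int(size_line.split(b";")[0], 16)
        if size == 0:
            break
        out += stream.read(size)
        stream.read(2)  # skip the CRLF that follows each chunk
    return bytes(out)


def iter_tweets(http_body):
    """Yield tweets (as dicts) from a chunked, gzip-encoded streaming body."""
    compressed = dechunk(http_body)
    # wbits=16+MAX_WBITS selects gzip framing; a decompress object returns
    # whatever it can decode even if the stream was cut off early.
    text = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(compressed)
    for line in text.split(b"\r\n"):
        if not line.strip():
            continue  # blank keep-alive line
        try:
            tweet = json.loads(line.decode("utf-8"))
        except ValueError:
            continue  # partial line, e.g. the stream was cut mid-tweet
        yield tweet
```

Handing the payload to an HTTP library, as described above, avoids maintaining this parsing by hand and gives callers an interface they already know.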

As we have explored the similarity between web harvesting and social media harvesting, the Twitter Streaming API represents the point of greatest friction. However, the above represents a reasonable first approach to addressing the unique features of the Twitter Streaming API.

Addressing Temperature Complaints in Gelman

The George Washington University - Thu, 12/10/2015 - 13:31
December 10, 2015

We hear your complaints about the heat in Gelman and we are working with GW Facilities to get the temperatures under control! The building is currently experiencing areas of extreme heat (primarily the 3rd floor) and areas of extreme cold (mostly the 5th and 7th floors). Please continue to email gelman@gwu.edu or tweet @gelmanlibrary with reports of when & where you experience extreme temperatures in the building. These reports help the maintenance crew pinpoint and correct the problem.

We sincerely apologize for the inconvenience at this important time of the semester and have been working since the initial reports to provide more comfortable temperatures in the building.

Please Respect Quiet Study Areas

The George Washington University - Wed, 12/02/2015 - 16:35
December 2, 2015

Space is tight here at Gelman, and we must respect each other's needs, whether for quiet study or collaborative work, especially at this time of the semester. Spaces throughout the building are designated for either quiet, individual study or collaborative, group study. Please follow the posted guidelines for each space. Together, we can wrap up another great semester and Raise High for Finals!

Quiet Study Spaces in Gelman
No group conversation allowed • Use headphones at a low volume  • Silence phones & electronic devices

3rd, 4th, 5th & 6th floor stacks
4th floor large study room (401)
5th floor large study room (501)
Graduate Student Reading Room (503)
Andrew Oliver Reading Room (609)

Group Study Spaces in Gelman
Group conversation allowed • Use headphones at a low volume  • Silence phones & electronic devices

1st floor (all areas)
Entrance floor (all areas)
4th floor large study room (403)
Reservable small-group study rooms on floors 2, 3, 4, 5 & 6

Winter Break Hours at the GW Libraries

The George Washington University - Wed, 12/02/2015 - 09:44
December 2, 2015

Gelman Library Winter Break Hours
Dec. 19: Close at 7pm & end 24-hour access
Dec. 20: CLOSED*
Dec. 21 — 23: 7am — 5pm*
Dec. 24 — Dec. 27: CLOSED*
Dec. 28 — 30: 7am — 5pm*
Dec. 31 — Jan. 3: CLOSED*
Jan. 4 — 8: 7am — 6pm*
Jan. 9 & 10: Noon — 6pm*
Jan. 11: Open at 7am & resume 24-hour access

Eckles Library Winter Break Hours
Dec. 19: 10am — 5pm
Dec. 20 - Jan. 3: CLOSED

VSTC Library Winter Break Hours
Dec. 24-25: VSTCL is CLOSED*
Dec. 31-Jan. 1: VSTCL is CLOSED*
*24-hour building access is not available during this time.

GW Digital Humanities Showcase

The George Washington University - Tue, 12/01/2015 - 15:37
December 1, 2015

Showcase Date: February 12, 2016
Submissions for Presentations Due: January 10, 2016
Hosted by GW Digital Humanities Institute and GW Libraries

Are you launching a Digital Humanities (DH) project and figuring out the next steps? Do you want to meet other people at GW who are interested in how the arts and humanities interact with digital media?

We invite members of the GW community to join the second annual DH Showcase at Gelman Library. Each person (or team) will present a DH project or endeavor (in any stage of its production). This event will provide a venue to introduce your project to other people and receive feedback or advice while also making connections with people across the GW community who might share similar interests. We hope that new conversations will open up about methods, tools, challenges, questions, and possibilities arising across projects.

Our definition of DH is broad and can entail anything from a database or tool to a blog or creative work, and we welcome presentations integrating online media or digital cultures into teaching in (or beyond) the space of the classroom.  

If you are interested in taking part in this event, please contact Prof. Jonathan Hsy (Co-Director of the Digital Humanities Institute) with your name, email, affiliation/title, and title of project(s).  A one-paragraph blurb about your project is welcome but not required.



Support Every GW Student on #GivingTuesday

The George Washington University - Tue, 12/01/2015 - 09:43
December 1, 2015

You've shopped on Black Friday and Cyber Monday, but now it is #GivingTuesday, a chance to give back to your community. We invite you to make a gift that will impact every single student at GWU by giving to GW Libraries. Whether it is providing scholarly resources for research, making unique special collections accessible or offering a comfy chair in which to study, the Libraries support every student.  

When you give to GW on #GivingTuesday, a global day dedicated to giving back no matter where you choose to direct your support, you give the gift of education.

Give to GW here: http://go.gwu.edu/give2education

GW & Mount Vernon Yearbooks Online

The George Washington University - Mon, 11/23/2015 - 11:24
November 23, 2015

Yearbooks from GW and the Mount Vernon College and Seminary are now available online on archive.org. You can browse approximately 100 years of yearbooks for GW (1908 - 2009) and almost 90 years for Mount Vernon (1911 - 1998).

Scanning these yearbooks and making them available and accessible online is an on-going project of the GW Libraries' Special Collections Research Center and the University Archives. We will be adding additional yearbooks over time.

FREE TO ROCK: Rock Music & the End of Communism Panel Discussion

The George Washington University - Wed, 11/18/2015 - 16:30
November 18, 2015

Thursday, November 19
4 - 5:30 pm
Gelman Library, Room 702

Professor Richard Robin, moderator 
Valery Saifudinov, Founder of first Soviet Rock Band, The Revengers, and co-inventor of the first Soviet electric guitar 
Joanna Stingray, Soviet and Russian rock recording artist, producer, TV personality, and first American record producer of Soviet Rock bands
Dr. Mark Yoffe - Curator of the International Counterculture Archive and the Soviet Samizdat Archive in Gelman's Global Resources Center
William Levins, Student
Nick Binkley and Doug Yeager, Producers and researchers for the film FREE TO ROCK

This panel is presented in coordination with the premiere of the documentary FREE TO ROCK on Tuesday, November 17, 7:30 pm, Georgetown University, Gaston Hall

Hebrew Printing in the Orient Exhibit

The George Washington University - Tue, 11/17/2015 - 15:43
November 17, 2015

November 8, 2015 - July 1, 2016
Dr. Yehuda Nir and Dr. Bonnie Maslin Exhibit Hall & adjoining exhibit spaces
Gelman Library, 7th floor

A new exhibition of the Kiev Judaica Collection, "Hebrew Printing in the Orient," presents books and typography across a vast non-western panorama: from the Maghreb to the Far East, from Central Asia to India, and from Southern Africa to the Antipodes. The first such exhibit of this material in nearly 90 years, it traces the introduction of movable type outside of Europe by Jewish exiles from Spain, who established a Hebrew press at Constantinople (Istanbul) in 1493, through the establishment of presses at Salonika in Ottoman Greece (the earliest printing on the territory of Greece) and at Fez in Morocco (the first press on the continent of Africa). Examples of the subsequent spread of Hebrew printing in different parts of the Middle East and Asia are drawn from the holdings of the Kiev Collection. Among the rarities are Hok le-Yisrael (Cairo, 1740), one of the first books ever printed in Egypt, and Zer‘a Yitshak (Tunis, 1768), the first book in any language printed in Tunisia. Included in the display are texts in various languages using the Hebrew alphabet, such as Ladino (Judeo-Spanish), Judeo-Arabic, Judeo-Persian and Yiddish, apart from Hebrew and Aramaic.

GW Libraries Thanksgiving Hours

The George Washington University - Mon, 11/09/2015 - 08:58
November 9, 2015

The GW Libraries are thankful for our terrific patrons (and a few days off!). Please note the building closures and changed hours for the Thanksgiving holiday.

Gelman Library Thanksgiving Hours:
Wednesday, Nov. 25  - Gelman building closes at 6pm*
Thursday, Nov. 26 & Friday, Nov. 27  -  Gelman is CLOSED*
Saturday, Nov. 28  -  Open from noon-6pm*
Sunday, Nov. 29  -  Open at 9am to resume 24-hour access
*24-hour building access is not available during this time.

Eckles Library Thanksgiving Hours:
Tuesday, Nov. 24 - 8am-11pm
Wednesday, Nov. 25  -  8am-5pm
Thursday, Nov. 26, Friday, Nov. 27, & Saturday, Nov. 28  -  Eckles is CLOSED
Sunday, Nov. 29  -  3pm-3am

VSTC Library Thanksgiving Hours:
Thursday, Nov. 26 & Friday, Nov. 27  -  VSTCL is CLOSED*
*24-hour building access is not available during this time.

Copyright Basics for Graduate Students

The George Washington University - Fri, 11/06/2015 - 17:09
November 6, 2015

Tuesday, November 17
Gelman Library, Room 702

GW Associate General Counsel Michelle Gluck will cover the basic aspects of copyright law you need to know when writing a thesis or dissertation, including an overview of copyright as a "bundle of rights" and the criteria for determining fair use. 

Michelle Gluck  joined GW's Office of the General Counsel in March 2014 after 8 years as Special Counsel to the University System of New Hampshire. Prior to that, she served in the U.S. Department of Justice specializing in tobacco litigation and immigration appellate litigation, and as a Deputy Attorney General in the Government Section of the State of California Department of Justice. Upon graduation from law school, Ms. Gluck clerked for U.S. District Judge Lawrence T. Lydick and for the Ninth U.S. Circuit Court of Appeals. She is admitted to practice in California and before the United States Supreme Court and numerous other federal district and circuit courts. Her areas of concentration are research and compliance, technology transfer, and intellectual property.  Ms. Gluck received her A.B. with honors from the University of California, Berkeley in 1985 and her J.D. from Boalt Hall School of Law at the University of California, Berkeley, in 1988.

Okinawa: The Afterburn Film Showing & Discussion

The George Washington University - Fri, 11/06/2015 - 09:18
November 6, 2015

Wednesday, November 18
Gelman Room 214

Join us for a special screening of the English-language version of this popular documentary, followed by a discussion with director John Junkerman.

A major hit in Japanese theaters since its release in June, Okinawa: The Afterburn  is the first documentary to provide a comprehensive picture of the 1945 Battle of Okinawa and the ensuing 70-year occupation of the island by the US military. In April 1945, American forces invaded Okinawa, launching a battle that lasted 12 weeks and claimed the lives of 240,000 people. The film recounts the battle through the eyes of Japanese and American soldiers who fought on the same battlefields, along with Okinawan civilians swept up in the fighting. The film carries the story to the present, depicting the discrimination and oppression forced upon Okinawa by the American and Japanese governments. With Okinawa now embroiled in a struggle over the construction of a new base, this timely film illuminates the roots of a deep-seated resistance.

Co-sponsored by the Global Resources Center and Veterans For Peace, Ryukyu Okinawa Chapter Organizing Committee 

International Student Coffee Hour in the Global Resources Center

The George Washington University - Thu, 10/29/2015 - 19:59
October 29, 2015

Tuesday, Nov. 17
Global Resources Center, 7th floor

Please join us in the Global Resources Center (GRC) for an international student coffee hour co-hosted with the International Services Office (ISO). Take a tour of the GRC, chat with a specialist about your research and global interests, and enjoy a snack with your ISO friends! This event is part of GW's International Education Week.  

Please RSVP: go.gwu.edu/GRCCoffee 

The GRC focuses upon the political, socio-economic, historical, and cultural aspects of countries and regions around the globe from the 20th century onward with the following specialized resource centers: Russia, Eurasia, Central & Eastern Europe, China Documentation Center, Taiwan Resource Center, Japan Resource Center, Korea Resources, Middle East & North Africa.

Social Media Harvesting Techniques

The George Washington University - Wed, 10/28/2015 - 07:38
October 28, 2015
Justin Littman

Social Feed Manager (SFM) is a tool developed by the Scholarly Technology Group for harvesting social media to support research and build archives. As part of enhancements to SFM being performed under a grant from the National Historical Publications and Records Commission (NHPRC), we are adding support for writing social media to Web ARChive (WARC) files. This blog entry describes two techniques for retrieving social media records from the application programming interfaces (APIs) of social media platforms and writing to WARCs. These techniques are based on Python, though these or similar approaches are applicable to other programming languages.

Background on social media APIs

Many social media platforms provide APIs to allow retrieval of social media records. Examples of such APIs include the Twitter REST API, the Flickr API, and the Tumblr API. These APIs use HTTP as the communications protocol and provide the records in machine-readable formats such as JSON. Compared to harvesting HTML from the social media platform’s website, harvesting social media from APIs offers some advantages:

  • The APIs are more stable. The creators of the APIs understand that when they change the API, they will be breaking consumers of the API. (Want notification when an API changes? Give API Changelog a try.)
  • The APIs provide social media records in formats that are intended for machine processing.
  • The APIs sometimes provide access to data that is not available from the platform’s website. For example, the following shows the record for a tweet retrieved from Twitter’s REST API:
{ "created_at": "Tue Jun 02 13:22:55 +0000 2015", "id": 605726286741434400, "id_str": "605726286741434368", "text": "At LC for @archemail today: Thinking about overlap between email archiving, web archiving, and social media archiving.", "source": "Twitter Web Client", "truncated": false, "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": { "id": 481186914, "id_str": "481186914", "name": "Justin Littman", "screen_name": "justin_littman", "location": "", "description": "", "url": null, "entities": { "description": { "urls": [] } }, "protected": false, "followers_count": 45, "friends_count": 47, "listed_count": 5, "created_at": "Thu Feb 02 12:19:18 +0000 2012", "favourites_count": 34, "utc_offset": -14400, "time_zone": "Eastern Time (US & Canada)", "geo_enabled": true, "verified": false, "statuses_count": 72, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "C0DEED", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, "profile_image_url": "http://pbs.twimg.com/profile_images/496478011533713408/GjecBUNj_normal.jpeg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/496478011533713408/GjecBUNj_normal.jpeg", "profile_link_color": "0084B4", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false }, "geo": null, "coordinates": null, "place": { "id": "01fbe706f872cb32", "url": "https://api.twitter.com/1.1/geo/id/01fbe706f872cb32.json", "place_type": 
"city", "name": "Washington", "full_name": "Washington, DC", "country_code": "US", "country": "United States", "contained_within": [], "bounding_box": { "type": "Polygon", "coordinates": [ [ [ -77.119401, 38.801826 ], [ -76.909396, 38.801826 ], [ -76.909396, 38.9953797 ], [ -77.119401, 38.9953797 ] ] ] }, "attributes": {} }, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "entities": { "hashtags": [], "symbols": [], "user_mentions": [], "urls": [] }, "favorited": false, "retweeted": false, "lang": "en" }

and how the same tweet appears on Twitter’s website:


It is worth emphasizing that retrieving social media records from an API involves nothing more than HTTP transactions, just like the HTTP transactions between a web browser and a website or a web crawler and a website.


(The one exception worth noting is Twitter’s Streaming APIs. While these APIs do use HTTP, the HTTP connection is kept open while additional data is added to the HTTP response over a long period of time. Thus, these APIs are unique in that the HTTP response may last for minutes, hours, or days rather than the normal milliseconds or seconds, and the HTTP response may be significantly larger than the typical HTTP response from a social media API. This requires special handling and is outside the scope of this discussion, though it will ultimately require consideration.)


To simplify interacting with social media APIs, developers have created API libraries. An API library is for a specific programming language and social media platform and makes it easier to interact with the API by handling authentication, rate limiting, HTTP communication, and other low-level details. In turn, API libraries use other libraries such as an HTTP client for HTTP communication or an OAuth library for authentication. Examples of Python API libraries include Twarc or Tweepy for Twitter, Python Flickr API Kit for Flickr, and PyTumblr for Tumblr. Rather than having to re-implement all of these low-level details, ideally a social media harvester will use existing API libraries.

Background on WARCs

WARCs allow for recording an entire HTTP transaction between an HTTP client and an HTTP server. A typical transaction consists of the client issuing a request message and the server replying with a response message. These are recorded in the WARC as a request record and response record pair. In a WARC, each record is composed of a record header containing some named metadata fields and a record body containing the HTTP message. In turn, each HTTP message is composed of a message header and a message body. Here is an example request record for GWU’s homepage:

  WARC/1.0
  WARC-Type: request
  Content-Type: application/http;msgtype=request
  WARC-Date: 2015-10-14T18:01:10Z
  WARC-Record-ID:
  WARC-Target-URI: http://www.gwu.edu/
  WARC-IP-Address:
  WARC-Block-Digest: sha1:A7SJCNM5DLPJCLQMGJOXD7XDWWFQRDGH
  WARC-Payload-Digest: sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
  Content-Length: 69
  WARC-Warcinfo-ID:

  GET / HTTP/1.1
  User-Agent: Wpull/1.2.1 (gzip)
  Host: www.gwu.edu

and a response record:

  WARC/1.0
  WARC-Type: response
  Content-Type: application/http;msgtype=response
  WARC-Date: 2015-10-14T18:01:10Z
  WARC-Record-ID:
  WARC-Target-URI: http://www.gwu.edu/
  WARC-IP-Address:
  WARC-Concurrent-To:
  WARC-Block-Digest: sha1:FAGHJPTSB4TIHWBMNPAIXM6IRS7EMOHS
  WARC-Payload-Digest: sha1:D2OLR4C4UASIRNSGJCNQMK5XBQ6RAWGV
  Content-Length: 79609
  WARC-Warcinfo-ID:

  HTTP/1.1 200 OK
  Server: Apache/2.2.15 (Oracle)
  X-Powered-By: PHP/5.3.3
  Expires: Sun, 19 Nov 1978 05:00:00 GMT
  Last-Modified: Wed, 14 Oct 2015 03:33:00 GMT
  Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
  ETag: "1444793580"
  Content-Language: en
  X-Generator: Drupal 7 (http://drupal.org)
  Link: ; rel="image_src",; rel="canonical",; rel="shortlink"
  Content-Type: text/html; charset=utf-8
  Transfer-Encoding: chunked
  Date: Wed, 14 Oct 2015 18:01:11 GMT
  X-Varnish: 982060864 981086065
  Age: 52090
  Via: 1.1 varnish
  Connection: keep-alive
  X-Cache: Hit from web1
  Set-Cookie: NSC_dnt_qspe_tey_80=ffffffff83ac15c345525d5f4f58455e445a4a423660;expires=Wed, 14-Oct-2015 18:31:11 GMT;path=/;httponly

  b3a
  <!DOCTYPE html>
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" version="XHTML+RDFa 1.0" dir="ltr" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:sioc="http://rdfs.org/sioc/ns#" xmlns:sioct="http://rdfs.org/sioc/types#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <head profile="http://www.w3.org/1999/xhtml/vocab">
  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  [A whole bunch of HTML skipped here]
  </body>
  </html>

(This was recorded using Wpull: wpull http://www.gwu.edu --warc-file warc_example --no-warc-compression)
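
The record layout described above (a record header of named fields, a blank line, then the HTTP message as the record body) can be sketched in a few lines of Python 3. This is only an illustration of the structure, not how Wpull or any WARC library writes records: the function name build_warc_record is hypothetical, the digest and warcinfo fields are omitted, and the record ID is a freshly generated UUID.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(record_type, target_uri, http_message):
    """Wrap a raw HTTP message (bytes) in a minimal WARC/1.0 record.

    Hypothetical helper for illustration; digests are left out.
    """
    msgtype = "request" if record_type == "request" else "response"
    header_fields = [
        ("WARC-Type", record_type),
        ("Content-Type", "application/http;msgtype=" + msgtype),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Record-ID", "<urn:uuid:%s>" % uuid.uuid4()),
        ("WARC-Target-URI", target_uri),
        # Content-Length counts only the record body (the HTTP message).
        ("Content-Length", str(len(http_message))),
    ]
    header = "WARC/1.0\r\n" + "".join("%s: %s\r\n" % f for f in header_fields)
    # A blank line separates header from body; two CRLFs end the record.
    return header.encode("utf-8") + b"\r\n" + http_message + b"\r\n\r\n"


# The HTTP request message from the example request record above.
http_request = (
    b"GET / HTTP/1.1\r\n"
    b"User-Agent: Wpull/1.2.1 (gzip)\r\n"
    b"Host: www.gwu.edu\r\n"
    b"\r\n"
)
record = build_warc_record("request", "http://www.gwu.edu/", http_request)
print(record.decode("utf-8"))
```

The Content-Length it computes for this request message is 69 bytes, matching the example request record.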

Putting together this discussion of social media APIs and WARCs, we'll describe techniques for harvesting social media records using existing API libraries and recording the HTTP transactions in WARCs.

The first technique

The first technique is to attempt to record the HTTP transaction from the HTTP client used by the API library. While there are a number of higher-level clients in Python (e.g., requests), the underlying HTTP protocol client is generally httplib. Unfortunately, httplib does not provide ready access to the entire HTTP message, just the message body. However, when the debug level of httplib is set to 1, httplib writes the message header to standard output (stdout). For example:

>>> import httplib
>>> conn = httplib.HTTPConnection("www.gwu.edu")
>>> conn.set_debuglevel(1)
>>> conn.request("GET", "/")
send: 'GET / HTTP/1.1\r\nHost: www.gwu.edu\r\nAccept-Encoding: identity\r\n\r\n'
>>> resp = conn.getresponse()
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Apache/2.2.15 (Oracle)
header: X-Powered-By: PHP/5.3.3
header: Expires: Sun, 19 Nov 1978 05:00:00 GMT
header: Last-Modified: Wed, 14 Oct 2015 03:33:00 GMT
header: Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
header: ETag: "1444793580"
header: Content-Language: en
header: X-Generator: Drupal 7 (http://drupal.org)
header: Link: ; rel="image_src",; rel="canonical",; rel="shortlink"
header: Content-Type: text/html; charset=utf-8
header: Transfer-Encoding: chunked
header: Date: Wed, 14 Oct 2015 18:16:54 GMT
header: X-Varnish: 982091814 981086065
header: Age: 53034
header: Via: 1.1 varnish
header: Connection: keep-alive
header: X-Cache: Hit from web1
header: Set-Cookie: NSC_dnt_qspe_tey_80=ffffffff83ac15c345525d5f4f58455e445a4a423660;expires=Wed, 14-Oct-2015 18:46:54 GMT;path=/;httponly

By capturing this debugging output, the HTTP message can be reconstructed and recorded in the appropriate WARC records. We use Internet Archive’s WARC library for writing to WARCs. Here’s a gist showing some code that uses the Python Flickr API Kit to retrieve the record for a photo from Flickr’s API and record it in a WARC: https://gist.github.com/justinlittman/a46ab82f456423a71e39. (The resulting WARC is also provided in the gist.)
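
To make the reconstruction step concrete, here is a minimal Python 3 sketch that rebuilds a raw HTTP response header from captured debug text. It is not the code from the gist: the function name reconstruct_response_header is hypothetical, and it assumes the captured text has one "reply:" line followed by "header:" lines in the shape shown in the session above.

```python
import ast


def reconstruct_response_header(debug_output):
    """Rebuild a raw HTTP response header from httplib-style debug text.

    Hypothetical helper; assumes "reply:" and "header:" line prefixes.
    """
    lines = []
    for line in debug_output.splitlines():
        line = line.strip()
        if line.startswith("reply: "):
            # The status line is printed as a quoted Python string literal.
            status = ast.literal_eval(line[len("reply: "):])
            lines.append(status.rstrip("\r\n"))
        elif line.startswith("header: "):
            lines.append(line[len("header: "):])
    # HTTP header lines end with CRLF; a blank line ends the header.
    return "\r\n".join(lines) + "\r\n\r\n"


# A shortened sample of captured debug output.
sample = """reply: 'HTTP/1.1 200 OK\\r\\n'
header: Server: Apache/2.2.15 (Oracle)
header: Content-Type: text/html; charset=utf-8
"""
raw_header = reconstruct_response_header(sample)
print(repr(raw_header))
```

The reconstructed header, followed by the message body read from the response, would form the body of the WARC response record. Because the parsing depends on the exact wording of the debug lines, any change in that output would break it.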

Advantages of this technique:

  • Complete control over writing the WARC, including WARC record headers and deduplication strategy.

Disadvantages of this technique:

  • Reconstructs the HTTP message instead of recording directly as passed over the network.
  • Fragile, since it depends on the debugging output of httplib. There is no guarantee that this debugging output will remain unchanged in the future.
  • Often requires hacking the API library to get access to the HTTP client.

The second technique

The second approach was suggested by Ed Summers. In this approach, an HTTP proxy records the HTTP transaction. In a proxying setup, the HTTP client makes its request to the proxy. The proxy in turn relays the request to the HTTP server. It receives the response from the server and relays it back to the client. By acting as a “man in the middle”, the proxy has access to the entire HTTP transaction.

Internet Archive’s warcprox is an HTTP proxy that writes the recorded HTTP transactions to WARCs. Among other applications, warcprox is used in Ilya Kreymer’s webrecorder.io, which records the HTTP transactions from a user browsing the web. In our case, warcprox will record the HTTP transactions between the API library and the social media platform’s server.

This gist demonstrates using the Python Flickr API Kit to retrieve the record for a photo from Flickr’s API and recording it using warcprox: https://gist.github.com/justinlittman/0b3d76ca0465a9d914ed
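
On the client side, the proxying setup amounts to telling the HTTP client to send its requests to the proxy. The following Python 3 sketch shows that configuration with the standard library's urllib.request; it is not the code from the gist. The function name make_proxied_opener is hypothetical, and the address localhost:8000 is an assumption that should be replaced with wherever the recording proxy is actually listening.

```python
import urllib.request


def make_proxied_opener(proxy_url):
    """Return a urllib opener that sends HTTP and HTTPS requests via proxy_url."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(proxy_handler)


# Hypothetical address of a locally running recording proxy.
PROXY_URL = "http://localhost:8000"
opener = make_proxied_opener(PROXY_URL)

# No request is sent here; a call such as opener.open(url) would be relayed
# through the proxy, which could then record the full transaction.
```

An API library has to expose its HTTP client, or a proxy setting, for this kind of configuration to be possible.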



Disadvantages of this technique:

  • Depends on the API library supporting proxy configuration, or requires hacking the API library to get access to the HTTP client to configure proxying.
  • Does not provide control over the WARC, especially the ability to write WARC record headers.
  • Requires running the proxy as a separate process from the harvester.

STG is continuing to experiment with and refine these two approaches. Thoughts on these approaches or suggestions for other techniques would be appreciated and we welcome any discussion of social media harvesting in general.

Hebrew Printing in the Orient Opening Reception

The George Washington University - Tue, 10/27/2015 - 21:17
October 27, 2015

Sunday, November 8, 2-4pm 
Gelman Library, Room 702

Enjoy a panel discussion on a broad panorama of typography from far-flung presses, from the Maghreb to China and from Central Asia to India, South Africa and the Antipodes. “Hebrew Printing in the Orient,” an exhibit drawn from the I. Edward Kiev Collection, includes texts in Aramaic and the various languages of Oriental Jewry, including Ladino (Judeo-Spanish), Judeo-Arabic, Judeo-Persian and Marathi. Among the rarities is a copy of Hok le-Yisra'el (Cairo, 1740), one of the first books ever printed in Egypt.

The Kiev Collection at The George Washington University was established in 1996 with the donation of the large personal library of I. Edward Kiev, one of the preeminent Judaica librarians of the 20th century.  Together with books in western languages, German-Jewish graphic art, and extensive bibliographic literature in which Kiev was expert, the collection holds Hebraica printed around the world over the course of five centuries.

Help the Libraries Help You!

The George Washington University - Tue, 10/27/2015 - 21:04
October 27, 2015

Faculty and students can help the libraries improve research assistance by taking our quick, 3-question survey. Help us understand how you prefer to access help with your research so we can offer more of the services you like best. This survey will be open until December 1.