University News

Sneak Peeks: New Fenwick Library

George Mason University - Sat, 12/19/2015 - 07:40

Glimpse some of the new spaces in Fenwick! Learn about the move and more at

Study area, first floor.

Categories: University News

Congratulations Winter Grads!

George Mason University - Fri, 12/18/2015 - 07:21

Congratulations & Best Wishes from the University Libraries.

Categories: University News

Chief Impact Officer Discusses Innovation and Education at SIS

American University News - Fri, 12/18/2015 - 00:00
James Shelton, the Chief Impact Officer of 2U, Inc., visited the School of International Service (SIS) in December to discuss innovation and education with Dean James Goldgeier. The event was held in conjunction with the Kogod School of Business.
Categories: University News

SPA Grad Delivers AU Winter Commencement Address

American University News - Fri, 12/18/2015 - 00:00
Quentin Lamar Fulks, SPA/MA ’15, one of two student speakers at the ceremony, urged fellow graduates to pursue ambitions for “people other than ourselves.”
Categories: University News

Libraries Going Cashless

George Mason University - Thu, 12/17/2015 - 07:08

Mason Libraries are going cashless on January 19, 2016, and will accept only Mason Money for:

  • Fines
  • Fees
  • Printing
  • Copying
  • Scanning

Mason Money is operated by the Mason Card Office. Mason Money Stations are located in each library. You can add funds to your Mason ID or purchase a visitor’s card at a Mason Money Station. Visa, MasterCard, and cash in denominations of $1, $5, $10, and $20 are accepted at Mason Money Stations only, not at library service desks. Questions? Contact the Mason Card Office.

Categories: University News

Winter Break: Library Services

George Mason University - Wed, 12/16/2015 - 11:06

University Holiday Break, December 19, 2015 – January 3, 2016

  • All Mason Libraries are closed

Winter Intersession, January 4 – 15, 2016

  • Gateway Library (Johnson Center), Arlington Campus Library and Mercer Library (Science & Technology Campus) will be open Monday through Friday, January 4 – 8 and January 11 – 15, 2016.
  • Fenwick Library will be CLOSED January 4 – 18 and is expected to open on January 19, 2016.
  • All Mason Libraries are closed on Monday, January 18, 2016.

Library Services Available: January 4-8 and January 11-15, 2016

Check Out / Pick Up / Return / Reference

  • Gateway Library: Monday – Friday, 8 a.m. – 6 p.m.
  • Arlington Campus Library: Monday – Friday, 9 a.m. – 6 p.m.
  • Mercer Library: Monday – Friday, 8 a.m. – 6 p.m.
  • Virtual Reference: Monday – Friday, 10 a.m. – 4 p.m.

 Online Resources & Services

Categories: University News

Gelman Library to House Winston Churchill’s World War II Engagement Diary

The George Washington University - Wed, 12/16/2015 - 10:06
December 16, 2015

Construction of the National Churchill Library and Center to Begin this Month

A collection of handwritten cards detailing Winston Churchill’s appointments during World War II, including such historic events as Victory in Europe (VE) Day and the British prime minister’s regular meetings with the King of England and President Franklin Roosevelt, will have a new home at the George Washington University. The “engagement diary” will be featured in the new National Churchill Library and Center to be located at GW.

Steve Forbes, chairman of Forbes Media and a Churchill enthusiast, donated the collection of 30 cards to the Chicago-based Churchill Centre. The collection was then given to GW’s Estelle and Melvin Gelman Library for use in the National Churchill Library and Center, which begins construction in December.

“The engagement diary is an important historical resource, and I am pleased that they will now be seen by a broad audience,” said Mr. Forbes. “I join Churchillians everywhere in applauding The Churchill Centre’s initiative to partner with GW to create a permanent home for Churchill scholarship, studies and education in the heart of our nation’s capital.”

Privately held since the end of World War II, the cards are a source for the history of Mr. Churchill’s wartime leadership, recording the extraordinary extent of his activities and the frequency and range of his wartime journeys. Between September 1939 and June 1945, Mr. Churchill’s private secretaries kept the handwritten “engagement diary” on two-sided cards measuring 12 by 13 inches. The library has created high-resolution digital images of the cards and will launch a crowdsourcing project, open to the public, to provide full text transcription and annotation for the cards, all of which will be available to the public on a dedicated website. 

“We are delighted to receive this fantastic record that gives us a window into part of Winston Churchill’s life during World War II,” said Geneva Henry, university librarian and vice provost for libraries. “The gift coincides with the construction of the National Churchill Library and Center, the first permanent U.S. home in our nation’s capital for the study of Winston Churchill.” 

The National Churchill Library and Center, which is expected to open in 2016, will educate new generations about Mr. Churchill and will serve as a classroom and meeting space for public programs and lectures highlighting the historical significance of Mr. Churchill, his contemporaries and more recent world leaders. 

“We are honored that Steve Forbes has entrusted us with these historic documents, and we are glad that they will be a part of the National Churchill Library and Center at GW,” said Lee Pollock, executive director of the Churchill Centre. “For the first time, the original record of Churchill’s wartime activities will be made freely and widely available to scholars and students around the world.”

The library will work with academic programs across the university to develop programming.  


About the National Churchill Library and Center

The National Churchill Library and Center is part of a philanthropic partnership with the George Washington University and the Chicago-based Churchill Centre. Housed on the first floor of the Estelle and Melvin Gelman Library, this will be the first major research facility in the nation’s capital dedicated to the study of Winston Churchill.

MEDIA CONTACTS:
Kurie Fitzgerald, 202-994-6461
Emily Grebenstein, 202-994-3087

Delivering Meaningful Results to Community Clients

American University News - Wed, 12/16/2015 - 00:00
Public Relations Portfolio gives students the opportunity to work with real clients and develop creative and strategic communications based on each client’s need.
Categories: University News

MFA Student Honored at ASC Awards in Los Angeles

American University News - Wed, 12/16/2015 - 00:00
Steven Holloway honored at American Society of Cinematographers’ Gordon Willis Student Heritage Awards.
Categories: University News

Stories of Strength: First-Person Films by Community Activists

American University News - Wed, 12/16/2015 - 00:00
Stories of Strength is a series of first-person short digital films featuring activists who lived in Washington, D.C. during the 1960s and ’70s.
Categories: University News

AU Representatives Invited to World AIDS Day at the White House

American University News - Wed, 12/16/2015 - 00:00
Event highlights national HIV prevention and care outcomes.
Categories: University News

Daughter Inspires Undergrad to Earn Degree

American University News - Wed, 12/16/2015 - 00:00
Betsy Romero graduates this fall as a role model for many.
Categories: University News

AU Number 1 in Presidential Management Fellowship Semi-Finalists

American University News - Wed, 12/16/2015 - 00:00
59 students, the most of any university in the nation, have been selected as semi-finalists for the prestigious program.
Categories: University News

Harvesting the Twitter Streaming API to WARC files

The George Washington University - Tue, 12/15/2015 - 08:54
December 15, 2015

The Twitter Streaming API is very powerful, allowing the harvesting of tweets not readily available from the other APIs. However, recall from our previous post that the Twitter Streaming API does not behave like the REST APIs that are typical of social media platforms (see Twitter’s description of the differences). A single HTTP response is potentially huge and may be collected over the course of hours, days, or weeks. This is a poor fit both for the normal web harvesting model, in which a single HTTP response is recorded as a single WARC response record in a single WARC file, and for most web archiving tools, which store HTTP responses in memory and don’t write them to the WARC file until the response is complete.

This post describes an approach we’ve developed for harvesting the Twitter Streaming API and recording in WARC files. We will also show how the tweets can be extracted from the WARC files for use by a researcher.

The Twitter Streaming API is not the only form of streaming content on the Web, and the authors of the WARC specification had the forethought to support record segmentation. In record segmentation, a single HTTP response is split into multiple WARC records, potentially in multiple WARC files. The first record is a WARC response record; subsequent records are WARC continuation records. The header of the final continuation record also contains the total number of bytes of the entire HTTP response.
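As a rough illustration of record segmentation, the following Python sketch splits one payload into a response record followed by continuation records. The function name `segment_headers` and the fixed segment size are assumptions made for this example; it is not the warcprox implementation, and it only produces the segmentation-related headers rather than complete WARC records.

```python
import uuid


def segment_headers(payload, segment_size):
    """Split payload into (headers, block) pairs, one pair per WARC record.

    Hypothetical helper for illustration only. It emits just the headers
    that matter for record segmentation, not a complete WARC record.
    """
    blocks = [payload[i:i + segment_size]
              for i in range(0, len(payload), segment_size)] or [b""]
    origin_id = "<urn:uuid:{}>".format(uuid.uuid4())
    records = []
    for number, block in enumerate(blocks, start=1):
        first = number == 1
        headers = {
            # The first segment is a response record, the rest continuations.
            "WARC-Type": "response" if first else "continuation",
            "WARC-Record-ID": origin_id if first
            else "<urn:uuid:{}>".format(uuid.uuid4()),
            "WARC-Segment-Number": str(number),
            "Content-Length": str(len(block)),
        }
        if not first:
            # Continuations point back at the record that started the series.
            headers["WARC-Segment-Origin-ID"] = origin_id
            if number == len(blocks):
                # Only the final continuation carries the total length.
                headers["WARC-Segment-Total-Length"] = str(len(payload))
        records.append((headers, block))
    return records
```

Concatenating the blocks in segment-number order gives back the original payload, which is what a reader of the WARC file has to do.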

While WARC record segmentation is theoretically a good solution for the Twitter Streaming API, record segmentation is not widely supported by web archiving tools. Our first step was to modify Internet Archive’s warcprox to support record segmentation. (Our pull request is #15; the crux of the change is between lines 210 and 245.) Recall from the earlier post that warcprox is an HTTP proxy that records the HTTP transaction in a WARC.

The following shows snippets from a WARC file created by the modified warcprox from the Twitter filter API retrieved by twarc tracking “obama”. It consists of a WARC response record, a request record, a continuation record, and a final continuation record.

WARC/1.0
WARC-Type: response
WARC-Record-ID: <urn:uuid:9aff4bf7-d64a-411c-9ef8-cd82778e036e>
WARC-Date: 2015-12-02T16:59:07Z
WARC-Target-URI:
WARC-IP-Address:
Content-Type: application/http;msgtype=response
WARC-Segment-Number: 1
Content-Length: 1149
WARC-Block-Digest: sha1:7c8de1bd439cf62c67f9f4b0c48e6f3ae39eb4ef
WARC-Payload-Digest: sha1:cc1b7bf9a2945ddf8ae7c35d5f05513d0d8b691b

HTTP/1.1 200 OK
connection: close
content-Encoding: gzip
content-type: application/json
date: Wed, 02 Dec 2015 16:59:07 GMT
server: tsa
transfer-encoding: chunked
x-connection-hash: 8439cf557d0f807635797377d9e7d0b6

[chunked, gzip-compressed binary payload]

WARC/1.0
WARC-Type: request
WARC-Record-ID: <urn:uuid:3a6ce873-13a9-401a-bfd9-3ddc321aab96>
WARC-Date: 2015-12-02T16:59:07Z
WARC-Target-URI:
WARC-Concurrent-To: <urn:uuid:9aff4bf7-d64a-411c-9ef8-cd82778e036e>
WARC-Block-Digest: sha1:fa301cb54fd6c38adac4a43bacf36d38198ec8e0
Content-Type: application/http;msgtype=request
Content-Length: 566

POST /1.1/statuses/filter.json HTTP/1.1
content-length: 30
accept-encoding: deflate, gzip
host:
accept: */*
user-agent: python-requests/2.8.1
content-type: application/x-www-form-urlencoded
authorization: OAuth oauth_nonce="149931870481283598461449075546", oauth_timestamp="1449075546", oauth_version="1.0", oauth_signature_method="HMAC-SHA1", oauth_consumer_key="EHdoTe7ksBgflP5nUalEfhaeo", oauth_token="481186914-c2yZjbk1np0Z5MWEFYYQKSQNFBXd8T9r4k90YkJl", oauth_signature="m0hHjrPnU7aTtOhjmk8om3Vv7Ok%3D"

track=obama&stall_warning=True

WARC/1.0
WARC-Type: continuation
WARC-Record-ID: <urn:uuid:c18791da-24e0-42a7-91df-82dfdae6697e>
WARC-Date: 2015-12-02T16:59:07Z
WARC-Target-URI:
WARC-IP-Address:
Content-Type: application/http;msgtype=response
WARC-Segment-Number: 2
WARC-Segment-Origin-ID: <urn:uuid:9aff4bf7-d64a-411c-9ef8-cd82778e036e>
Content-Length: 1220
WARC-Block-Digest: sha1:82794503724ba3bb06fee69302614a3f5ef00c39

[chunked, gzip-compressed binary payload]

WARC/1.0
WARC-Type: continuation
WARC-Record-ID: <urn:uuid:d7bfe010-7831-45a8-8361-715692ea014b>
WARC-Date: 2015-12-02T16:59:09Z
WARC-Target-URI:
WARC-IP-Address:
Content-Type: application/http;msgtype=response
WARC-Segment-Number: 3
WARC-Segment-Origin-ID: <urn:uuid:9aff4bf7-d64a-411c-9ef8-cd82778e036e>
WARC-Segment-Total-Length: 924
WARC-Truncated: unspecified
Content-Length: 307
WARC-Block-Digest: sha1:57b73cdaab8025cc04a83f3ae6eff2dd6e2bfa15

[chunked, gzip-compressed binary payload]

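As should be obvious, this data is not readily usable by most researchers. In particular, there are four barriers to use:

  • The data is stored in a WARC file.
  • The HTTP response is split across multiple WARC records (record segmentation).
  • The payload has a content encoding (gzip).
  • The payload has a transfer encoding (chunked).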

To be confident in this approach, we feel it is prudent to make sure that we can access the tweets given these various barriers and the lack of support for record segmentation in web archiving tools. To this end, we developed TwitterStreamWarcIter and the parent class BaseWarcIter. TwitterStreamWarcIter outputs the tweets from a WARC file, one per line. This is the same output as twarc or cat-ing a line-oriented JSON file and can be piped to other tools such as jq:

$ python test_1-20151202200525007-00000-30033-GLSS-F0G5RP-8000.warc.gz
{"contributors": null, "truncated": false, "text": "RT @Litorodbujan: Obama quiere visitar Espa\u00f1a!\nAhora s\u00ed somos un pa\u00eds serio; con Rajoy no se repetir\u00e1 esto.   #RajoyconPiqueras https://t.c\u2026", "is_quote_status": false, "in_reply_to_status_id": null, "id": 672144412936445952, "favorite_count": 0, "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>", "retweeted": false, "coordinates": null, "timestamp_ms": "1449086690540", "entities": {"user_mentions": [{"id": 320317854, "indices": [3, 16], "id_str": "320317854", "screen_name": "Litorodbujan", "nam ....

or made suitable for human consumption with the --pretty flag:

$ python test_1-20151202200525007-00000-30033-GLSS-F0G5RP-8000.warc.gz --pretty
{
    "contributors": null,
    "truncated": false,
    "text": "RT @Litorodbujan: Obama quiere visitar Espa\u00f1a!\nAhora s\u00ed somos un pa\u00eds serio; con Rajoy no se repetir\u00e1 esto.   #RajoyconPiqueras https://t.c\u2026",
    "is_quote_status": false,
    "in_reply_to_status_id": null,
    "id": 672144412936445952,
    "favorite_count": 0,
    "source": "<a href=\"\" rel=\"nofollow\">Mobile Web (M2)</a>",
    "retweeted": false,
    "coordinates": null,
    "timestamp_ms": "1449086690540",
    "entities": {
....
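Because the default output is one JSON object per line, it can also be consumed by a few lines of Python instead of jq. The sketch below is only an illustration: the function name `iter_tweet_texts` is made up for this example, and it assumes each tweet carries the `id` and `text` fields shown in the output above.

```python
import json


def iter_tweet_texts(stream):
    """Yield (id, text) for each tweet in a line-oriented JSON stream."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between tweets
        tweet = json.loads(line)
        yield tweet["id"], tweet["text"]
```

Passing `sys.stdin` as the stream lets the function sit at the end of a shell pipeline.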

This approach addresses the WARC barrier by using Internet Archive’s WARC library to read the WARC file. The IA WARC library is extended to handle record segmentation by stitching the payload back together. (See CompositeFilePart. It still doesn’t handle continuations that are in other WARC files, but solving that problem is just software development.) Lastly, the content encoding and transfer encoding barriers are remedied by loading the payload into a urllib3 HTTPResponse, which handles the decoding of the content encoding and transfer encoding and provides a familiar, Pythonic interface to the response.
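The stitching and decoding steps can be sketched with the Python standard library alone. Note the substitution: this example uses `http.client.HTTPResponse` rather than the urllib3 HTTPResponse described above, and `tweets_from_segments` and `_FakeSocket` are names invented for the illustration. It assumes the segment payloads are already in order and together form a complete chunked response; a stream that was cut off mid-chunk would need extra handling.

```python
import http.client
import io
import json
import zlib


class _FakeSocket:
    """Minimal socket stand-in so http.client can parse bytes held in memory."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def tweets_from_segments(segments):
    """Yield decoded tweets from the ordered payloads of a segmented response."""
    # Stitch the response record and its continuation records back together.
    raw = b"".join(segments)
    response = http.client.HTTPResponse(_FakeSocket(raw))
    response.begin()  # parse the status line and headers
    encoding = response.getheader("Content-Encoding", "").lower()
    body = response.read()  # http.client undoes the chunked transfer encoding
    if encoding == "gzip":
        # wbits of 16 + MAX_WBITS selects gzip framing.
        body = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(body)
    for line in body.splitlines():
        if line.strip():
            yield json.loads(line)
```

The transfer encoding has to be undone before the content encoding, since the chunk framing wraps the compressed bytes.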

As we have explored the similarity between web harvesting and social media harvesting, the Twitter Streaming API represents the point of greatest friction. However, the above represents a reasonable first approach to addressing the unique features of the Twitter Streaming API.

Our Hands

American University News - Tue, 12/15/2015 - 00:00
Professor Caleen Jennings explores race in America.
Categories: University News

The Sculpture and the Student

American University News - Tue, 12/15/2015 - 00:00
Discovering themes in Michelangelo's David.
Categories: University News

The Alper Initiative

American University News - Tue, 12/15/2015 - 00:00
A new home for Washington art and artists at the AU Museum.
Categories: University News

The Katzen Arts Center: A Legacy

American University News - Tue, 12/15/2015 - 00:00
World-class space made possible by Cyrus and Myrtle Katzen.
Categories: University News

A Sense of Place

American University News - Tue, 12/15/2015 - 00:00
Emmy Award-winning composer John Wineglass.
Categories: University News

Sounding Off

American University News - Tue, 12/15/2015 - 00:00
The art of William Brent, trailblazer in experimental music performance.
Categories: University News