In 1990, the Library of Congress launched “American Memory,” its first digital pilot project. The LOC selected a handful of the 160 million objects in its collection to digitize, store on laserdiscs and CDs, and distribute to 44 schools and libraries across the country. Like so many pre-internet digital endeavors, American Memory was unsuccessful. According to the project’s retrospective website, the problem was tactile: “distributing these materials in CD-ROM format was both inefficient and prohibitively expensive.”
Within a few years, CDs were relics and the internet was proving an accessible space for digital innovation. In October 1994, the LOC announced the creation of the National Digital Library Program, or NDLP, a “systematic effort to digitize some of the foremost historical treasures in the Library . . . and make them readily available on the Web to Congress, scholars, educators, students, the general public, and the global Internet community.” By 1999, the NDLP had an annual budget of $12 million and over 100 employees. It had successfully digitized over 5 million objects—an unrecognized feat of the early internet.
Yet like American Memory, the NDLP proved to be a novelty, and it too tailed off. A 2001 digital strategy report, commissioned by the LOC and conducted by the National Research Council, concluded that the LOC’s digitization efforts were insufficient: only a tiny fraction of the Library’s collection had been uploaded. The report recommended an ambitious reorganization, one that would “systematically address the policies, procedures, and infrastructure required for [the LOC] to collect diverse types of digital resources and to integrate them into its systems for description and cataloging, access, and preservation.” The LOC needed to commit itself to the internet.
Fifteen years later, the report’s recommendations remain recommendations. In March 2015, an investigation by the Government Accountability Office painted a bleak picture of the LOC’s technological infrastructure, finding “significant weaknesses” in strategic planning (there was no IT strategic plan); investment management (the LOC does not review all of its key investments, has no system for tracking spending, and misplaced 10,000 computers); and leadership (the Library did not have a Chief Information Officer from 2012 to late 2015).
The report confirmed that the LOC had lost touch with whatever mainstream American audiences it had reached in its initial digital programs. In the 2000s, Google Books and the European Union’s Europeana project demonstrated the potential of collaborative mass digitization and online accessibility, but the LOC had fallen behind. “The Library of Congress has been asleep at the switch,” John Palfrey, a library activist and head of school at Phillips Academy, told me. “It’s a national embarrassment.”
This digital failure occurred during the tenure of the 87-year-old Dr. James H. Billington, a Russia scholar who was appointed the Librarian of Congress by Ronald Reagan in 1987. By September 2015, when Billington retired “under fire,” as the New York Times headline had it, whatever early progress the LOC made on the internet seemed squandered. Responsibility for restoring the LOC’s digital mandate has fallen to his newly confirmed successor, Carla Hayden, the first Librarian of Congress of the 21st century. The job Hayden has before her is to prepare the Library to serve another century of US citizens where we’ve become most accustomed to consuming information: the internet.
The Library of Congress was established on April 24, 1800, when John Adams signed the act of Congress that moved the nation’s capital to Washington, DC. “The Library of Congress was founded mainly to be a legislative library,” says Carl Ostrowski, a professor at Middle Tennessee State University and the author of Books, Maps, and Politics, a history of the early Library. Unlike in the temporary capitals of New York and Philadelphia, “there wasn’t another institution in Washington that could provide the books Congress needed to write legislation.” But the Library quickly outgrew its Congressional mandate, as well as the single room it occupied, a scholarly den filled with balconies of shelves and wide tables. It became a social space for the DC elite, according to Ostrowski, and stocked the latest novels for entertainment.
Ainsworth Rand Spofford, the Librarian of Congress from 1864 to 1897, oversaw the construction of the LOC’s enormous Thomas Jefferson Building, the only structure an average visitor is likely to experience, and expanded the institution’s reach. “There is almost no work, within the vast range of literature and science,” he wrote in an 1874 report, “which may not at some time prove useful to the legislature of a great nation.” Thus the LOC’s mandate expanded: it would acquire anything and everything of importance, whether through the copyright office that it integrated under Spofford or its network of international experts recommending foreign-language publications. By the late 19th century, the Library of Congress had become a kind of national brain trust, a heritage of information that aspired to timelessness.
The Thomas Jefferson Building today is ceremonial, a touristic beaux arts monument with soaring domed ceilings and grand reading rooms secreted in corners accessible only to a thinning population of registered scholars. The LOC’s more mundane infrastructure is housed across Independence Avenue in the blandly brutalist James Madison Memorial Building, which is linked to the Jefferson and the John Adams Building (an Art Deco tower opened in 1939 for the business and science holdings) by underground tunnels that resemble the hallways of a sinister elementary school. It’s in these secondary spaces that the real labors of a library, like digitization, happen.
The LOC’s digital collection currently comprises over 7 petabytes (7 million gigs) with more than 15 million items, including 150,000 print books. In an ideal future scenario, the LOC estimates that it could digitize a further 3-5 PB a year, but even at that optimistic rate, full digitization of the 160 million-object collection could take decades, and that’s if it stayed the size it is today. (The collection adds about 10,000 objects daily.)
Plenty of factors stand in the way of digitization. Budgets are shrinking and awareness of the institution’s purpose is low: a recent Facebook rumor had Donald Trump defunding the LOC to save money. The LOC takes scholarly care in digitization, ensuring that the replicas it creates will be authoritative and stable, but the process is slow and inefficient. Every object from the collection that gets digitized must first be removed from the LOC stacks or its storage warehouses offsite in Maryland, evaluated for its ability to endure physical scanning, and then hand-fed through a scanner. The resulting data is processed and uploaded to the internet with proper tagging and citations, following standards that the LOC itself developed. A single print could take as long as a day to scan and upload.
The LOC spends between $6 million and $8 million on digitization annually, maintaining a dedicated staff of 13, plus external contract scanners. With such a modest budget (lower than in 1999), the Library has to make difficult decisions as to which parts of the collection should go online. Beth Dulabahn, the LOC’s director of integration management, told me that objects are more likely to be digitized if they’re of particular importance to Congress or might be popular with the public. “When we digitize we try to go for the biggest bang for the buck,” Dulabahn said. “The mission is still to acquire materials and then as legally possible to make those available to as many people as we can. Certainly the internet has provided a fabulous opportunity to make materials available to much wider audiences.” Recent projects include a scanned series of photographs from Depression-era Wyoming as well as curated Pinterest boards with themes ranging from African American History Month to Walt Whitman.
The LOC’s more obscure digitization efforts often aren’t promoted to public audiences, contributing to their invisibility. In a windowless basement workshop, Katharine Danzis, a private paper conservator working for the LOC, prepared a cardboard box of the archivist Alan Lomax’s personal archive. “We’re looking for items that will not get a good scan or that will be damaged,” she said, holding up a clipping of heavily creased old notes and letters. Tears are labeled, folds unfolded, and fragile fragments placed in transparent polyester. The Lomax collection is being scanned at a rate of around eight boxes a week. There are, Danzis said, “hundreds and hundreds” of boxes.
Elsewhere, the prints and photographs division scans around 50,000 items a year; its collection numbers around 14.5 million. “Just about 300 years at that pace,” Katherine Arrington, a digital library specialist in the division, said with a laugh. While rugged books can be fed through faster scanners, intensive image scanning happens on the tank-like Metis, a hulking $150,000 Italian scanner stationed in a corner of Arrington’s division. The LOC’s model was the first of its kind in the United States; it now owns two. The final results are files ranging from 200 megabytes to two gigs that are so complete as to obviate the need for the object itself. “Even if the technology does improve, we’ve scanned it to the point where we’ve gotten what people need out of it,” said Paul Hogroian, a senior lab technician, laying down a 19th-century lithograph and leaving the Metis to its busily sentient buzz. “You feel that you’ve done something now that will last.”
Putting the scanned files online is the job of Jim Karamanis, the LOC’s head of web services. Files are stored first in a long-term archive, a kind of digital deep freeze. Then, the web team creates accessible duplicates of that archive for the public. Karamanis has been frustrated with LOC leadership’s approach to technology. “You had curatorial people managing technical people. You’re not qualified to do that,” he told me. A web redesign is underway, though progress is slow. “When you have a body of content that’s been available on the web since the ’90s, it’s been pretty radical to update it.”
Print still dominates at the Library of Congress. The John Adams Building is made of stacks; in the center of every floor are dark expanses of shelving where books spill out onto the floor in dusty rows. (The shelves act as structural support for the building; otherwise the weight of the books would pull it down.) In dark corners are the rusted hulks of a pneumatic tube system once used to shoot requests for books from one floor to another. Along the ceilings of the underground hallways run conveyor belts that look like air conditioning ducts, used to ferry books between buildings. Or at least they were. “They haven’t worked for years,” said senior public affairs specialist Jennifer Gavin.
A core function of the LOC is to decide which materials merit permanence—in digital or print form—and which don’t. Spare copies of books the LOC already owns and volumes it has decided aren’t worth preserving beyond the reach of time end up in limbo, otherwise known as the Surplus Books department. Every year, 18,000 of the abandoned books (less than two days’ worth of collection additions) are distributed to US public libraries, offices of politicians, and foreign national libraries free of charge. Moldova gets a box per week, Japan specifies intricately protective cardboard boxes for its shipments, and one unnamed South American country never picks up its deliveries on time.
At the end of the LOC’s fraying bibliographic chain, Surplus Books inspires a certain feeling of futility. Digitization will never be completed, at least not before print ceases to be produced, and still there remains the task of making sure the printed books in the collection remain viable.
This is to say that the Library of Congress needs not just a shift in branding but a different, parallel mission to make digitization comprehensive and sustainable rather than a sideshow. Developing this mission is what will maintain the institution’s relevance over the next century. But the change can only come from leadership; currently, staff are as confused about the Library’s purpose as the public. “We’re trying to figure out what our place is in a society that has Google,” Elmer Eusman, chief of the LOC’s Conservation Division, told me with audible sadness. Billington didn’t mitigate the confusion.
By the time he arrived at the Library of Congress in 1987, James H. Billington was an elite academic, and a conservative government-approved cold warrior. He attended Princeton and Oxford, taught history at Harvard and Princeton, and directed the Woodrow Wilson International Center for Scholars, where he launched the Kennan Institute (a Sovietology workshop) and the bipartisan Wilson Quarterly.
Billington was a luddite—he was known to not read email, instead using a home fax machine—and a control freak. A former employee described him as a “megalomaniac” to the Washington Post. To American librarians, he represented the government’s continuing ignorance of what libraries actually do in the internet age. “You want the person in that seat to have worked the reference desk,” says Stephanie Anderson, assistant director for public services at Darien Library in Connecticut. “It’s easy to gloss over the day-to-day experience of being in the library.”
Still, Billington was a charismatic fundraiser, founding multiple new centers within the LOC. In 1994 he launched THOMAS, a digital access point for legislative information, and in 2009 the World Digital Library, a 14,000-object cross-institutional collection from the likes of the British Library and the University Library of Naples. The LOC also began working with the Internet Archive, a non-profit founded by entrepreneur and web utopian Brewster Kahle, a supporter of Billington, for some of its digitization. The LOC had a PR coup in 2010 when it announced it would begin archiving tweets from Twitter for posterity, but when I asked Beth Dulabahn about its status, she rolled her eyes. The tweet collection remains inaccessible because the Library hasn’t found a way to index it for search. In the meantime, Twitter came up with its own solutions, available to its daily users—but not the LOC.
Billington’s overall approach to technology was haphazard and, crucially, isolated from the larger changes that were taking place in Silicon Valley during his tenure. In the late 1990s and 2000s, tech companies took over, in the public imagination, the social responsibility for large-scale information management that Spofford had assigned to the LOC. By, say, 2002, one could ask why we needed the LOC when we had Google. This was and is a problem of perception, not just execution. The LOC plays an important role in digitization, creating universal standards and providing unique content from its collections, but Billington, with an intentional aloofness, distanced his institution from the more dynamic projects that now define our expectations of digital libraries.
The oldest digital library, Project Gutenberg, was launched in 1971 to provide digital copies of books in the public domain. Today it holds just under 50,000 volumes. In 1996, Kahle founded his Internet Archive, now a collection of 2.5 million books (much larger than the LOC’s digital collection) that runs scanning centers in 30 libraries in eight countries, digitizing 1,000 books daily at a cost of $0.10 a page, according to the founder. Other digital library efforts, like Europeana and HathiTrust, were launched over the first decade of the 2000s and tended to be collective efforts, with responsibility shared across academic and cultural institutions.
In 2004, Google launched Google Books, the largest digitization project ever, which partnered with libraries and other institutions like Harvard and the New York Public Library to scan and upload over 7 million volumes by 2008. The LOC did not participate in Google Books, in part because the process was not designed to handle delicate materials, Eusman said. The project became the target of lawsuits alleging it violated copyright by scanning and publishing in-copyright books, cases that fell under the copyright jurisdiction of the Library of Congress. If Billington had supported the project or made the LOC a meaningful partner, the legal issues could have been mitigated and LOC digitization greatly expanded.
Yet in 2015, when the Google Books collection had reached 25 million books, the disputes were resolved in Google’s favor under fair use. The company claims it will digitize all 130 million unique volumes it estimates to exist in the world, without the involvement of the United States’ largest library.
Billington also refused to send LOC data to the Digital Public Library of America, an initiative that emerged out of Harvard’s Berkman Center for Internet & Society in April 2013. DPLA is something like the Google of libraries. Similar to the Internet Archive, DPLA partners with “content hubs” and “service hubs” to scan and process collections, collating the data into a universal online database. DPLA has over 10 million items hosted from 1,600 collections united by apps, a custom API, and a Web 2.0 interface, and it’s as easy to find a digitized object from the Smithsonian as from a small Midwestern library. None of this is beyond the LOC’s reach, but all of it depends on sustained and imaginative leadership, and an openness to new projects.
Despite initial shows of enthusiasm, Billington pointedly ignored personal invitations to participate in DPLA. “It’s been enormously frustrating,” said John Palfrey, also DPLA’s founding chairman, “to have potentially the most important contributor to DPLA be absent. To have a rival system that one party is building compared to one that absolutely everybody else is participating in, it’s bonkers.” (Sources have suggested that one cause of Billington’s hesitation was historian Robert Darnton. Billington taught at Princeton with Darnton, who was Harvard’s university librarian until 2015 and a leader in the DPLA initiative, and the two men developed a longstanding professional rivalry.)
In the context of mass digitization, the LOC’s choices have been divisive and exclusionary. “I really believe in this idea that a library can be a platform, can provide a foundation that others can build upon rather than defining those structures,” said Dan Cohen, the DPLA’s executive director. “I don’t think [the LOC] has the mandate or wherewithal to work truly across the country with thousands of institutions to help them bring their material online. We can be a little bit more flexible.” Maybe so, but when it comes to a future LOC partnership with DPLA, Palfrey isn’t optimistic. “At this point, it might be too little, too late,” he told me.
Billington’s exit in September 2015 kicked off a long-awaited wave of change at the LOC. Last October, the Senate passed the “Librarian of Congress Succession Modernization Act,” limiting the length of a Librarian’s appointment to 10 years, in what appears to be a direct reaction to Billington (though the appointments can be renewed). That month, David Mao, previously Deputy Librarian and director of the Library of Congress Law Library, became Acting Librarian. “Our technological infrastructure here needs to be modernized,” he noted in a subsequent interview. Mao will remain until the newest Librarian officially takes office.
On February 24 of this year, President Obama announced his nomination of Carla Hayden, a 63-year-old African-American woman and the longtime head of the Baltimore public library system. (Steve Jobs biographer Walter Isaacson is said to have turned the position down; Kahle and Palfrey were among other popular suggestions.) The nomination suggested an awareness that the LOC must become more progressive and open, catering to a wider demographic as the social purpose of libraries evolves from providing books to maintaining public technological support systems. “I see [the LOC] growing its stature as a leader, not only in librarianship but in how people view libraries in general,” Hayden said in a Senate appearance.
Hayden was confirmed on July 13, just before the 2016 Senate recess, by a vote of 74-18. As Librarian, she will have to contend with the technological gridlock that built up at the LOC under Billington and the ongoing task of making the institution relevant on the internet. Billington often described the LOC as the American “library of last resort”: whatever could not be found elsewhere would always be in its archives. But the phrase has taken on another meaning: it’s easy to think that what Billington meant is that the LOC is the last place anyone would ever think to look. This is precisely what it risks becoming if it doesn’t take on a leading public role in the digitization of culture.
The LOC’s strategic plan for 2016 through 2020 features inspirational quotes from John Adams and Thomas Jefferson, as well as images of the collection’s children’s books presented on Pinterest. The document’s millennial vision for the institution is to be “a chief steward of America’s and the world’s record of knowledge, and is a springboard to the future, while providing indispensable services to Congress.” Its continuing priority is first to serve politicians, then provide an ambiguous repository of American identity to citizens. Just like the 2001 report, the plan proposes the internet for this purpose. One goal is to “acquire, describe, make accessible, secure, and preserve a universal collection of knowledge in physical and electronic formats, and obtain electronic access, for its own users, to digital materials held by other entities.” Perhaps under the new Librarian it will stick.