Torching the Modern-Day Library of Alexandria

A fascinating pair of articles came out earlier this month on the ambition and lacklustre reality of Google Books. I’ve read both and have very little to add other than my recommendation that you do the same.

James Somers, for the Atlantic, focuses his story mainly on the lawsuit and failed settlement between copyright holders, authors, librarians, and Google:

It was the first project that Google ever called a “moonshot.” Before the self-driving car and Project Loon—their effort to deliver Internet to Africa via high-altitude balloons—it was the idea of digitizing books that struck the outside world as a wide-eyed dream. Even some Googlers themselves thought of the project as a boondoggle. “There were certainly lots of folks at Google that while we were doing Google Book Search were like, Why are we spending all this money on this project?,” Clancy said to me. “Once Google started being a little more conscious about how it was spending money, it was like, wait, you have $40 million a year, $50 million a year on the cost of scanning? It’s gonna cost us $300 to $400 million before we’re done? What are you thinking? But Larry and Sergey were big supporters.”

In August 2010, Google put out a blog post announcing that there were 129,864,880 books in the world. The company said they were going to scan them all.

Of course, it didn’t quite turn out that way. This particular moonshot fell about a hundred-million books short of the moon. What happened was complicated but how it started was simple: Google did that thing where you ask for forgiveness rather than permission, and forgiveness was not forthcoming. Upon hearing that Google was taking millions of books out of libraries, scanning them, and returning them as if nothing had happened, authors and publishers filed suit against the company, alleging, as the authors put it simply in their initial complaint, “massive copyright infringement.”

Scott Rosenberg’s story for Backchannel is shorter than Somers’, but it’s a good overview of the myriad complications of scanning and indexing tens of millions of books, including concerns about a private tech company having so much control over information:

When Google partnered with university libraries to scan their collections, it had agreed to give them each a copy of the scanning data, and in 2008 the HathiTrust began organizing and sharing those files. (It had to fend off the Authors Guild in court, too.) HathiTrust has 125 member organizations and institutions who “believe that we can better steward research and cultural heritage by working together than alone or by leaving it to an organization like Google,” says Mike Furlough, the trust’s director. And of course there’s the Library of Congress itself, whose new leader, Carla Hayden, has committed to opening up public access to its collections through digitization.

In a sense each of these outfits is a competitor to Google Books. But in reality, Google is so far ahead that none of them is likely to catch up. The consensus among observers is that it cost Google several hundred million dollars to build Google Books, and nobody else is going to spend that kind of money to perform the feat a second time.

Tangentially, yesterday was World Intellectual Property Day, which serves as a reminder that much of the world’s information is kept in private hands for an increasingly indefinite period of time.