Going digital


Judge Denny Chin of the Southern District of New York held a final fairness hearing Thursday on the proposed settlement of Authors Guild, et al. v. Google Inc. Sometime soon, Chin will reconvene his court and announce that the settlement is approved or that its parties must start over and try again. 

Chin’s ruling will determine the digital fate of nearly all books ever written in the English language.

Google has relied on research libraries in its quest to scan the world’s books. With the first shipments heading out this spring, up to a million of them will come from the University of Minnesota.

Some observers said an author’s right to be paid for his or her work was at stake. Others said Google, by scanning millions of copyrighted works, had risked its very existence on the project.

If the settlement is approved, Google will become the dominant resource for online books for the foreseeable future. Google, authors, researchers and the literate world await the decision, which could put millions of currently unavailable works a mouse click away.

University librarian Wendy Lougee has worked on dozens of digitizing projects since her time at the University of Michigan library in the 1990s and has spurred a number of the University’s own projects to digitize its unique pieces. Lougee thinks that she, like almost everyone else, originally underestimated the importance of digital books.

“I think it was hard to imagine how explosive it would be,” Lougee said, “and how it would really change, fundamentally, the way scholarship is done.”

Twelve million and counting

For founders Larry Page and Sergey Brin, the idea of scanning the world’s books predates Google’s current popularity and power. In fact, it predates Google itself.

As graduate students in the mid-1990s, Brin and Page worked on the publicly funded Stanford Digital Library Technologies Project.

Google was founded in 1998 and had become one of the world’s most used search engines within a few years. It was then that its founders returned to their dream of a digital world library.

Google began contacting publishers. It also developed its own scanning technology, designed for fast, high-volume scans.

When publishers were reluctant to turn over in-copyright works, Google reached out to major American libraries. In December 2004, the company announced deals with Stanford, Harvard and Oxford universities, the New York Public Library and Page’s alma mater, the University of Michigan.

Among them, Michigan’s agreement was the most accommodating: Google could scan any or all of Michigan’s 7 million books.

Paul Courant, now the University of Michigan dean of libraries, was then acting as the university’s provost. Larry Page called Courant to discuss the prospective deal and, as Courant describes it now, he was easily convinced.

“The convincing, in principle, took about 10 seconds,” Courant said.

Lougee, who spent about 20 years holding various positions with Michigan’s library before coming to the University, said Michigan had worked at digitizing books before Google came calling, and what it found was striking. As Lougee recalled, Michigan took a group of 19th-century books that had been in storage for decades, scanned them and made them available online.

“Once we digitized them, they were used a million times a month worldwide,” Lougee said.

At its own rate, Michigan estimated it would have scanned all of its 7 million books in 1,000 years. Page told President Mary Sue Coleman that Google could do it in six.

Google wanted nearly everything Michigan had to offer.

“The basic setup is that Google wants all bound volumes that are bigger than a pamphlet and smaller than an encyclopedia,” Courant said, “and that sat fine with us.”

Criticism of Google’s library deals gathered steam during 2005, as publishers and authors complained of copyright violations. In August 2005 Google announced it would halt its scanning of copyrighted works and gave publishers three months to provide a list of authors whose work could not be scanned. The following month, the Authors Guild, which represents 8,000 writers, sued Google for copyright infringement. The month after that, a group of major publishers also sued.

Authors’ and publishers’ complaints against Google hinged on the fact that, although in-copyright books were not available to be read in their entirety, they had been fully scanned and their texts were searchable. By collaborating with libraries and not publishers, Google did this without permission.

Still, Google continued to strike new deals with libraries. In 2007, Google reached an agreement with the Committee on Institutional Cooperation, an academic grouping of schools in the Big Ten Athletic Conference, plus the University of Chicago. The agreement stated that “not less than 10 million volumes” would be digitized by Google. Digital copies of books no longer in copyright will be returned to the libraries, which will pool these books in an online repository called the Hathi Trust.

The CIC’s copyrighted works will be held in a secure server by Google and will become available to the Hathi Trust as they enter the public domain.

As part of the deal, the University pledged up to 1 million books from its collection. The CIC agreement with Google runs for six years and is automatically renewed on a yearly basis until either party decides to cancel.

The proposed settlement before Chin, called the Amended Settlement Agreement, or ASA, was reached in November. Chin did not say when he would reach a final decision.

As the lawsuit and settlement process drags on, Google has already digitized more than 12 million books. It is estimated that more than 30 million books exist worldwide, and Google plans to scan them all.

During a 2008 public forum, journalist Ken Auletta asked Google CEO Eric Schmidt how Google’s projects outside of its search engine would make money. Schmidt rephrased the issue.

“The goal of the company is not to monetize everything,” Schmidt said. “The goal of the company is to change the world.”

To the moon and back

Sometime in March, library staff and University students will begin pulling books from the shelves of Wilson Library. In April, Google trucks will arrive to take the first shipment to one of its digitizing centers.

Over a period of years, Google will collect and scan books selected from the University’s catalog, with each book returning in roughly the time it would have been gone had it been checked out.

Google has remained secretive about its project. It will not reveal the exact method of sorting its searches and has been guarded about the technology involved in scanning, though a number of Google Books scans have come out with visible fingerprints, indicating that the scanning is done by hand. Spokeswoman Jennie Johnson said the company cannot share much about its technical processes, though she said the company uses optical character recognition that reads the text of a page and makes it searchable for the user. This means that a researcher can instantly find and navigate each of the 64 references to the word “liberty” in Rousseau’s “Social Contract.”
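As a rough illustration of what searchable text makes possible, the sketch below finds every whole-word occurrence of a term in a block of plain text, the kind of output optical character recognition produces. It is a minimal Python sketch and not Google’s method, which the company has not disclosed; the function name `find_word` and the sample sentence are invented for illustration.

```python
import re


def find_word(text: str, word: str) -> list[int]:
    """Return the starting offset of every whole-word, case-insensitive match."""
    # \b anchors the match at word boundaries, so "liberty" does not match "liberties".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [match.start() for match in pattern.finditer(text)]


# Invented sample text standing in for the OCR output of a scanned page.
sample = "Liberty is prized. Without liberty, no contract binds; liberties differ."
offsets = find_word(sample, "liberty")
print(len(offsets), "matches at offsets", offsets)
```

Each offset marks where a match begins, which is what lets a reader jump from one occurrence to the next.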

If Chin approves the settlement, millions of in-copyright works that are now inaccessible will become available for a 20 percent preview by the general public.

University libraries would need to pay a subscription fee to access Google’s collection. The subscription will grant college students and faculty full access to millions of in-copyright, out-of-print books.

The fee, based on enrollment size, is offered at a discount for contributing libraries. These libraries will also have the chance to reject a subscription price set by Google and take the matter to arbitration.

The matter of pricing still unsettles some observers, including the Association of Research Libraries, of which the University is a member. With Google holding exclusive access to so many works, libraries are concerned that Google could exploit its subscribers with a steep price.

It is unlikely that the University of Michigan will complain about its fee; as a “thank you” for its generosity, the school will get a free subscription for 25 years.

When Courant considered the possibility of Michigan physically gathering the collections at Oxford, Harvard, Stanford and other universities – including many rare and unique volumes – he easily calculated a cost in the billions of dollars.

“It’s like asking, if I stopped by the moon on my way to work every day, how much out of my way would that be,” Courant said. “It’s just inconceivable.”

James Gleick, a journalist and author of several bestselling nonfiction books, is on the board of directors of the Authors Guild. Gleick said all parties, including the plaintiffs, acknowledged the positive side of Google’s actions.

“During the three years of negotiations between the authors, Google and the publishers, it was completely explicit. It was right there on the table that the service that was going to emerge from the settlement was going to be a great public good,” Gleick said. “The Google people, too – who we sued – to be fair, seemed well aware that they were creating something of great benefit and not something that they were going to use solely to enrich themselves.”

Fair use

Before the University signed a contract with Google, its Office of General Counsel carefully reviewed the terms and possible liabilities of such a deal. To bolster its confidence, the University brought in Faegre and Benson, a private law firm that specializes in copyright law, as an outside consultant.

After all, when it negotiated its deal with Google, the University was signing an agreement with a company that was already being sued and was engaging in exactly the activity that had brought the lawsuit.

Concerns such as these are why Stanford, Harvard and other private schools offered limited pieces of their collections. As state entities, public schools are less vulnerable to losing civil lawsuits.

“We of course watched the litigation between the authors, publishers and Google very carefully,” University General Counsel Mark Rotenberg said. “But our principal interest is to have a durable long-term digitization agreement with Google for our works, and do it in a way that does not expose the U of M to unreasonable legal risks.”

Since its inclusion in Article I of the Constitution, American copyright law has become increasingly author-friendly.

All books printed before Jan. 1, 1923, are in the public domain. Books printed from 1923 to 1977 can remain in copyright for 95 years after publication. All works published after 1977 can hold copyright for 70 years after the author’s death.
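The three rules above can be written as a short function. This is a simplified Python sketch of the summary given here, not legal guidance; it ignores renewal requirements, works made for hire and other conditions in the statute, and the name `last_protected_year` is hypothetical.

```python
from typing import Optional


def last_protected_year(pub_year: int, author_death_year: Optional[int] = None) -> Optional[int]:
    """Return the latest year copyright can run, or None if the book is public domain."""
    if pub_year < 1923:
        # Printed before Jan. 1, 1923: already in the public domain.
        return None
    if pub_year <= 1977:
        # Printed 1923 through 1977: up to 95 years after publication.
        return pub_year + 95
    if author_death_year is None:
        raise ValueError("works published after 1977 need the author's year of death")
    # Published after 1977: 70 years after the author's death.
    return author_death_year + 70


print(last_protected_year(1900))        # public domain
print(last_protected_year(1930))        # 1930 + 95
print(last_protected_year(1985, 2000))  # 2000 + 70
```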

The best estimates are that 20 percent of existing books are in the public domain, while only 10 percent are in-copyright and in print. The remaining 70 percent of all books ever printed are in-copyright but out of print. Rather than track down the authors or rights holders of these books, Google obtained them from libraries.

University Law School professor Tom Cotter said the authors’ and publishers’ complaints raised the question of “fair use.” Fair use is what allows an author to quote a phrase or a paragraph from a copyrighted work, but Cotter said its full meaning is up for interpretation.

Had the parties not agreed to settle, Chin could have nailed Google with an enormous punishment, given the number of books it has scanned. Copyright infringement can carry a penalty in the tens of thousands of dollars, and if “willful infringement” is found, the fine can reach $150,000 per infringement. Given that Google has scanned millions of in-copyright works, Cotter could see why the company settled.

“Even if it’s likely they would’ve won on fair use – and I would say you probably had a good 70-80 percent chance of winning on fair use – but you know, a 20 percent chance that you’re going to be hit with a trillion dollars in damages still gives you a pretty significant motive to settle,” Cotter said.
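Cotter’s trillion-dollar figure can be checked against the numbers quoted above. The Python sketch below assumes the $150,000 maximum for willful infringement would apply to every work, which a court would not necessarily award; the variable names are illustrative only.

```python
MAX_WILLFUL_PENALTY = 150_000          # dollars per infringement, the maximum cited above
TRILLION = 1_000_000_000_000           # dollars
LOSS_CHANCE_PERCENT = 20               # Cotter's rough odds of losing on fair use

# Number of infringed works needed to reach $1 trillion at the maximum penalty
# (ceiling division, done with integers to avoid rounding error).
works_needed = -(-TRILLION // MAX_WILLFUL_PENALTY)

# Probability-weighted exposure: a 20 percent chance of a $1 trillion award.
expected_exposure = TRILLION * LOSS_CHANCE_PERCENT // 100

print(f"Works needed to reach $1 trillion: {works_needed:,}")
print(f"Expected exposure at 20 percent odds: ${expected_exposure:,}")
```

Under that assumption, fewer than 7 million infringed works would be enough to reach the figure, and even the probability-weighted exposure comes to $200 billion.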

In its statement of interest, the Department of Justice also noted Google’s dominant position in the market for digital books. Though there are other large-scale scanning projects, Google’s closest competitor, Amazon.com, has scanned only 3 million books, one-fourth of Google’s total.

The Department of Justice found that the settlement agreement would give “significant and possibly anti-competitive advantages to a single entity – Google,” and that the proposed “pricing mechanisms … also continue to raise antitrust concerns.”

The statement advised the judge to reject the settlement and force the parties to renegotiate.

Another major sticking point in the settlement was that of so-called “orphan works,” books that are in-copyright but out of print and whose rights holders could not be located.

Because Google scanned so many copyrighted works, approving the settlement would amount to “basically giving Google a monopoly” over these orphan works, Cotter said.

But Google has struck a deal with the Authors Guild that would compensate authors of copyrighted works through ad sales from the Google Books site and through the subscription fees paid by universities. For orphan works, the profits would be held by Google, and over a five-year period, 25 percent would be spent on attempts to locate authors who could not be found. After five years, the remaining 75 percent would go to a literacy charity.

For Cotter, this is an imperfect but acceptable answer to a difficult question.

“I think the world is a better place with this settlement in place,” Cotter said. “The works will be more accessible than they currently are.”

Courant remembers that as Michigan made its initial foray into the deal, he and the university’s general counsel thought it likely that Google would be sued, but not the school itself.

“We didn’t think that our risk of being sued would be very high,” Courant said. “And indeed we haven’t been sued.”

Scholarship has taken precedence over copyright since copyright’s very conception. This April marks the 300th anniversary of the first copyright law, penned in the British Parliament.

Article V of that law, the Statute of Anne, required that all publishers and booksellers reserve nine copies of each copyrighted work “for the use of the royal library, the libraries of the universities of Oxford and Cambridge, the libraries of the four universities in Scotland, the library of Sion College in London and the library commonly called the library belonging to the faculty of advocates of Edinburgh, respectively.”

A brave new world

Professor Robin Brown seems a perfect customer for digital books – a man who loves reading, but not books.

“There’s nothing heavier than books, except for wet books,” Brown said. “You have to dust them. They take up space.”

Though he often reads online, Brown finds that he cannot read long, difficult arguments from a computer screen, at least not critically. He asks his doctoral candidates to bring in their work printed out.

Brown recently read the full text of Aldous Huxley’s “Brave New World” online and found that it was not only a different experience aesthetically, but practically as well. He could quickly skip around the book or focus on particular words and phrases.

Not only are texts easier to work with, Brown said, but there are more of them.

“I have more information available to me sitting in front of this screen than any human has ever had before,” Brown said. “I think it’s totally cool.”

James Gleick said he and his wife, who is also a nonfiction author, depended heavily upon digital sources in their work, and that research that used to take days in a library now takes moments online.

Brown thinks that having spent the first few decades of his life working with the printed page may make it harder for him to read from a screen. He’s seen a dramatic change in his students, who have grown up with the Internet.

“The best students I’m dealing with now are smarter than any I’ve ever seen,” Brown said. Of his graduate students, he said, “They’re terrifyingly smart, and they’re smart in ways that are enabled by electronic technologies. They know more about more things.”

From forgotten to forever

The Greek library at Alexandria burned, as did the Library of Congress in 1851. Allied and Nazi bombers, Chinese emperors, earthquakes, fires and floods have erased many of the world’s great works.

In an op-ed piece titled “The Forever Library” in the New York Times last October, Sergey Brin made reference to a 1998 flood in the Stanford library that ruined thousands of volumes. Stanford’s losses joined a long list of knowledge disasters.

Tim Johnson, the University’s associate librarian for special collections and rare books, knows these stories. Johnson keeps a copy of the “disaster plan” at work and another at home.

The University’s collection includes two front-and-back pages from Gutenberg’s 42-line Bible, printed in 1455, and a complete volume of the King James Bible from 1611, the year it was first published. On request, Johnson has digitized and e-mailed these and other rare texts to scholars at the University and around the world.

Johnson also has a number of 17th- and 18th-century pamphlets from England.

“How many other copies of those exist somewhere else, I don’t know,” Johnson said.

Johnson said that researchers who view old books as artifacts – those who want to feel the pages, examine the binding or breathe in hundreds of years of dust – will continue to trek from library to library, country to country. But he guessed that a high-quality digital scan would be good enough for 90 percent of scholars.

“From a research perspective – and I think this is one of the things that Google has been saying – it literally does open up all this forgotten literature,” Johnson said.

On its own, the University is gradually scanning its rare books and archives. This summer, a state-funded project will begin digitizing the “Green Revolution” archives, which include Norman Borlaug’s field notebooks from Mexico and South America and his correspondence with Indian Prime Minister Indira Gandhi.

Jason Roy, who oversees the digitization projects, said that he thinks the University’s slower scans are of a higher quality than Google’s, and that his office’s job is to catch things that “slip through the cracks of what Google’s doing with the general collection.” Still, Roy acknowledged that he, Lougee and the University are at the whim of public funding to complete these projects.

Roy said that, at roughly $60 per book, the University would have spent $60 million – and waited much longer – to digitize the books it is sending to Google.

In a series of essays in the New York Review of Books, Harvard librarian Robert Darnton has expressed regret and some fear that the digitizing of the world’s books is being carried out by a private company rather than a public project. In December, after the ASA was reached, Darnton wrote that only an act of Congress, with the federal government taking over the project, “would transform Google’s digital database into a truly public library.”

Courant said Darnton is a good friend of his, but that he did not see the government taking action as Google did. Gleick went further.

“I strongly disagree with Darnton about this, because he is imagining the best of possible worlds,” Gleick said. “The fact is we live in the real world. And in the real world, the only entity, public or private, that has been willing to invest the hundreds of millions of dollars necessary to digitize these books happened to be Google, so far.”

Lougee admitted that Google may not be the “perfect venue” but said that it was the only one.

“Google,” Lougee said, “kind of took the world by storm.”