Thursday, November 17, 2005

The Battle Over BookSearch

I attended an event hosted by the New York Public Library about the lawsuit brought by the Authors Guild and the Association of American Publishers against Google over its nascent Book Search program. Chris Anderson, the editor of Wired Magazine, moderated as publishing and Authors Guild leaders clashed with Larry Lessig, founder of Creative Commons, and David Drummond, Google's general counsel. There is a webcast available, but it is in QuickTime 7, so I can't vouch for it.

A brief overview: Google is working with five prominent libraries to digitize and index their collections. Google will make these works searchable and will provide the full text of works in the public domain. Copyrighted works that the copyright holder has offered will also be available in full text. Copyrighted works whose copyright holders have not been contacted will be presented only in snippet form. Copyright holders who do not want their works in Google's index or search results can contact Google and say so, and Google will then remove those works from the search.

Google's program is an Opt-Out program rather than an Opt-In program. In the Open Content Alliance, works not in the public domain are only digitized if the copyright holders explicitly grant permission. In Google's program, works are only excluded if the copyright holders explicitly forbid it. In addition, in order to index the books, Google has to make a digital copy of the book without paying for it.

The copyright holders present two main objections.
1. Copyright holders don't like the opt-out precedent. They don't want to have to seek everyone out and let them know that they object to a fringe use. They want copyright law to be clear that, when in doubt, you should ask copyright holders before doing anything with their works. Lessig points out how stifling this is for any works that are in the long tail where clearing copyright becomes difficult.

2. Google makes a copy of each book when it puts the books into its database. This, they argue, is an infringement. Google uses this unbought copy of each book to increase its ad revenue by expanding the services that people come to Google for. A book search's value increases not only with the range of results one gets back, but also with one's confidence that the search has covered all relevant areas. If BookSearch covered only public domain books, fewer people would use it and Google would earn less ad revenue. In this view, Google is profiting off of content creators without compensating them for these copies.

This is the real crux: if making these copies is fair use, then Google doesn't even need to offer book publishers and authors an opt-out, because the law gives them no right to object.

On the other hand, US law typically holds that ephemeral copies that are never put to use are of little value and are not infringing. Google would do well to emphasize how different its storage of texts is from the actual readable text of the book.

During the Q&A an audience member with an unfortunately horrible speaking style proposed a thought experiment that seemed to me to get to the heart of the matter.
It was not very well addressed there, but you might see it better in writing.

I am an enterprising and fictional businessman who writes indexes to novels. Novelists often address many ideas in their books, but rarely provide indexes. My business is to publish indexes to novels so that students and other researchers can easily find concepts in texts without having to re-read the entire book. In my books, I provide an alphabetical listing of the concepts and the page numbers where each concept is discussed. I may also provide small, fair-use quotes from the text. So an entry might list passages discussing bigotry in the Harry Potter books, along with snippets to give the flavor of each passage. So far, no one argues that this isn't fair. And the result is much like Google's Book Search.

What is really at the center of the argument is how I make my fictional indexes to fiction. I go to the library and check out the Harry Potter books. I bring them home and photocopy each page multiple times, and as I read them, I cut up the copies into bits to file away in drawers. I spread the pages all over my house, scribbling notes on them. I take these copies and tape important and useful bits to my wall. Eventually I collate them, organize them, and begin the task of recording all this information in my book.

I return the whole, undamaged library book. I keep my copied, folded, spindled, mutilated and cut-up notes at home so that, as I improve my organizing and find more connections, I can publish new editions of my index.

What the Authors Guild and the Association of American Publishers are saying is that the real copyright infringement occurs while I am writing my book, because I did not compensate them for my research! Instead of seeing the information manipulation as a fair use in the course of creating a new work that is not derivative, they see it as a new copy of the book that they should be compensated for. Lessig quite rightly pointed out that any time information is used in the digital age, a copy is generated. US law has to recognize new boundaries more appropriate for the digital age.

Google might also change its approach to non-permissioned works. A database entry, properly split up, might not be reconstructable into the full text of a book. That would let Google offer search without leaving any full "infringing" copies of the book around. It seems like a kluge, though, and they should just fight the good fight to allow this kind of innovation to work.
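
To make the "split up" idea concrete, here is a minimal sketch in Python of an index that records only which words appear on which pages, with no word order or positions. This is purely my own illustration, not a description of how Google actually stores texts; the function names and the two-page sample "book" are made up for the example.

```python
import re
from collections import defaultdict


def build_unordered_index(pages):
    """Map each word to the set of page numbers it appears on.

    Hypothetical illustration: word order, positions, and counts are
    deliberately thrown away, so the pages cannot be read back in
    sequence from the index alone.
    """
    index = defaultdict(set)
    for page_number, text in pages.items():
        # set() discards both order and repetition within the page
        for word in set(re.findall(r"[a-z']+", text.lower())):
            index[word].add(page_number)
    return dict(index)


def search(index, query):
    """Return the sorted page numbers that contain every query word."""
    words = re.findall(r"[a-z']+", query.lower())
    if not words:
        return []
    hits = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*hits))


# Made-up sample text standing in for a scanned book.
pages = {
    1: "The boy lived in a small room under the stairs.",
    2: "A letter arrived for the boy.",
}
index = build_unordered_index(pages)

print(search(index, "the boy"))      # [1, 2]
print(search(index, "letter"))       # [2]
print(search(index, "letter room"))  # []
```

The trade-off shows up immediately: an index like this can say which pages mention a word, but it cannot show a snippet or match an exact phrase, since both need the original word order. That is part of why it feels like a kluge to me.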