Tuesday, January 30, 2007

Google and the Libraries

A couple of weeks ago I wrote an article in The Sunday Times about Google's digitisation of the world's libraries. Some - Google included - seemed to think I said this was about to destroy civilisation. This does not fill me with hope about the ability of these people - Google included - to read. Anyway, I have been too busy to reply to all the emailed responses so I have decided to include some of them as comments on this post with, in fairness, a degree of anonymity. They demonstrate, if nothing else, the strength of feelings involved.

16 comments:

  1. Hi!

    I have read "Could this be the final chapter in the life of the book" and I must say it is obvious you are biased on the issue. Because of that you fail to research enough and are satisfied with the facts which tend to prove your theory.

    The most obvious is the hypothesis on Google's cultural influence on the search results. From my experience I can say that Google is one of only a few web companies that try to understand the global picture of their users. This is why, for example, they are trying to provide the site in
    as many languages as they can:

    http://www.google.com/preferences

    and of course use this information to display results which would be
    most useful to the given user. For example, if you search for
    Gustave Flaubert on the French Google Book search:

    http://www.google.fr/books?q=Gustave+Flaubert

    you get French results first. Of course this is also written in the
    Google Help center:

    http://books.google.com/support/bin/answer.py?answer=43745

    I also deeply disagree with your opinion about "universally accessible"
    as a threat and I just hope that there will be the day when information
    will be freely accessible and not limited only to the people who can
    afford it, which, I believe, will help more developing nations than care
    for their cultural well-being.

    Overall it seems to me that very similar ideas were around when
    Gutenberg invented movable type. Ideas that people cannot learn just
    from books but knowledge needs persons to distribute it, that the Bible
    should not be read by yourself, that it is not good to share information
    and knowledge, that knowledge should stay in the domain of the
    privileged ones ... I could easily transform your last lines to suit
    that time:

    "For the book doesn't educate and the mind must be primed to deal with
    its informational deluge. On that priming depends the future of
    civilisation. How we handle the printing of the books will determine who
    we are to become."

    As you wrote, 500 years ago we got books which enabled us to share
    information and knowledge easier, more rapidly and to regions where this
    was not possible before. Now we are at a similar point in our history:
    we are getting rid of the limits the material nature of the books has
    and we are going forward. These are steps forward. I think you should
    enjoy the ride.

    I enjoy reading books. I also enjoy riding horses, but for travel I
    use the train instead.

    Best regards




    P.S.: I found this article on the front page of the Google News
    engine. I see this as another example of a good effect of "universally
    accessible" information which they try to provide.

  2. Your name was listed at the end of the Times article "Could this be the final chapter in the life of the book"

    Regarding the note, "Try this: go to Google Book Search and enter Gustave Flaubert. The first results are full of English translations of Madame Bovary."

    I just wanted to point out that Google has a French Book Search (link below) that prefers French-language books over English ones. They have it for a few other languages as well.

    For books.google.com , the assumption of course, is that you read English, and would thus need a translation of the French version. I don't understand French, but I can assume that the top few results in the search below are French language.

    You make many good points, but as I'm sure you've come to realize in writing many articles, a little extra QA can aid a writer's credibility.



    Curtis K

  3. hello Mr Appleyard

    I found your piece on libraries stimulating.

    I tend to oscillate between sadness (and fretfulness)
    at the possible erosion of the book, and the thrill
    at the incredible reach that both books AND the web
    have had all over the world, and what seems to me
    an extraordinary growth in diversity of publications
    (much rubbish too, but so much great writing as well)
    that I suspect we will see for many decades to come.

    I just wanted to flag 2 points. I believe the European
    search project recently became a France-only one
    after the withdrawal of the only other partner (Germany).
    I am sure your cuttings (do they still have those in
    newsrooms ??) will bear this out.

    And the Flaubert point is not as gloomy as you may think.
    If you search for Flaubert via Google but switch to
    their French 'version' either for normal search or for
    book search, you will see all the French websites and
    French books (some from La Bibliotheque Nationale etc...)
    that they've been able to purloin... errr scan.

    It would be slick for one of the flow of enhancements to
    the Google UK version to offer, whenever you
    type a phrase of likely non-UK provenance,
    the option to "take a look at the set of results a French
    reader would see" or some such.

    I am forwarding this email to Obi Felten at Google who
    may be able to put this on someone's list to work on
    (or push it up a priority list if it's already there).

    best wishes to you and Dominic R

  4. Dear Mr Appleyard,

    I work for a small, nonprofit publisher. We were investigating
    letting Google scan our books and I was horrified to learn that there
    are no provisions for proper proofreading. In all the articles I've
    seen on their project, no one has expressed any concern about the
    quality of the scans. When I think of the mistakes I've seen in works
    produced using OCR, I am very concerned about the future of human
    civilization. I hope you'll consider looking into this and doing an
    article (or book?) on it.

    With best wishes,

  5. "it is the teachers who will have the final say. They will determine whether people will read for information, knowledge or, ultimately, wisdom. If they fail and their pupils read only for information, then we are in deep trouble. For the net doesn’t educate and the mind must be primed to deal with its informational deluge. On that priming depends the future of civilisation. How we handle the digitising of the libraries will determine who we are to become."

    I'm writing because I want to share with you that, as a ravenous lover of everything "book", I am yet resigned to the realization that love like this may indeed be blind. Holding on to a 500-year-old technology as if it were a natural element is ironic, especially for animals that have reached song from grunt, tasting from gulping, or measured intellectual discernment from fearful superstition.
    Look, I want to keep my books too, but, may I ask, what are you complaining about in your article? The method of learning? No. On further reflection I think this is about them good ol' days: "In my time there were books, dammit! Those times meant something. Search engines, bah!" "Grandpa, are you mad because I learned to avoid all your mistakes, or that I learned in 10 minutes?"

    And I can't let you walk away leaving statements behind like these:
    "An index is the work of a mind with knowledge, search engine results are
    the product of an algorithm with information"

    What? Search engines and algorithms aren't works of mind? What are search engines and algorithms if not the product of mind? Choose wisely now.

    This morning I had a brief discussion with a fellow parent, mid-forties (I'm mid-fifties) about the merits or the lack thereof of videogames. He and his family, he said, are decidedly against them. "They repeat the same information over and over. There's no learning." "Versus what other kind of learning?" I wanted to know. "Real learning", he said. Man, is that pathetic or what? How the hell does he, you or I know how humans will extend and apprehend their knowledge in, say, another 500 years? But so you know it's not just older adults, how about the 20-something sales clerk at the local Games Stop (video game retailer)? I make an observation: "Videogames could be unbelievable in 5 years. And in 10, wow!" Now as adolescently silly as that statement was, it was rocket science compared to how the young salesman responded:

    "Well, I don't think so. They (the computers/software), can't be developed
    too much more. They're pretty much done. I follow this stuff close, and [put
    my own systems together], and I can tell you, you won't see anything
    fundamentally new coming around again. It's done."

    It's done. This savvy, computer literate gamer thinks gaming technology has come to a standstill for some time to come and all because he can't fathom what else could be coming next. Here's a rule: You don't get to know before the guy or gal who will actually create it knows it. Whatever "it" will be. Everything, everything your mind processes, which counts, it does without caring what you want to call it. The mind only wants awareness to continue the forward motion of sensory accumulation and organization. Magazines, comic books, computer screens, MP3s, and treasure maps... All that the thing between your ears cares about is moving, unless you teach it not to.

    But I digress. There will be books, but they'll only be called that to humor us old farts who have such a hard time getting off stage. Bryan, our time, which is the accumulated time of all each of us cared to know to this point, will, must, pass to history, as you well know. Books were neat. They had a great run. But you should jettison your creeping ageism, and instead ask "what's next?" Because holding on like this is just too sad.

  6. They are closing the libraries in our county of Jackson in the state of Oregon.

  7. Your assertion about Google Book Search was wrong:
    ..."In December 2004, Google announced its assault on these peaks. It had made a deal with five libraries — with the NYPL and at the universities of Stanford, Harvard, Michigan and Oxford — to scan their stocks, making their contents available online via Google Book Search (books.google.com). Ultimately, it is thought, some 30m volumes will be involved. Microsoft, meanwhile, has made a deal with the British Library to scan 100,000 books — 25m pages — this year alone. Google has now scanned 1m books."…

    I went to Google Book Search. I then input "Call me Ishmael." with the quotation marks -- Google requires quotation marks for multiword searches where the order of the words is important -- which is one of the more famous opening sentences in U.S. literature. Guess what? The first twenty or so results did not come out and state "Moby Dick". Most of the results alluded or referred to the book but did not come right out and state clearly where the phrase came from.

  8. Bryan,

    Thanks for an insightful column in the Sunday Times (Jan 21) regarding Google's digitising project. I agree with your assumption that it is indeed not up to the reader, but the teacher, but I offer this corollary.

    I can say, from the point of view of this librarian at a baccalaureate institution, that it's not the death of the book we should be worried about, it's the death of reading. Like the USA Today newspaper in America that shortened all journalism into two paragraphs, digitising a book gives it to a medium that does not encourage reading, is not designed for reading, and considers reading to be tedious. It used to be called "sustained reading," a concept teachers promoted to encourage a lifelong appreciation for the written word, and the book was the ultimate device for delving into the understanding of the world. But now Google's push for digital copy will accelerate that death, by moving it to a medium that looks remarkably like a television, and by all accounts acts like one too. Is it any wonder we know so little about our world?

    Thanks again.

  9. Hi -

    I was reading your article and was
    struck by the casual usage of the words "information" and "knowledge",
    as well as the cavalier assumptions made about the nature of these
    intangibles.

    (You should probably know that I found your article through Google
    News, and would not have come across it without their indexing
    algorithm).

    A good example is your assertion that "An index is the work of a mind
    with knowledge, search engine results are the product of an algorithm
    with information." You seem to have an overarching assumption that
    when data is held digitally, it is of a lower form called information.
    Conversely, when touched by a human mind, this data is raised to a
    higher form called knowledge. Now, I am likely oversimplifying your
    views, and I don't wish to be condescending. However, the distinct
    impression I come away with is that you view the human mind as
    infallible, and algorithms as fundamentally flawed.

    I won't expound here on my own ideas of what information and knowledge
    are - suffice it to say I consider the latter a subset of the former.
    Even if we are to simply regard knowledge as either true or useful
    information, there is absolutely no reason to assume that a human is
    more likely than an algorithm to reference it until we know more about
    both the human and the algorithm.

  10. I just finished reading your article in the Times Online on Google's scanning of library works and I thought the article was very interesting and well thought out. I'm an American student at an American university and I've found that our generation is becoming more and more drawn to online sources of information, with mixed results. Almost all of my classes have large amounts of required readings but they are all available from academic journal articles that have been scanned and made available online on the school intranet. This seems to be becoming increasingly common and is making information available to many more people than would be able to physically hold or read the journal article. In my opinion (however uneducated), unleashing information on the internet can only do good because I'm hard pressed to think of a time in history when civilization did not benefit from increased distribution of information. I also think your point on Google being a for-profit company was a useful reminder of something many people seem to forget, as Google's "don't be evil" motto makes it seem like a totally altruistic organization devoid of capitalistic intentions. Again, excellent article that I expect to read again in the future. Thanks.

  11. Good day to you, Mr. Appleyard. I thought I might provide you with
    interesting tidbits of what Google and Microsoft and the providers of
    titles to scan have in store for them:

    Firstly, French copyright law treats literary property like real
    property (which it is, actually: we in the English-speaking world simply
    allow ourselves to be cheated after 45 years): it is in perpetuity the
    property of its owner/creators and descendants, so long as it remains in
    print. You've no idea what finagling the translations of French standard
    works into English entailed. Even now, as I write this, French intellectual
    property avocats are sharpening their sabres. As any good lawyer knows,
    allow the offence to continue until there is no recourse or pleas of innocence
    possible on the part of the copyright hijacker, then nail them for all they
    are worth.

    Secondly, the mental mechanism of reading and retaining information doesn't
    quite work as planned with computers, nor with any other electronic
    presentation of text. Those who found this out first were aviation mechanics
    working with laptops while repairing tricky subassemblies. This will out
    with time. For some obscure biological reason, computers defeat contemplation
    and the ability to solve problems through concentration. Likewise, one's
    tendency to transliterate number sequences increases exponentially if
    sorting out lists of numbers on a computer screen versus printing them out
    then straightening out the list.

    Thirdly, the indexing aspect is what will drive the nail into the digital
    coffin, as more and more digital librarians come to know what the words
    "semantic" and "relational logic" mean.

    In 1990, IBM published a work on technical writing with a picture of a
    burning building marked "Library" accompanying a text on the future of
    writing. I said "Bosh" then and do now. Likewise do I say that there is
    nothing wrong with the InterNet the Society of Indexers couldn't set right
    in a gnat's heartbeat.

    Thanks again for your lovely exposition. I do hope they sell tickets for
    the final death-match between Conan the Librarian versus Bill the Bookburner.
    (Or perhaps make a computer game out of it...)

  12. Dear Mr. Appleyard,

    I fear your article puts you in the running for at least an honorable
    mention for a "Luddite of the Year" award. While there certainly will
    continue to be copyright- and compensation-related issues to be
    sorted out, information transmission technology will continue to
    evolve. Language, writing, paper, printing, moveable type, telegraph,
    telephone, radio, TV, the Internet, and whatever comes after that are
    merely steps along the way. Knowledge (so far at least) is something
    only one's brain can achieve, upon successful processing of
    information from whatever the source. A book is a mere information
    transfer device, better than what came before it, but with real
    technological limitations.
    Here's a current example: In December my wife learned (via the
    Internet) of an out-of-print book published in Brazil (in Portuguese)
    that she would like to read. No library in the State of New York has
    a copy. Our local reference librarian has located three copies in the
    United States which are available for interlibrary loan, one at the
    Library of Congress, one at a university in Texas, and one at a
    university in California. If we borrow the one from the Library of
    Congress, there will be no charge, but she can only read it in the
    local library. The other two she could take home to read, for a
    modest charge ($16 to $20). A month has elapsed so far, while the
    librarians work out the details.
    We do not need the paper part of the book, only the information
    encoded thereon. We look forward to the day when we could pay a
    reasonable fee to obtain the information immediately, as is now
    technologically possible. It is a separate question whether the final
    display of the information should be via a screen device or on paper,
    perhaps as a printed-on-demand book; that choice depends on the
    relative efficacy of the various alternatives available at the time.

  13. speaking as one of the Old Network Boys of the Internet, one of whose subspecialities was computer security, and as a published author even if his royalties are so small they're almost not worth protecting -- except for the principle of the thing -- and assuming that the somewhat ambiguous verbiage at the top of the sunday times 'review' combined with the url to your site at the bottom implies that you wrote it, one small technical detail that never seems to get any ink or electrons:

    the likelihood that google can technically live up to its promise only to make 'snippets' available is vanishingly small.

    when the author's guild bulletin ran an article last march about a 'symposium' they'd held on the digitizing everything nonsense, before i wrote them a letter to the editor it only took me around 10 minutes [via google of course] to find someone -- a 'grad student', actually -- who had already cracked [or as some would call it hacked] their protection system and could cause entire books to be downloaded. while i believe they upgraded that system [he was naive enough to have told them about his feat], it remains the case in the computer field that protection mechanisms are 'always' broken. the recent flap over the breaking of high-definition dvd 'digital rights management' is of course a case in point, but there are any number of others.

    so i'd be so bold as to urge you to bend every effort to get the small technical detail lots of ink and electrons, since clearly you have some access to 'the media' and i don't [for various reasons which i won't bore you with, especially since i don't pretend to know all of them anyway].


    cheers, map

  14. Bryan,

    For a moment I thought I had wasted my £2.00 on the ST today but not so; what a great article you have written (as above).

    Presumably you took into account the debate with Rees-Mogg in "The Times" over much the same issues? What I personally regard as being poor substitutes for real books are "Wikipedia" - and 'pseudo-books' in HarperCollins "Eminent Lives" series!

    Thanks again; do reply if you want all of the relevant dates from the debate in "The Times", as I kept all the cuttings. Regards,

  15. Hi Bryan,
    I'm not quite following you.
    "They will determine whether people will read for information, knowledge or, ultimately, wisdom. If they fail and their pupils read only for information, then we are in deep trouble. For the net doesn’t educate and the mind must be primed to deal with its informational deluge. On that priming depends the future of civilisation. How we handle the digitising of the libraries will determine who we are to become."
    People can use books for doorstops, kindling, or entertainment. A formal education can produce an adept mind, or more likely, a cultured parrot. I believe a society, like a person, is as sick as its secrets, and as ineffective as its ignorance.
    Everybody thought that VCRs were going to eradicate movie theaters. They didn't. Drum machines were going to eradicate drummers. They didn't. I think that people who have a genuine thirst for knowledge will get it however they can. Books are still, and will remain, extraordinary technological accomplishments. They are portable, require no power source, don't freeze up or crash, are very inexpensive, last for decades, require no idiotic upgrades or additional expenses or technicians and infrastructure to maintain, and are sensually different than laptops. You can sit under a tree with a laptop, but a book feels better, and you can read it on your back, on your side, and upside down. If you spill a soda on it, you can dry it out and keep reading it, or just read it while it's sticky. Pour a soda on your laptop and try that.
    Plenty of people grind their way thru the education system doing only what is required of them to get their degree, so they can get a job that will provide them with the social status they think they require. When memory chips can be implanted directly in one's skull, these people will go out and buy them. Fuck 'em. They are dumb-asses who can afford memory chip implants.
    I would worry more about how intelligence and the capability of abstract thought are viewed as unmanly, annoying, and ultimately threatening. At least in this country. See the movie Idiocracy by Mike Judge for a glimpse into America's future. And his movie Office Space for a glimpse of our present.
    I dropped out of high school. Mark Twain said "don't let schoolin' interfere with your education." I took that seriously.
