Tuesday, August 30, 2005
Google Print vs LoC classification  
Phil Bradley points to an interesting analysis by Thomas Mann* of the problems inherent in using full-text searching, à la Google Print, to access scholarly material:

"Keyword searching fails to map the taxonomies that alert researchers to unanticipated aspects of their subjects. It fails to retrieve literature that uses keywords other than those the researcher can specify; it misses not only synonyms and variant phrases but also all relevant works in foreign languages."

Instead, a Google search is swamped by millions of pages that contain the required keywords, but not necessarily the content that the searcher seeks. A fully operational Google Print, Mann argues, will only magnify the problem (and imagine trying keyword searches when many of the books in the database will be dictionaries).

Makes sense to me, and it's a much more reasoned critique than Gorman's from a few months ago. It seems to me that Google Print would have one obvious advantage: identifying texts that contain references to a particular named person, organisation, object or place. For example, someone researching a minor historical figure might have to manually search through a huge number of books (or at least their indexes) to locate those that mention the subject - using Google Print would presumably save the researcher a considerable amount of time.

(*Not this Thomas Mann, unfortunately.)