Sunday, May 29, 2011

Reprise on Google, eBooks, Copyright and All That Jazz

  An enigmatic personage by the name of Dr Beachcomber has sent me an email with a link to his posting Google Burns the Library at Alexandria. He has included my reply as a comment on his blog, so I am returning the compliment by referencing him here. Is this what you call some kind of hippy blog-in?
  While I have mildly chastised him for overdramatics in headline writing, the books not actually being burned as a result of having been digitised, there is an issue of concern regarding the quality of scanned digital editions, and another issue, brought up by another commenter, on the recopyrighting of material already in the public domain as a result of it being reprinted or republished. There is also the very tricky issue of the destruction of original printed or written material after it is digitised.
  Taking the last first (Hey, I'm in Australia, we are upside down here!), I was many years ago doing a research project which involved examining museum records and objects. Now museum curators have a habit of updating their records when they think that the person looking at them is some kind of expert, and they ask that person questions about things. For historical reasons, I wanted to know what the original records said about the objects. With the old handwritten cards and registers, it was possible to separate the original records from later annotations, and even to work out who had made the annotations and when. Only one museum had an electronic catalogue at that time (1991), and they were quite disappointed that I actually wanted to look at their tatty old paper records. I guess the question is, how many old paper backups do we need to keep for safety? The same applies to books.
  On the second issue, I was told many years ago by a copyright legal bod in my university that it was legal for me to scan out-of-copyright visual material and republish it digitally, but it was illegal to reproduce digital scans from modern facsimile editions of out-of-copyright material. My only question about that is, how could anybody tell? At the moment the business interests are noisily defending ever-increasing copyright restrictions, but the ready availability of copying and reproduction technology is going to make soup of that, and real soon. I suggest that if you have some favourite old, genuinely out-of-copyright books in your particular area of interest or expertise, digitally reproduce them yourself, circulate them among your friends and colleagues, and loudly announce them as public domain.
  The quality of some of the old material scanned and placed in the public domain is an issue. Dr Beachcomber is determined that Internet Archive editions are better quality than those from Google, but I bet he has never spent three days printing a long book page by page from two separate Internet Archive scans, hoping that the pages missing from the two editions do not actually coincide at any point. The end result was a largely black and white edition with occasional colour pages, none of which had bookmarkable or cut and pastable text as they were simply image scans of pages. And the Kindle editions are similarly unnavigable and messily formatted. And the text only versions are unformatted to illegibility and full of OCR errors. But apart from that they're alright. I suspect that there is just some degree of luck with the digitisation of particular works, and how carefully they have been done.
  I have touched on these issues in earlier posts, Eeee! Books, and Scribes, Copyright, Crime and Google, with a short note at the end of Horrible Old Handwriting. I guess the whole issue is just not going to go away real soon.
  The whole issue of preservation of books and text is, of course, not new, but there are so many texts to preserve these days. We have almost no original Roman era texts of the Latin Classics, because they were written on papyrus rolls which fell to bits. These works are mainly preserved from much later copies in vellum codices, much more durable, produced by Christian monks. The thought of these celibate ascetics solemnly copying down the erotic poetry of Ovid and the like is always good for a giggle, but they did. There have even been conspiracy theories that the monks actually forged all the Latin Classics. I doubt it, but how much did they edit, correct, annotate and standardise these texts? Perhaps Cicero or Livy might be surprised to discover what we think they had written.
Postscript: With apologies to Dr Beachcomber, after rechecking, it seems that the download I had such trouble with was a Google scan, although I accessed it through the Internet Archive. It was one of a large set uploaded by one tpb, who seems to be a very messy worker. Perhaps I was dead unlucky, because there appears to be another edition of the same book available through the Internet Archive which is not from Google, so at least they are not claiming a monopoly for their grotty scans.


Jonathan Jarrett said...

I've met tpb's work too; there's an awful lot of it. Though it is true that he or she seems to have gone quite fast and reckless, nonetheless there are many texts I would not have been able to check with any ease had they not scanned them or uploaded someone else's scans. So on the whole I think I'm more grateful than annoyed.

Dianne said...

Yes, I have been able to get hold of books that are as rare as fertile dinosaur eggs, albeit with some difficulty. I just hope that the presence of these substandard scans doesn't prevent someone from doing a better job on the same work. I guess it's just an infancy problem for the whole process, and eventually the little grommet will grow up and give us a good night's sleep.