The Art that Inspires Writers and Readers

Saturday, July 26, 2014

A graphic tool to check if your writing is historically wrong!

Dear Ladies (and lately also gentlemen)

I normally talk about paintings and illustrations. This is an illustration of a different kind: a graphic tool that lets you check whether the sentence you want your character to say was actually in use back then, say in 1850.

You may already know about the Ngram Viewer from Google Books. I just discovered it and had to share it. Google Books has been scanning and digitizing so many books these last few years that they predict they will have scanned all of them by the end of the decade. I mean every single unique book on this planet, estimated at 130 million. They had scanned 30 million by 2013.



There are of course a lot of sticky legal issues here. It isn't easy to play some sort of librarian Robin Hood and just take all these books from their rightful owners to give them to the hungry (minds), many of whom can afford to buy the books. Google claims this initiative will give new life to dusty books and help promote literature. I hope so too. Personally, thankful to be able to afford it, I will continue to buy books to have them on my Kindle or on my coffee table.

For research purposes this is great though. You don't need to actually read (for free) your fellow writer's book. The Google Ngram Viewer allows you to search for a word, a group of words, or several of these at the same time, from 1800 to this day, in all those 30 million plus books.



Let's test it with the infamous word hello. Infamous because many a historical writer has used it out of time. The Scottish-born Alexander Graham Bell made his first successful experiment with the telephone in 1876, and the word hello was popularized as the way to answer the telephone. You can see that there is some baseline noise, but the word really starts to appear in books after 1880 in the ngram plot, as it should. Note that the horizontal axis of the plot shows the years from 1800 to 2000 and the vertical axis shows increasing usage (in %).



click to see Hello in the webpage




Let's test an even newer word that I have seen in some historical fiction: it starts to be used in 1960.

click here to see the c_word in the webpage



You can do pretty neat things, like combining words in the same search (separated by a comma) or differentiating between a word used as, for example, a noun or an adjective.
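For those who like to tinker, here is a minimal sketch of how such a combined search could be turned into a link. The function name `build_ngram_url` is my own invention, and the base address and parameter names (`content`, `year_start`, `year_end`, `smoothing`) are assumptions taken from what the viewer shows in the browser address bar, not from any documented interface. The `_NOUN` and `_ADJ` suffixes are the part-of-speech tags the viewer accepts after a word.

```python
from urllib.parse import urlencode

# Assumption: base address and parameter names as seen in the browser
# address bar of the Ngram Viewer; this is not a documented API.
NGRAM_GRAPH_URL = "https://books.google.com/ngrams/graph"


def build_ngram_url(phrases, year_start=1800, year_end=2000, smoothing=3):
    """Return a viewer link that plots all the given phrases on one chart.

    build_ngram_url is a hypothetical helper written for this post.
    """
    # Several searches go into one query, separated by commas.
    content = ",".join(phrase.strip() for phrase in phrases)
    params = {
        "content": content,
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,
    }
    # urlencode escapes spaces and commas so the link is safe to paste.
    return NGRAM_GRAPH_URL + "?" + urlencode(params)


# Two expressions on the same chart.
print(build_ngram_url(["by the by", "hello"]))

# The same word as a noun and as an adjective, using part-of-speech tags.
print(build_ngram_url(["cool_NOUN", "cool_ADJ"], year_start=1850))
```

Paste the printed link into your browser to see the chart. If the address format changes, typing the same comma-separated text into the search box works just as well.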

This is an expression that was used in the past and is now barely in use: by the by

click to see by the by in the webpage


Just for fun, a chart where I compare Jane Austen with three very popular contemporary female writers: Diana Gabaldon, Stephanie Laurens and Loretta Chase. I suggest you add "Shakespeare" to this plot and see how he dwarfs the rest. I think in the case of contemporary authors this is mostly their name in their own books, and in the case of romance novel authors, where the books contain excerpts from books by other authors, these mentions are also included.

click to see the authors in the webpage








There are several caveats to these data. For me, the main problem for writers of historical novels is that some words are always in use but change their meaning over time. This can only be checked if the word changes from, say, adjective to noun, or if the context phrase is short enough to use in an ngram search (max. 5 words). The second worry I had was the representation of different genres in the sample, but I think with 30 million plus books from complete libraries, this is not a real concern anymore.
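Because of that five-word limit, a longer sentence has to be checked in pieces. Here is a rough sketch of how one might chop a sentence into searchable chunks. The helper names are made up for this post, and counting words by splitting on spaces is only an approximation of how the viewer counts them (I am assuming it may treat punctuation and contractions differently).

```python
MAX_NGRAM_WORDS = 5  # the search limit mentioned above


def fits_ngram_search(phrase):
    """True if the phrase has between 1 and 5 space-separated words.

    Hypothetical helper; splitting on whitespace is an approximation.
    """
    return 0 < len(phrase.split()) <= MAX_NGRAM_WORDS


def five_word_windows(sentence, size=MAX_NGRAM_WORDS):
    """Split a sentence into overlapping chunks of at most `size` words."""
    words = sentence.split()
    if not words:
        return []
    if len(words) <= size:
        return [" ".join(words)]
    # Slide a window of `size` words across the sentence, one word at a time.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


sentence = "it is a truth universally acknowledged"
print(fits_ngram_search(sentence))
for chunk in five_word_windows(sentence):
    print(chunk)
```

Each chunk can then be searched on its own. If one of them only shows up long after your story is set, that part of the sentence deserves a second look.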


In conclusion, this is a very neat tool! I have been playing with words for several hours to find good examples to show. In the process I have learned a lot about our ever-changing language.


For a complete list of tricks:

click to see how to use Google ngram 

Miranda



PS: I was duly corrected for misspelling Jane Austen's name (Austin). Interestingly, the shape of the "Jane Austin" plot is similar to the "Jane Austen" one, but the total % of usage is totally different, which suggests that the incidence of errors (by the automatic scanning process) rises along with the total usage of the name.

click to view the comparison in the webpage




source of the pictures:

Robin Hood: http://www.enidblytonsociety.co.uk/book-details.php?id=509

lady at the telephone:  http://bygoneyears.tumblr.com/post/1022091629/woman-talking-on-wall-mounted-telephone-ca-1890