
Not Accounting for "Ifs," Scientists Link "Ands" & "Buts" to Bestseller Status

A group of computer scientists at the State University of New York at Stony Brook has developed an algorithm they say can determine a book’s likelihood of becoming a bestseller. The Telegraph reports, “A technique called statistical stylometry, which mathematically examines the use of words and grammar, was found to be ‘surprisingly effective’ in determining how popular a book would be.”

Assistant Professor Yejin Choi was among the group that analyzed bestsellers from a sample they downloaded from the Project Gutenberg archive. They also sourced slow-selling books by checking Amazon rank. The 800 books they studied covered a range of genres including sci-fi, classic lit and poetry.

The methodology of the study raises some red flags. Project Gutenberg is mostly stocked with old-school classics like Les Miserables and Adventures of Huckleberry Finn, which, no disrespect to these must-reads, were released in a market that had far less media to compete with, and which have had the benefit of sitting on school reading lists for decades. Likewise, Amazon rank is a nebulous factor. Author and member of the “Fulfilled by Amazon” program Cynthia Stine explains Amazon rank this way on the website

I can define Amazon sales rank in one sentence:

“The period of time since an item last sold.” 

That’s it.

What does that mean? It means that starting from one hour after an item sells, its rank will start to rise until it sells again. The longer the gap between sales, the higher its sales rank grows. When the product sells again, it will drop significantly and then begin to rise again an hour later.

With that in mind, here’s what the SUNY Stony Brook scientists derived based on their analysis:

They found several trends that were often found in successful books, including heavy use of conjunctions such as “and” and “but” and large numbers of nouns and adjectives.

Less successful work tended to include more verbs and adverbs and relied on words that explicitly describe actions and emotions such as “wanted”, “took” or “promised”, while more successful books favoured verbs that describe thought processes such as “recognised” or “remembered”.
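
For the curious, here is a rough sketch of the kind of word counting that stylometry starts from. To be clear, this is not the Stony Brook team's code or method. The three word lists are my own assumption, built only from the examples quoted above, and the function name `style_profile` and the sample sentence are made up for illustration. It simply measures what share of a text's words falls into each list.

```python
import re
from collections import Counter

# Hypothetical word lists, taken only from the examples quoted above.
# A real study would use far larger vocabularies and grammatical tagging.
CONJUNCTIONS = {"and", "but"}
ACTION_VERBS = {"wanted", "took", "promised"}
THOUGHT_VERBS = {"recognised", "remembered"}


def style_profile(text):
    """Return the share of words (0.0 to 1.0) that fall into each list."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)

    def share(vocab):
        # Guard against empty input so we never divide by zero.
        if total == 0:
            return 0.0
        return sum(counts[w] for w in vocab) / total

    return {
        "conjunctions": share(CONJUNCTIONS),
        "action_verbs": share(ACTION_VERBS),
        "thought_verbs": share(THOUGHT_VERBS),
    }


# Made-up sample sentence, not taken from any of the books studied.
sample = "She remembered the house and the garden, but she wanted more."
profile = style_profile(sample)
print(profile)
```

Counting words like this says nothing about whether a book will sell; it only shows how a text gets turned into numbers that an algorithm can then compare across books.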

I think the true “science” behind a bestseller involves several factors including the amount of support the publisher has committed to giving a book/author. Publishers decide which authors to invite/which books to push at industry conferences and events where power players like the New York Times Book Review Editor or the head of the American Library Association will be in attendance, for example. Then there’s the Amazon factor.

Publishers are scrambling to compete with the e-tailer’s low prices, as well as its innovations and its role as chief book recommendation engine.

Add distribution to the mix. When Barnes and Noble spanked Simon and Schuster by reducing the number of titles they would carry in store (because they felt the publisher was not supporting them enough in their efforts to hold off Amazon’s encroachment), S&S authors who fell into the debut or emerging categories lost a major distribution channel. The beef between B&N and S&S has been resolved, but it probably didn’t do too many favors for authors whose books were released in that period.

Oh, and the author’s talent. And how many followers they have on Facebook / Twitter / Tumblr / Instagram… And the big names they’re able to get to blurb their book (see Miranda Beverly-Whittemore’s tips on ‘How to Ask for a Book Blurb’). And…

So, yeah. Forget algorithms and just write.

