Curmudgeons.net

Every normal man must be tempted, at times, to spit on his hands, hoist the black flag, and begin slitting throats. –H.L. Mencken

When Statistics Suck


Statistics are wonderful. They keep me (somewhat) interested in baseball, and they are what I find least dull in economic analysis. The trouble with statistics is that they are not always put to good use. Take, for instance, the RBI (Run Batted In) in baseball. Some baseball commentators see a high RBI total as an indicator of a star player's value. In their view, a player who makes good contact (hits, home runs, or deep fly balls) when his teammates are in scoring position earns many RBI; therefore, players with many RBI are good hitters. However, this is not proof of a good player; it is only proof that the hitter's teammates are good at getting on base. A player may consistently hit well, yet if his teammates are terrible at getting on base, he will not compile many RBI.

To steal a line from Bill Cosby, I told you that story to tell you this one:

The Largest Vocabulary in Hip Hop

Matt Daniels is a ‘data scientist’ and other things at Undercurrent in NYC. He did some ‘data science’ on rappers and vocabulary. Whatever. I do not care which rapper invents more variations on the word ‘hoe’. But then he had to add data points for Shakespeare and Melville.

Now, I am not exactly a fan of Moby Dick, but Shakespeare and Melville should never share the same study with people who call themselves GHOSTFACEKILLAH or Ol' Dirty Bastard.

[Image: Shakespeare, Eminem, and Melville]

One of these things is not like the others…

There are more than a few quibbles to be had with this particular study. Melville was writing prose (although he uses many poetic devices throughout his writing), which gives him a distinct advantage in word choice over both Shakespeare and these other 'artists'. On top of this, the hip hop gentlemen have had their FIRST 35,000 words analyzed, while Melville gets to count the first 35,000 words of what is (wrongly) acclaimed as his masterpiece. By comparison, Shakespeare's 35,000 words seem to be an arbitrary selection (the first 5,000 words from each of 7 plays). How does the 'data scientist' account for any plateau effects? Is it possible that rappers use fewer (or more) new words later in their careers?

The language is also troubling. Shakespeare invents new words like besmirch and lackluster, whereas 'Outkast' gets credit for words like nahmsayin and flawsky-wawsky. These are not the same thing. Not only that, but Daniels credits one rapper with four words for using pimps, pimp, pimping, and pimpin'. None of this takes into account the poetic devices painstakingly worked into Shakespeare and Melville. Shakespeare 'one-ups' everyone considered here by writing much of the dialogue in his plays in iambic pentameter.

Daniels admits his study does not even calculate the words of the rappers accurately, but insists “It’s still directionally (emphasis his) interesting.” I am pretty sure that word does not mean what you think it means.

Finally, consider one more point regarding Shakespeare: he is credited with writing 884,647 words in his career. At 35,000 words, this study looks at only about 4 percent of that output, which is a very small sample.

Oh, by the way – even in this skewed format and meaningless study, Shakespeare and Melville do pretty well.
