Archive - May 2015

New Short Story, New Anthology of Essays

I’ve been traveling without a laptop and have been terribly remiss on updating the Interwebz!  (Still traveling, but laptop is back so hopefully I can catch up a little.)  Anyway, I HAVE A NEW SHORT STORY!  And my essay is in a NEW CRITICAL ANTHOLOGY!

  • “By Degrees and Dilatory Time” was published last week at Strange Horizons, along with beautiful illustrations (eek!) and an audio version (eek again!).  It’s a story that’s very personal to me and one I’m very proud of.  Many, many thanks to my generous betas for making the story better, the excellent editors at Strange Horizons for making it even more better, and the incredible illustrator and narrator.  You all improved this story by a factor of a thousand!
  • I’m very proud to say my essay “Nobody’s Sidekick: Intersectionality in Protagonists” is part of the anthology of essays on representation Invisible 2, edited by Jim C. Hines.  I hopped on the Amazon page on release day (was going to try to tweet it but failed at phone copy/paste), and we hit at least #2 in the SFF History & Criticism category.  Whee!  (Also, all proceeds go to the Carl Brandon Society for Con or Bust.)

Check ’em out!

Statistics of Gender on the Hugo Writing Nominees: Probabilities and Standard Deviations

I’ve been trying to stay out of saying anything about the Hugo Awards, mostly because lots of people are saying lots of things already and I haven’t felt like I have anything to add.  But then Jim Hines posted today speculating about MATH, and, well, I got nerd-sniped.

Here’s the original (long, long) comment I left on his blog.  I finally decided I couldn’t not do a normal distribution and standard dev, so I came back here for it, but the numbers in the original comment might be a bit more intuitive for non-math folk than what I’m going to do here.

Motivation

The Hugo Awards are SFF awards nominated by popular vote.  There is some controversy (understatement) about the nominations this year.  I’m not going to get into that here, just going to display some numbers.

It would, however, be disingenuous not to state my own bias, which is that I think institutional discrimination against women and people off the gender binary exists and is a problem.  I’ve allowed that bias to affect how I frame my wording (and I’ve editorialized at times), but I’ve performed the math exactly as I believe is correct.  Since it’s very possible to make statistics seem skewed toward a particular viewpoint by bad-faith numerical sleight of hand, I want to state up front that I have not done so here — any poor mathematics or misunderstanding of confidence levels is due to (1) my lack of background in stats or (2) genuine error.

What I’m doing, and what it means

The four writing categories for the Hugo Awards have 5 nomination slots each, for a total of 20 nominations for fiction writing.  I’m going to make the probability distribution for the likelihood of a particular gender split (e.g., find the probabilities of a 10/10 split, or a 9/11 split, or a 15/5 split, etc).  This will approximate a nice normal distribution.  If you don’t know what that is, that’s okay — the important part is the next bit.

Once I have the probability distribution, I’m going to take the standard deviation.  Standard deviation is a very useful statistical tool that tells us the likelihood something will be in a given range of numbers.  For example, it’s not terribly useful to look at the probability of an exact 8/12 split; it’s more useful to look at the probability the gender split will be within a certain range of numbers.

For a normal distribution, 68% of the data will fall within 1 standard deviation of the mean (the mean = the average), 95% will fall within 2 standard deviations of the mean, and almost 100% will fall within 3 standard deviations of the mean (99.7%).  Once we get out to three standard deviations from the mean, we’re talking about extreme outliers.
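Those three percentages can be checked directly. The following is a minimal Python sketch, assuming a perfectly normal distribution; the function name `within_k_sigma` is made up for illustration and uses only the standard library's error function.

```python
import math

def within_k_sigma(k: float) -> float:
    """Probability that a normally distributed value lands within
    k standard deviations of its mean (on either side)."""
    return math.erf(k / math.sqrt(2))

# Print the share of the data within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sigma(k):.2%}")
```

The same function works for any multiple of the standard deviation, not just whole numbers.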

This will tell us whether a given gender distribution is within what we’d consider an expected year-by-year fluctuation from 50/50, or whether, assuming a 50/50 gender split, it would be…well, an extreme outlier.

Caveats
  • I’m a mathematician but NOT a statistician; I’ve never actually studied stats.  I only know enough basics to get me in trouble.  If you know more stats than I do, please jump in!
  • I’m considering gender to be 50/50 split on a male/female binary because I couldn’t quickly find stats on nonbinary folk.  (Sorry!!)
  • I’m the type of mathematician who hasn’t worked with numbers in so long that I’m very prone to arithmetic mistakes.  If you find any, please shout.
Tools
The Data

I’m keeping it easy: 20 nomination slots, 50% probability of a given gender getting a nomination.[1]

I haven’t talked about much specific Hugo data here, but when I have I’ve pulled it from the graph in Jim Hines’ post.

Binomial Probability and the Frequency Distribution

Binomial probability gives us the following distribution — conveniently, the calculator above gave it to me all in one go when I entered n=20 (20 nomination slots) and p=.5 (50% probability of male or female).  The following table is copy/pasted verbatim from the results.  For non-math people, note that we’re not calling a male person or a female person in a nomination slot a “success” or a “failure” in the semantic sense — here “success” and “failure” are neutral probability terms.

Binomial, Poisson and Gaussian distributions

Number of trials (or subjects) per experiment: 20
Probability of “success” in each trial or subject: 0.500

Number of Successes  Number of Failures  Exact Probability  Cumulative Probability
0                    20                  0.000%             0.000%
1                    19                  0.002%             0.002%
2                    18                  0.018%             0.020%
3                    17                  0.109%             0.129%
4                    16                  0.462%             0.591%
5                    15                  1.479%             2.069%
6                    14                  3.696%             5.766%
7                    13                  7.393%             13.159%
8                    12                  12.013%            25.172%
9                    11                  16.018%            41.190%
10                   10                  17.620%            58.810%
11                   9                   16.018%            74.828%
12                   8                   12.013%            86.841%
13                   7                   7.393%             94.234%
14                   6                   3.696%             97.931%
15                   5                   1.479%             99.409%
16                   4                   0.462%             99.871%
17                   3                   0.109%             99.980%
18                   2                   0.018%             99.998%
19                   1                   0.002%             100.000%
20                   0                   0.000%             100.000%

 

Cool!  This gives us a frequency distribution.
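If you would rather not rely on an online calculator, the same table can be rebuilt in a few lines. This is a sketch in Python, not the tool used for the table above; the names `binomial_pmf` and `binomial_table` are illustrative, and it assumes the same n=20 and p=.5.

```python
from math import comb

N_SLOTS = 20   # nomination slots across the four writing categories
P = 0.5        # assumed probability that a slot goes to a given gender

def binomial_pmf(k: int, n: int = N_SLOTS, p: float = P) -> float:
    """Exact probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_table(n: int = N_SLOTS, p: float = P):
    """Return rows of (successes, failures, exact, cumulative)."""
    rows = []
    cumulative = 0.0
    for k in range(n + 1):
        exact = binomial_pmf(k, n, p)
        cumulative += exact
        rows.append((k, n - k, exact, cumulative))
    return rows

# Print the table in the same column order as the calculator output.
for successes, failures, exact, cumulative in binomial_table():
    print(f"{successes:2d} {failures:2d} {exact:8.3%} {cumulative:8.3%}")
```

Changing `P` is an easy way to see how the distribution shifts if the starting assumption is something other than 50/50.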

The Normal Distribution

Binomial probability (what we just used to get the frequency distribution in the above table) with p=.5 and a reasonable number of data points is known to approximate a normal distribution, aka a bell curve.  Here’s a normal distribution via Wolfram Alpha of these data:

http://www.wolframalpha.com/share/img?i=d41d8cd98f00b204e9800998ecf8427ene3s1qcrn&f=HBQTQYZYGY4TMNDEGQYTKZJQMYYDCNDEGQYWGMRTMRTDSMBQGE2Aaaaa

Notice that it’s centered around the mean (average) of 10, as we would expect.  We’ve got the number of nominees of a given gender on the x-axis (it doesn’t matter which gender we choose, as it’s symmetric — we could say the x-axis is the number of male nominees or we could say it’s the number of female nominees), and the percent probability we’ll land on that number of nominees on the y-axis.

Whether we look at the table or the graph, we’re hitting about a 17-18% probability of an even 10/10 split, and it drops off quickly on either side, until a 0/20 split in either direction has almost a 0% probability.
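To see how closely the bell curve tracks the table, you can compare the two at each possible number of nominees. The sketch below is Python with made-up helper names; it assumes a normal curve with mean 10 and the binomial standard deviation sqrt(n·p·(1−p)).

```python
from math import comb, exp, pi, sqrt

n, p = 20, 0.5
mean = n * p                   # 10 nominees expected
sd = sqrt(n * p * (1 - p))     # standard deviation of the binomial

def binomial_pmf(k: int) -> float:
    """Exact probability of exactly k nominees of a given gender."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x: float) -> float:
    """Height of the normal curve with the same mean and sd at x."""
    return exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))

# Side-by-side comparison for every possible count.
for k in range(n + 1):
    print(f"{k:2d}  binomial {binomial_pmf(k):7.3%}  normal {normal_pdf(k):7.3%}")
```

The two columns are closest in the tails and differ most near the peak, where both still sit in the 17-18% range.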

Standard deviation

(I actually found the standard dev first and used that to graph the normal curve, but shhh!  I think it’ll make more sense to non-math people to write it in this order.)

One reason it’s so lovely to talk about standard deviations in a normal distribution is it gives us very pretty ranges that other people who know basic stats can easily grasp — if you say “more than a standard deviation from the mean,” people who know what standard deviation is will have an idea of how hefty a divergence that is.  Here’s a great visualization for standard deviation on a normal distribution:

Standard deviation diagram

By Mwtoews [CC BY 2.5 (http://creativecommons.org/licenses/by/2.5)], via Wikimedia Commons

As you can see, the dark blue is within 1 standard deviation of the mean and takes up 68.2% of the data.  The lighter blue shows going out another standard deviation from the mean, and the even lighter blue goes out to a third standard deviation from the mean, where the probability of landing is very close to zero.

Standard deviation has a complicated formula that’s beyond the scope of this post — I just used my calculator.  The standard deviation for these data is about 2.236.
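For a binomial distribution in particular, there is a standard shortcut that sidesteps the general formula: the mean is np and the standard deviation is the square root of np(1−p). Plugging in n=20 and p=.5 gives the same figure as the calculator:

```latex
\mu = np = 20 \times 0.5 = 10,
\qquad
\sigma = \sqrt{np(1-p)} = \sqrt{20 \times 0.5 \times 0.5} = \sqrt{5} \approx 2.236
```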

For a normal distribution, that means 68% of the data fall within 2.236 of the mean.  In other words, 68% of the data fall within a difference of 2.236 from 10, or between 7.764 and 12.236.

It’s easy to check that this is about right: if we go to our table above and add the “exact probability” column for 8, 9, 10, 11, and 12, we get a bit above 70%.  It’s not exact because our frequency distribution is only approximating the normal distribution, but it’s a very good approximation, and it’s generally considered an appropriate model for binomial distributions with non-extreme probabilities and a reasonable number of trials.[2]
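That addition is quick to reproduce. Here is a small Python sketch (the function name `prob_between` is illustrative) that sums the exact binomial probabilities for 8 through 12 instead of reading them off the table:

```python
from math import comb

def prob_between(lo: int, hi: int, n: int = 20, p: float = 0.5) -> float:
    """Exact probability that the count falls between lo and hi, inclusive."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

# Probability of a split of 8/12 or closer.
print(f"{prob_between(8, 12):.3%}")
```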

One Standard Deviation, Two!  Three Standard Deviations, More!

Remember that about 68% of the data will fall within 1 standard deviation of the mean, 95% will fall within 2, and 99.7% will fall within 3.  In other words, another advantage of standard deviation is that it gives us some nice arithmetical shortcuts, as follows:[3]

  • 1 standard deviation:  7.764 – 12.236
  • OR: About 68% of the time, the gender split will be 8/12 or closer.
  • 2 standard deviations: 5.528 – 14.472
  • OR: About 95% of the time, the gender split will be 6/14 or closer.
  • 3 standard deviations: 3.292 – 16.708
  • OR: About 99.7% of the time, the gender split will be 4/16 or closer.

And finally:

  • A gender split wider than 4/16 is an extreme outlier.[4]
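The ranges above, and the exact percentages they stand in for, can be reproduced as follows. This is a Python sketch under the same assumptions (n=20, p=.5); `split_range` is a made-up helper that rounds each range inward to whole numbers of nominees and then sums the exact binomial probabilities.

```python
from math import ceil, comb, floor, sqrt

n, p = 20, 0.5
mean = n * p
sd = sqrt(n * p * (1 - p))

def pmf(k: int) -> float:
    """Exact probability of exactly k nominees of a given gender."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def split_range(k_sigma: int):
    """Return (low, high, widest_split_low, widest_split_high, exact_prob)
    for the range within k_sigma standard deviations of the mean."""
    low = mean - k_sigma * sd
    high = mean + k_sigma * sd
    lo_int, hi_int = ceil(low), floor(high)   # whole nominees inside the range
    exact = sum(pmf(k) for k in range(lo_int, hi_int + 1))
    return low, high, lo_int, hi_int, exact

for k_sigma in (1, 2, 3):
    low, high, lo_int, hi_int, exact = split_range(k_sigma)
    print(f"{k_sigma} sd: {low:.3f} to {high:.3f} -> "
          f"{lo_int}/{hi_int} or closer, exact probability {exact:.2%}")
```

The exact figures come out slightly above the 68/95/99.7 rule of thumb because the whole-number cutoffs sit a little outside the rounded ranges.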

Note that though a split wider than 4/16 suggests something very statistically unlikely is going on, it does not say why, and it does not assign intent.  My lived experience suggests that intentional sexism should not generally be assumed when systemic bias will suffice, and in a process like writing, publishing, publicity, and awards nominations, there are plenty of stages at which institutional bias can manifest itself.  This does not, of course, mean there is not a problem — in fact, it would mean the problem may be one that requires more thought, awareness, and effort to address.

I’ll further note that even if you compare the years 2010-2014 (none of which had fewer than 7 nominations for either gender) to 2015,[5] and this leads you to conclude (along with a preponderance of other data, I am aware) that something untoward happened in 2015, one person or one small group of people with a particular subgenre taste having chosen a fantastically statistically unlikely slant of genders still does not imply malicious sexism.[6]  What it does imply, in my opinion, is a variety of other extremely upsetting problems, exacerbated by the fact that nonmalicious sexism can be much, much harder to combat.

So.  What was the gender split in the writing categories this year?

  • 3/17.
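For reference, the exact chance of a split at least this lopsided under the 50/50 assumption can be read from the cumulative column of the table, or computed directly. The Python sketch below uses illustrative names and reports both the one-direction figure (3 or fewer of a given gender) and the either-direction figure.

```python
from math import comb

def binomial_pmf(k: int, n: int = 20, p: float = 0.5) -> float:
    """Exact probability of exactly k nominees of a given gender."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_most(k: int, n: int = 20, p: float = 0.5) -> float:
    """Exact probability of k or fewer nominees of a given gender."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

one_direction = prob_at_most(3)        # 3/17 or wider, one particular gender
either_direction = 2 * one_direction   # doubled because p = 0.5 is symmetric

print(f"3 or fewer of a given gender: {one_direction:.3%}")
print(f"3/17 or wider in either direction: {either_direction:.3%}")
```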

 

 

Footnotes
  1. Yes, I’m aware there are factors affecting that 50/50 probability, even in years that aren’t this one — potentially factors at every step in the publishing process, not just the nominating-for-awards stage.  This post could be, in that vein, viewed like a proof by contradiction — I’m showing the probabilities of expected fluctuations, and if you’re seeing greater extremes, that might indicate the starting assumption of 50/50 gender blindness at all steps is, in fact, incorrect.
  2. This distribution definitely has a non-extreme p.  I tried to figure out if 20 trials is a reasonable number for approximating via a normal distribution and didn’t get anything definitive, although I did compare by hand and the numbers all seemed pretty close.  But if you distrust the model, notice that I’m really only using this one to make relatable statements about the exact raw data that you can look at in the table above: if you want to, you can define your own terms to look at probability ranges by adding the numbers in the third column, and you’ll come to the same conclusions.  In other words, about 70% of the data fall between 8 and 12 whether we use the vocabulary “within one standard deviation of the mean on a normal distribution” or not.
  3. You could, again, find the exact percentages by adding the numbers in the table.  But this is faster.
  4. As far as I know “outlier” doesn’t have a specific statistical definition, but I’ve seen it used to mean “three or more standard deviations from the mean,” so that’s what I’m doing here.
  5. If you do compare, be aware that some of those years had more or fewer than 20 nominations (presumably because of ties or the 5 percent rule), and I’ve not accounted for those sorts of variations here.  The ideas should be broadly applicable, however, and if we’re speaking roughly, I’ll note that 4 out of the 5 years from 2010-2014 had at least 8 nominations from both genders, and the other year had a 7/11 split, which is perfectly in line with the numbers above: if 4/5 years fall within the 68% (roughly) and 1/5 falls outside the 68% but within the 95%, that’s about what we’d statistically expect.
  6. Well, at least one person involved has nonfiction writings that would support such a conclusion, but I will not extend his philosophies to the rest.

Los Angeles Seems to be Lacking Squares

Pershing Square-2

By Visitor7 (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

I realized something the other day: LA doesn’t have squares.

I could not think of a single square in LA.  In contrast, I could think of six squares in Boston off the top of my head (Kendall Square, Harvard Square, Central Square, Porter Square, Inman Square, Davis Square).  And actually, technically none of those are even in Boston proper; they’re in Cambridge and Somerville, the very narrow environs I used to frequent.  Boston probably has even more squares.

I could also think of squares in pretty much every other American city or town I’d lived in, even the small ones.

My friend hypothesized that LA’s lack of squares might have to do with the geography here (LA roads work around hills and valleys and freeways and often lack well-laid-out right angles), and that maybe the lack of places that fit the geometry of a square resulted in us not using the word “square,” even for places which function like squares.  I wondered if it could also have something to do with the sprawl, if LA is so wide and scattered that no place seems central enough to a neighborhood to deserve to be called a square.  Or maybe LA lacks the drive for community that other cities work toward, and therefore urban planning has not included as many squares.

Hmm.  It fascinates me, the way we use language.  The different names we call things.

Well, I did just do a Google search and it turns out LA does have a few squares, including Pershing Square, which I knew of but had forgotten about.  Still, considering what a huge city this is, LA doesn’t seem to like squares all that much.  I’m tempted to do a square-per-square-mile or square-per-capita study just to see if I’m right about this . . .

(And now I feel this post is getting decidedly silly.  What can I say, sometimes I wonder about things!)

Fleeting Thoughts

Window_peek_at_rainy_time,_wayanad

In waiting for the sun to come to just the right angle I have missed the moment entirely.

Photo: By yjenith (Own work) [CC BY-SA 3.0], via Wikimedia Commons

