Electorate Demographics and Labor Votes

MajorlyUnemployedGrad
12 min read · Nov 7, 2020

This article examines the relationship between various demographic features of an electorate (as recorded in the ‘Discover Your Commonwealth Electoral Division’ dataset published by the ABS for 2019) and the electorate’s two-party-preferred (TPP) vote share for the Australian Labor Party (ALP) in the 2019 election.

Through a few simple graphs we get a sense of which demographic factors are associated with ALP votes, and what these relationships look like. I go through a range of factors:

  • Income
  • Age
  • Engagement
  • Job type
  • Educational attainment
  • Housing tenure
  • Rent and mortgage repayments
  • Family status
  • Ethnic/cultural background

I’ll be going through the data, presenting it primarily via scatter plots, but I’ll also be doing a bit of ‘storytelling’: discussing how well the preliminary data coheres with common wisdom about the voting behavior of various demographics. This analysis is NOT rigorous and is really just me thinking out loud, so take it with many grains of salt. I do hope, however, that these comments stimulate discussion about the patterns we see here. When I see a commonly asserted relationship hold, I make note of it; when I see it contradicted, I try to offer an explanation or alternate hypotheses. That said, all that is concrete is the data displayed in the graphs, and nothing beyond what is explicitly shown there should be inferred without further investigation.

With this in mind I want to make an obvious but important point before diving into the results. As you have all likely heard a thousand times: correlation is not causation. Though the associations we see here are fascinating and likely tell some causal story about the relationship between demographic features and voting behavior, we cannot take them at face value. Consider, for the sake of example, how electorates with lots of speakers of a language other than English (LOE) tend to vote Labor more than electorates without many LOE speakers.

It might be tempting to conclude that LOE speakers tend to vote Labor, and indeed there are a number of plausible causal stories you could tell to convince yourself of this. However, consider that A) electorates with many LOE speakers tend to have more renters than electorates with few LOE speakers and that B) electorates where renting is prevalent tend to vote Labor. With this in mind, it could simply be the case that, when we keep the proportion of renters constant, the proportion of LOE speakers has no relationship with the electorate’s preference for Labor. In that case, the LOE proportion of the electorate would not really impact the Labor vote share, but would simply pick up the impact of the prevalence of renting on the Labor vote share.
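To make the confounding story concrete, here is a small simulation on synthetic data (not the ABS data, and not the R code behind this article's graphs). It assumes, purely for illustration, that renting is the only thing driving the Labor share and that LOE merely tracks renting. The variable names and effect sizes are made up.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)
n = 2000  # synthetic "electorates"; far more than really exist, to keep noise low
rented = [rng.gauss(0, 1) for _ in range(n)]
# Assumption: LOE tracks renting but has no direct effect on the vote.
loe = [0.8 * r + 0.6 * rng.gauss(0, 1) for r in rented]
# Assumption: only renting drives the Labor share.
alp = [0.7 * r + 0.7 * rng.gauss(0, 1) for r in rented]

raw = pearson(loe, alp)
# Correlate what is left of each variable once renting is held fixed.
partial = pearson(residuals(loe, rented), residuals(alp, rented))
print(f"raw correlation of LOE with ALP share: {raw:.2f}")
print(f"correlation after holding renting fixed: {partial:.2f}")
```

Under these assumptions the raw correlation is clearly positive while the partial correlation sits near zero, which is exactly the pattern a confounder produces. Whether the real data behaves this way is a separate question.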

Whether or not the above situation is actually true I cannot say. There are statistical tools we can use to provide evidence one way or the other and to try to establish causal effects. However, proper use of these tools is a bit more complex, would require more data than I have to produce meaningful results, and is beyond the scope of this article (though I may write on it later).

We should also be careful to avoid the ecological fallacy: statistical associations on an aggregate (i.e. electorate) level do not necessarily translate to statistical associations on an individual level. That is to say, just because electorates with low levels of average income are less likely to vote Labor does not mean that, on an individual level, low income earners are less likely to vote Labor. In fact, the complete opposite could be (and, according to conventional political wisdom, is) true.

The Wikipedia article on this phenomenon, which explains it in greater detail, notes how, in US presidential elections, Republicans tend to win poorer states and Democrats tend to win richer states even though the poorer an individual voter is the more likely they are to vote Democrat. I’ll go into the reasoning behind this a bit later in the article, but for now it will suffice to say that a general awareness of this phenomenon is necessary before viewing the results.
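A toy example with made-up head counts shows how the aggregate and individual pictures can point in opposite directions. The two electorates and all numbers below are hypothetical.

```python
# Made-up head counts per electorate:
# (low-income voters, of whom vote Labor, high-income voters, of whom vote Labor)
electorates = {
    "poorer electorate": (700, 315, 300, 75),
    "richer electorate": (300, 225, 700, 385),
}

# Aggregate view: Labor share of each electorate as a whole.
shares = {
    name: (lab_lo + lab_hi) / (n_lo + n_hi)
    for name, (n_lo, lab_lo, n_hi, lab_hi) in electorates.items()
}

# Individual view: Labor share within each income group, pooled over electorates.
low_rate = (sum(e[1] for e in electorates.values())
            / sum(e[0] for e in electorates.values()))
high_rate = (sum(e[3] for e in electorates.values())
             / sum(e[2] for e in electorates.values()))

for name, share in shares.items():
    print(f"{name}: Labor share {share:.2f}")
print(f"low-income voters:  Labor share {low_rate:.2f}")
print(f"high-income voters: Labor share {high_rate:.2f}")
```

Here the poorer electorate gives Labor 39% and the richer one 61%, yet low-income voters back Labor at 54% against 46% for high-income voters. Within each electorate the low-income group is also the more Labor-leaning one, so the electorate-level pattern says nothing reliable about individuals.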

Lastly, the smoothed lines on the scatterplots are the result of LOESS regressions for the relevant variables. I’ve included these because they highlight the non-linearity of certain trends. However, a lot of the time the smoothed line seems to ‘overfit’ what looks like a linear relationship. With this in mind, make sure to look at the actual relationship indicated by the dots, not just the LOESS fit line.
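For readers unfamiliar with LOESS, the idea is to fit a small weighted regression around each x value, giving nearby points more weight. The sketch below is a simplified version (local straight lines with tricube weights) and is not the exact routine R uses; the `span` default and the toy data are assumptions for illustration.

```python
def loess_point(x0, xs, ys, span=0.75):
    """Fit a tricube-weighted straight line around x0 and return its value at x0."""
    # Number of neighbours that get non-zero weight.
    k = min(len(xs), max(2, round(span * len(xs))))
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
    # Tricube weights: close to 1 near x0, falling to 0 at distance h.
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Toy inverse-U data, loosely echoing the income plot (values are made up).
xs = [float(i) for i in range(21)]
ys = [-(x - 10) ** 2 for x in xs]
smooth = [loess_point(x, xs, ys) for x in xs]
print([round(v, 1) for v in smooth])
```

A smaller `span` lets the curve bend more and follow local noise, which is one way a smoother can appear to overfit a relationship that is really close to linear.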

(These graphs were made in R, the code is here if you’re interested, scroll past otherwise:)

Results!

Firstly, let’s take a look at how the median income of an electorate is associated with said electorate’s ALP vote share:

ALP_P refers to the TPP vote share received by the ALP in the relevant electorate. Some of the names for the independent variables (x-axis) are a bit weird. If it’s unclear what a variable name refers to I’ll spell it out.

This is an interesting one. There’s a lot of noise in the data here, suggesting (to no one’s great surprise) that factors other than median income determine how an electorate votes. However, there is a bit of an inverse U-shape visible. If we grossly simplify, we may note that poor electorates tend not to vote Labor, slightly less poor electorates do, and then the richer the electorate, the fewer Labor voters you get.

This does not seem to deviate too much from common wisdom. Beyond a certain point, the general trend is that higher income goes with fewer Labor votes. This would typically be explained by the fact that higher income voters tend to vote Liberal since Liberal policies tend to favor the preservation of wealth for the wealthy. Thus, the greater the median income, the greater the number of Liberal voters. (I am aware that this hypothesis could easily be challenged on the basis of the ecological fallacy I mentioned earlier, but I think it’s a decent argument.)

We can get a better insight into the relation between electorate median income and voting practice through a clustering analysis. A clustering analysis will sort the electorates into 5 different groups on the basis of how similar they are to each other in terms of median income and ALP vote share.

Note: in order to cluster effectively we convert the ALP_P and median_income variables into standardized z-scores so that the mean of both variables is 0 and the standard deviation of both is 1. Since the mean ALP share is slightly below 50% (48%), 0 on the y-axis doesn’t exactly correspond to the threshold for Labor ‘winning’ on a TPP basis, but it is close enough.
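The two steps in this note, standardizing and then grouping, can be sketched as follows. The article does not say which clustering algorithm was used; this sketch assumes plain k-means with a deterministic farthest-point start, and the income and vote figures are made up.

```python
import statistics

def zscores(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) with deterministic farthest-point starts."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next start: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(statistics.fmean(dim) for dim in zip(*members))
    return labels, centroids

# Hypothetical electorates: weekly median income and ALP TPP share (%).
median_income = [600, 620, 640, 660, 900, 920, 940, 960]
alp_p = [40, 42, 41, 43, 60, 62, 61, 63]
points = list(zip(zscores(median_income), zscores(alp_p)))
labels, centroids = kmeans(points, k=2)  # the article uses 5 groups
print(labels)
```

Standardizing first matters because income (hundreds of dollars) would otherwise swamp vote share (tens of percentage points) in the distance calculation. Farthest-point starts are chosen here only for reproducibility; they are sensitive to outliers, and random restarts are the more common choice.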

Here we see that dividing our electorates up into 5 groups works quite well. We have:

  • The green group: poor electorates that don’t like Labor. A quick glance shows that many of these electorates seem to be quite rural and thus perhaps tend to vote Nationals.
  • The (ironically colored) blue group: poor electorates that like Labor. Again, a cursory glance shows that many of these electorates are urban and suburban areas — indicating that they may represent Labor’s traditional support base (urban working class).
  • The red group: electorates of average socio-economic status that are split between the ALP and LNP on a TPP basis.
  • The purple group: reasonably rich electorates that vote Labor. I have a suspicion that these electorates consist of young, educated, professionals.
  • Finally the gold-ish/yellow group: rich electorates that vote Liberal.

It would be fascinating to dive a little deeper into these different groups, e.g. to see what else separates the red and blue groups (average age? ethnic composition?) or how education varies between clusters. This will have to be a project for another time.

Let’s move on to the relationship between age demographics and Labor votes:

One thing to note here is that, in renaming the variables, I stupidly called them ‘uptoXX’. This isn’t really accurate. ‘uptoXX’ really means the proportion of the population BETWEEN age XX and the previous age category listed. For example, ‘upto49’ refers to the proportion of the electorate who are between 34 and 49 years old.

Yet again, this data doesn’t deviate from the stories of conventional wisdom: young people are (small-l) liberal and prefer Labor, older people are conservative and vote (big-l) Liberal. As such, more young people tends to mean more Labor votes in an electorate.

Now let’s take a look at ‘engagement.’ A person is fully ‘engaged’ if they are studying or working full time. As such, this acts as a rough proxy for unemployment.

For details of various levels of engagement see: https://www.abs.gov.au/ausstats/abs@.nsf/Lookup/2901.0Chapter31202016

Here the coherence with conventional wisdom is less obvious. Conventional wisdom tells us that electorates with lots of unemployed people will favor political parties with more socialist policies that will provide them with greater levels of unemployment benefits. The data, however, shows only a very weak association between the proportion of an electorate that is ‘not engaged’ and the electorate’s propensity to vote Labor. Furthermore, low levels of full engagement seem to be associated with low levels of Labor support.

One possible story here is that engagement doesn’t differentiate between the proportion of the electorate in education or training and the proportion of the electorate that is employed. The story goes like this:

  1. Electorates with more university students (larger proportion of electorate in education) tend to support Labor more since students tend to be liberal.
  2. Electorates with high unemployment tend to support Labor for the reasons discussed.
  3. Students tend to come from wealthy electorates (which likely feature low unemployment).
  4. As such, low engagement electorates might have the opposing forces of high unemployment (leading to more Labor votes) but low participation in education (leading to fewer Labor votes), thus inhibiting the emergence of a clear pattern.

Again, I want to stress that these ‘hypotheses’ are not serious arguments I am putting forward. They are simply suggestions off the top of my head, designed to stimulate your own thoughts about the processes underlying the data we see and avenues for further investigation. This particular hypothesis could be expanded and tested if I had more data, so that there was enough variance between different electorates to successfully control for the impact of confounding variables. Regardless, the focus of this article is the data itself: the graphs. Take only what is explicitly shown in these as fact.

With that in mind, let’s now take a look at how the prevalence of different types of jobs impacts the support for Labor:

For details on what jobs are subsumed under each category here consult the ANZSCO: Australian and New Zealand Standard Classification of Occupations.

There are some clear-cut associations here. More managers = fewer Labor votes. More community workers = more Labor votes. No surprises here. What is, perhaps, more surprising is the lack of a clear-cut relationship between the proportion of an electorate employed in working class occupations and support for Labor, as demonstrated by machinery workers.

Perhaps the ecological fallacy can explain the absence of the expected relationship. Suppose that, on an individual level, machinery workers really do tend to vote Labor more than the general public. Let’s also assume that machinery workers tend to live in areas with low median incomes. Now, if we also assume that many voters in low-income areas tend to vote Liberal/National (you could think of several arguments for this), then it would make sense that electorates with lots of machinery workers don’t seem to favor Labor, even if machinery workers themselves tend to vote Labor.

This could explain why we don’t see the trend asserted by common wisdom on the macro level. Then again, the problem could be more serious — perhaps the ALP has become disconnected from its working class origins.

We should also note that the possibility of ecological fallacy holds with respect to other professions (maybe managers don’t really vote Liberal but just happen to live in areas where lots of other people do).

Median income and proportion of machinery workers — just for fun

Now for education:

year12 refers to the proportion of the electorate who have finished year 12 of high school.

We see that the greater the proportion of the electorate that has completed year 12, the greater the ALP’s vote share in said electorate.

It would be interesting to see how much variation in educational attainment impacts vote share once other, likely correlated, factors (e.g. unemployment, income) have been accounted for. In other words, does educational attainment really impact ALP vote share on the electorate level, or is it simply the influence of a confounding variable?

Interestingly, when we regress ALP vote share on ‘year12’ and ‘median_income’ (standard OLS linear regression), it is the coefficient on year12 that is statistically significant whilst that on median_income is quite close to zero. What this means is that, when we hold educational attainment constant, variation in median_income does not translate to variation in ALP vote share at the electorate level.
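To show what ‘holding educational attainment constant’ means mechanically, here is a minimal OLS sketch solved through the normal equations. It is not the article’s R model and it only recovers coefficients, not standard errors or p-values. The data is made up and deliberately constructed so that vote share depends on year12 alone while income merely moves with year12.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]  # design matrix with intercept
    p = len(rows[0])
    # Augmented system (X'X | X'y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Hypothetical electorates: % who finished year 12, median income (hundreds of dollars).
year12 = [40, 50, 55, 60, 70, 80]
income = [5.0, 6.5, 6.0, 7.0, 8.0, 8.2]
# Assumption: ALP share depends on year12 only.
alp_p = [10 + 0.5 * x for x in year12]

alone = ols([income], alp_p)           # income by itself looks predictive
both = ols([year12, income], alp_p)    # income adds nothing once year12 is in
print(f"income slope on its own:      {alone[1]:.2f}")
print(f"year12 slope, income slope:   {both[1]:.2f}, {both[2]:.2f}")
```

On its own, income has a clearly positive slope because it rises with year12; once year12 is included, the income coefficient collapses to zero. Real data is noisier, which is why the significance of each coefficient also matters.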

Adding engagement to the list of predictors leads to a model with an unacceptable degree of multicollinearity so I won’t discuss those results.

I do suspect though that these results would be a bit different if we had more data so that OLS could better distinguish between the effects of different, correlated demographic features (and it is for this reason that this article is very light on statistical analysis).

Housing tenure:

owned_mortgage refers to the proportion of the electorate that owns a house with a mortgage, owned_outright to the proportion of the electorate that owns their home outright, and I’ll let you guess what rented refers to.

These results are all very interesting and aligned with common wisdom. Electorates with lots of renters tend to vote Labor, perhaps reflecting the socio-economic status of voters who live in areas where housing is largely rented. Electorates with lots of home owners tend to vote Liberal. There are many plausible explanations for this, but I won’t get into them for the sake of brevity.

Housing tenure metrics:

These values refer to the median (weekly) rent and (monthly) mortgage repayments paid by renters and mortgagors in the relevant electorate.

Higher median rent is associated with higher ALP vote share. It would be interesting to see what this relationship would look like once median income was controlled for (since those with higher income probably rent more expensive accommodation).

Median rent and median income for an electorate.

Adjusting for median income in a linear regression of ALP vote share on median rent changes the rent coefficient’s p-value from slightly above 10% to around 6.6% (i.e. the evidence that median rent is associated with vote share becomes somewhat stronger, although it still falls short of the conventional 5% threshold). This gives us some weak evidence that, holding income constant, areas with higher rent tend to vote Labor.

Family status:

Proportions of electorate with the described family type.

Again, all this is quite interesting and there are many stories we could tell, but I won’t dive into these results for the sake of brevity.

Ethnic/cultural background:

Born overseas refers to the proportion of the electorate that were — shockingly — born overseas. LOE refers to the proportion of the electorate that speak a language other than English at home.

Some nice, clear relationships here. We can quite reasonably think of LOE as a proxy for the prevalence of minority ethnic or cultural groups. Born overseas is a bit different (though still correlated of course) since many people immigrate from English speaking countries.

Something encouraging here is that the relationship between LOE and ALP vote share is positive and linear. Our keen awareness of ecological reasoning could suggest the hypothesis that, though on an individual level LOE speakers tend to vote Labor, electorates with large immigrant/ethnic populations may feature anti-immigrant backlash in the form of voting Liberal/National. It is encouraging to see that this is not the case: the ‘group threat’ theory of racial animus doesn’t seem to hold here (or at least does not hold to a sufficiently strong degree to influence voter behavior).

I hope that the trends displayed here have been informative and interesting.

Thanks for reading!
