Part II: How has sentiment towards Victoria’s lock-down changed over time?

MajorlyUnemployedGrad
Oct 25, 2020

This post is an update to the article here:

If you haven’t read it, it may provide some useful context for this project. In that article, I tracked the overall sentiment of comments in the daily r/melbourne Covid discussion threads from the 27th of September to the 23rd of October to get a sense of how feelings had changed as life under lockdown continued.

After I posted my results, many of you on Reddit suggested that it would be more fruitful to look further back and see how sentiment had evolved over a longer period of time. This is exactly what I have done: I have replicated the analysis from yesterday’s post, but with comment data from r/melbourne Covid threads beginning on the 5th of July.

The method of extracting the comments was slightly different to that used in the previous analysis (again, I’ll post the Python script I used; I think it’s a bit more elegant this time).

This time, I looped through all the relevant discussion threads and saved each top-level comment, along with its karma, polarity score and date of posting. Going through all the threads and saving every comment took over 12 hours because of the rate limit on Reddit’s API. Once it was done, however, I had a list of 29,493 top-level comments from r/melbourne Covid threads from July to the present, each with its karma, posting date and polarity score (computed using the VADER sentiment analyser in NLTK).
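A minimal sketch of this collection step, assuming PRAW-style comment objects that expose `.body`, `.score` and `.created_utc`, plus a scoring function supplied by the caller. The names `CommentRecord` and `collect_top_level` are hypothetical; this is not the author's script, and the rate limiting that stretched the real run to 12 hours is left to the Reddit client.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone, tzinfo
from typing import Callable, Iterable, List


@dataclass
class CommentRecord:
    """One top-level comment with the fields kept for the analysis."""
    body: str
    karma: int
    polarity: float
    posted: date


def collect_top_level(
    threads: Iterable[Iterable],
    score_fn: Callable[[str], float],
    tz: tzinfo = timezone.utc,
) -> List[CommentRecord]:
    """Score every top-level comment in every discussion thread.

    `threads` yields one iterable of top-level comments per thread. Each
    comment is assumed to have PRAW-style `.body`, `.score` (karma) and
    `.created_utc` (Unix timestamp) attributes.
    """
    records: List[CommentRecord] = []
    for comments in threads:
        for comment in comments:
            # Convert the Unix timestamp to a calendar date in the chosen zone.
            posted = datetime.fromtimestamp(comment.created_utc, tz=tz).date()
            records.append(
                CommentRecord(
                    body=comment.body,
                    karma=comment.score,
                    polarity=score_fn(comment.body),
                    posted=posted,
                )
            )
    return records
```

With PRAW, calling `submission.comments.replace_more(limit=None)` and then iterating `submission.comments` yields a thread's top-level comments. For the scorer, one option is `sia = SentimentIntensityAnalyzer()` followed by `lambda text: sia.polarity_scores(text)["compound"]` (after `nltk.download("vader_lexicon")`); using the compound score is an assumption about which VADER polarity the original used. Dates default to UTC here, so passing a Melbourne time zone would bucket comments by local date instead.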

I then grouped all these comments by their date of posting and computed the overall sentiment metrics for the comments on each date. This left me with a data set with entries for 112 different dates (the 5th of July to the 24th of October), each holding the scores for the different overall sentiment measures across all comments made in Covid threads on that day. From this data, I plotted how these measures of sentiment changed over time.
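The grouping step can be sketched with the standard library alone. This assumes each comment has already been reduced to a `(posted_date, karma, polarity)` tuple; `group_by_date` is a hypothetical helper, and the last lines simply check the 112-day span quoted above.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Row = Tuple[date, int, float]  # (posting date, karma, polarity)


def group_by_date(rows: Iterable[Row]) -> Dict[date, List[Tuple[int, float]]]:
    """Bucket comments by posting date, returning the buckets in date order."""
    buckets: Dict[date, List[Tuple[int, float]]] = defaultdict(list)
    for posted, karma, polarity in rows:
        buckets[posted].append((karma, polarity))
    return dict(sorted(buckets.items()))


# 5 July to 24 October 2020, counting both end points.
SPAN_DAYS = (date(2020, 10, 24) - date(2020, 7, 5)).days + 1
print(SPAN_DAYS)  # 112
```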

If you didn’t read the previous article, here’s a brief description of each overall sentiment metric:

  • Mean polarity: as the name implies, this is simply the average polarity score of the comments posted on a given day.
  • Proportion charged: of all comments with a polarity NOT equal to zero (i.e. just those comments that are positive or negative), the proportion that are positive.
  • Proportion total: the same as the above, but as a proportion of all comments, including those that are neutral.
  • Beta: for the set of comments belonging to a given day, I compute a simple linear regression of karma on sentiment, i.e. I estimate how increasing the positivity of a comment is associated, on average, with that comment’s karma. I divide this coefficient by 10, so the value represents the average change in karma associated with a 0.1 increase in polarity.
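The four metrics above can be computed for one day's comments roughly as follows. This is a sketch rather than the author's script: it takes a non-empty list of `(karma, polarity)` pairs, fits the beta slope by ordinary least squares, and returns NaN where a metric is undefined (no charged comments, or every comment sharing the same polarity). The name `daily_metrics` is hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def daily_metrics(comments: List[Tuple[int, float]]) -> Dict[str, float]:
    """Overall sentiment metrics for one day's non-empty list of (karma, polarity)."""
    karma = [k for k, _ in comments]
    polarity = [p for _, p in comments]
    positive = sum(1 for p in polarity if p > 0)
    charged = sum(1 for p in polarity if p != 0)  # positive or negative

    # Simple linear regression of karma on polarity: slope = S_xy / S_xx.
    mean_p, mean_k = mean(polarity), mean(karma)
    s_xx = sum((p - mean_p) ** 2 for p in polarity)
    s_xy = sum((p - mean_p) * (k - mean_k) for k, p in comments)
    slope = s_xy / s_xx if s_xx else math.nan

    return {
        "mean_polarity": mean_p,
        "proportion_charged": positive / charged if charged else math.nan,
        "proportion_total": positive / len(comments),
        # The slope is karma per 1.0 of polarity; dividing by 10 gives karma per 0.1.
        "beta": slope / 10,
    }
```

For example, three comments lying exactly on the line karma = 20 × polarity give a slope of 20 and therefore a beta of 2.0.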

Here’s the Python script if you’re interested:

One last thing before getting into the results. Many of you have also pointed out potential bias in the results/sample due to the moderation of the subreddit and the subreddit’s general unrepresentativeness of the population of Melbourne/Victoria. I think these results can, to a degree, speak to broader sentiments in the city if we accept a number of assumptions. However, I want the focus of this blog to be on data science/coding projects rather than applied sociology. As such, I will not defend these assumptions or posit that the results represent anything other than the sentiments of r/melbourne users over time (acknowledging that moderation may skew these results).

I am posting these results only to track this specific variable, and I am saying nothing about how these trends may reflect broader ones. I will leave discussion of whether these results indicate anything beyond what they display on the surface to the reader.

Of course, the same caveats from the previous article about interpreting the results as support for Premier Daniel Andrews apply here too.

Results!

Like last time, I’ll let the graphs speak for themselves.

Here’s a table of results:

(It probably would’ve been best to exclude the 24th of October from the graphs, since the comments scraped for this entry are only those posted on that date in a previous date’s discussion thread, and so are not representative of all comments made on that date. It doesn’t really matter, though.)

The correlations between the different metrics are comparable to those in the previous analysis, which looked only at comments made between the 27th of September and the 23rd of October.
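For readers who want to reproduce this comparison, here is a sketch that correlates every pair of daily metric series. The original does not say which correlation coefficient was used; this assumes Pearson's r, and `pearson` and `metric_correlations` are hypothetical helpers.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r for two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def metric_correlations(series: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of metric series (one value per date, same date order)."""
    return {(a, b): pearson(series[a], series[b]) for a, b in combinations(series, 2)}
```

Each series must hold one value per date in the same date order, for example the values returned for each day by a per-day metrics function.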

For an interpretation of these correlations, see the original thread.

Hope you’ve enjoyed! Thanks for reading.
