As the coronavirus spread rapidly around the world, the key talking points about it changed over time. Wuhan was the initial epicentre of the outbreak; the epicentre later moved to Italy, France, Spain, the United Kingdom and then the United States. Being able to disentangle and track how topics changed over time provides more powerful insight into activity on Twitter and into how its focus shifts as world events unfold. To do this we use Dynamic Topic Modelling, an extension of Latent Dirichlet Allocation in which documents are grouped into time periods and this time dimension influences the topics.
This allows us to see how topics change over time. The change consists of the differing popularity of a topic over time, but also of how the topic itself shifts. For example, the word ‘risk’ in the context of the coronavirus does not mean the same as the word ‘risk’ in the context of gambling. As such, term usage within a topic evolves over time. It is also possible that a topic gradually changes its meaning as word usage changes with the local and global situation.
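The grouping of documents into time periods can be sketched as follows. This is a minimal illustration only: the toy tweets, the choice of monthly periods and the helper name `group_by_month` are assumptions made for the example, not details of the analysis described here. It orders documents by period and counts how many fall into each one.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy tweets: (posting date, already-tokenised text).
tweets = [
    (date(2020, 1, 25), ["wuhan", "outbreak", "virus"]),
    (date(2020, 3, 10), ["italy", "lockdown", "cases"]),
    (date(2020, 1, 30), ["wuhan", "lockdown"]),
    (date(2020, 3, 28), ["nhs", "staff", "ppe"]),
    (date(2020, 2, 20), ["italy", "cases"]),
]


def group_by_month(dated_docs):
    """Order documents by (year, month) and count the documents in each month."""
    buckets = defaultdict(list)
    for day, tokens in dated_docs:
        buckets[(day.year, day.month)].append(tokens)
    periods = sorted(buckets)  # chronological order of the periods
    ordered_docs = [doc for p in periods for doc in buckets[p]]
    time_slice = [len(buckets[p]) for p in periods]  # documents per period
    return periods, ordered_docs, time_slice


periods, ordered_docs, time_slice = group_by_month(tweets)
print(periods)
print(time_slice)
```

Some implementations take exactly this kind of input. For instance, gensim's `LdaSeqModel` accepts a `time_slice` list of per-period document counts alongside a corpus ordered by period. Whether that library was used for the results below is not stated here.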
Dynamic topic modelling is a good tool for exploring data with an obvious temporal dimension: that is, when you don’t know the breadth of content in the texts or how it changes over time, and you want a high-level view of what the texts contain and how temporal aspects affect this. With this in mind, our aim is to understand the fundamental topics discussed on Twitter in the United Kingdom, France and Italy over time.
** Dynamic topic modelling for France is yet to be conducted **
In preparing the tweet data I took the following steps:
There are several important caveats when considering this data:
Here are the most significant topics from the UK tweets about the coronavirus. For each topic, the most significant terms are shown in subfigures. The titles of topics are manually annotated.
NHS Staff
Investigating the ‘NHS Staff’ topic, we can see that it initially starts with the terms ‘health’ and ‘system’ alongside other terms. Over time the topic shifts from the NHS as an institution to care workers and staff, perhaps indicating a focus on the human effort and sacrifice made by people working for the NHS.
Death Toll
Investigating the ‘Death Toll’ topic more closely, we can see that the focus was initially on the number of cases per day. It then shifted to the number of deaths, and for a time was closely tied to Italy while that country was the epicentre of the coronavirus outbreak.
Testing
Finally, investigating the ‘Testing’ topic, we can see that there are gradually more mentions of PPE (personal protective equipment) and ventilators over time.
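Shifts like those described for the UK topics can be inspected by comparing a topic's highest-probability terms in each time slice. The sketch below is illustrative only: the probabilities are made up, not results from the tweet data, and the helper names `top_terms` and `total_variation` are hypothetical. It lists the top terms per slice and uses total variation distance as one simple measure of how far a topic's word distribution has moved.

```python
# Hypothetical word probabilities for one topic in three consecutive time slices.
# The numbers are invented for illustration; they are NOT taken from the tweet data.
topic_over_time = {
    "slice 1": {"health": 0.30, "system": 0.25, "nhs": 0.20, "staff": 0.15, "care": 0.10},
    "slice 2": {"health": 0.20, "system": 0.15, "nhs": 0.25, "staff": 0.22, "care": 0.18},
    "slice 3": {"health": 0.10, "system": 0.05, "nhs": 0.25, "staff": 0.32, "care": 0.28},
}


def top_terms(word_probs, n=3):
    """Return the n most probable words, highest probability first."""
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def total_variation(p, q):
    """Total variation distance between two word distributions (0 = identical)."""
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in vocab)


for name, probs in topic_over_time.items():
    print(name, top_terms(probs))

# How far the topic has drifted between the first and last slice.
drift = total_variation(topic_over_time["slice 1"], topic_over_time["slice 3"])
print(round(drift, 2))
```

A ranking of top terms per slice is what the topic subfigures summarise visually. A single distance figure is a coarser view, but it makes it easy to spot which topics changed most between periods.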
Here are the most significant topics from Italian tweets about the coronavirus. For each topic, the most significant terms are shown beneath. The titles of topics are manually annotated. Not all topics have an obvious title, so don’t take a title as a definitive label; it simply helps in gaining a better understanding of the data.
Countries and institutions
Investigating the ‘Countries and institutions’ topic, we can see that it initially focuses on the USA, Europe and the economy. These are likely general topics discussed in Italy before the outbreak. Over time the topic becomes less about the USA and the economy, focusing more on Europe, the European Stability Mechanism (Meccanismo europeo di stabilità) and the European Central Bank (Banca Centrale Europea). It is clear that discussion of how the European Union would respond to the crisis started to become a dominant topic.
Italy & Measures
Investigating the ‘Italy & Measures’ topic more closely, we can see that the overall focus is Italy. Other than this, the topic changes a lot over time.
Spread & Measures
Investigating the ‘Spread & Measures’ topic more closely, we can see:
Regions
Finally, investigating the ‘Regions’ topic, we can observe that:
Dynamic topic modelling has allowed us to identify how topics and their word composition changed over time as the local and global situation of the coronavirus changed. Sometimes the shift in topic dominance and term composition is quite rapid as the global context changes. The most obvious case is the changing epicentre of the disease, from Wuhan to Lombardy, then Italy as a whole, and then France, the United Kingdom and Spain.
Italy and the United Kingdom have many similar topics surrounding the Coronavirus. However, there are differences. Perhaps the most obvious differences are: