
Identifying the Main Topics

Michael Anslow
SONY Computer Science Lab - Paris

When investigating a large amount of data it’s difficult to know where to start looking. Even though the coronavirus has affected practically everyone in the world in some way, we all experience it differently. One principled way of identifying the fundamental topics discussed on Twitter is called topic modelling.

A topic is a collection of words that frequently occur together. For example, “NHS”, “worker” and “staff” could indicate a topic about people working for the NHS (the National Health Service in the UK). One of the most popular approaches to topic modelling is Latent Dirichlet allocation (LDA). The technical details are beyond the scope of this report, but essentially it is a machine learning algorithm that tries to learn how to compose a document as a set of topics, which in turn are composed of a set of words (like the “NHS”, “staff”, “worker” from before).
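To make this concrete, here is a minimal sketch of Latent Dirichlet allocation fitted with collapsed Gibbs sampling on a handful of made-up tokenised tweets. It is not the code used for this report: the function name `fit_lda`, the toy documents and the hyperparameters `alpha` and `beta` are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimised)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, words per topic in total.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start by assigning every word occurrence to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample: how much the document likes topic t, times how much topic t likes word w.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic mixture for each document; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # Most frequent words assigned to each topic.
    top_words = [[w for w, c in tw.most_common(3) if c > 0] for tw in topic_word]
    return theta, top_words


# Made-up, already tokenised "tweets".
docs = [
    ["nhs", "staff", "worker", "nhs"],
    ["nhs", "worker", "staff"],
    ["panic", "buying", "shop"],
    ["shop", "panic", "buying", "panic"],
]
theta, top_words = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words):
    print("topic", k, words)
```

Each row of `theta` is the topic mixture of one document and `top_words` lists the words most often assigned to each topic. A real analysis would use an established library rather than this toy sampler.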

To give a better intuition of what topics are, consider the following analogy with an artist painting. A canvas is a tweet, each topic is a colour and all the topics together are a colour palette. Each canvas (tweet) is painted with one or more colours (topics). A particular painting is then a mixture of colours in the same way that a tweet is a mixture of topics.
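The analogy can be written down numerically. In the sketch below, each topic (colour) is a probability distribution over words and a tweet (canvas) is a set of weights over topics, so the chance of seeing a word in the tweet is the weighted sum of that word's probability in each topic. The topic names, words and numbers are invented for illustration and are not taken from the fitted models.

```python
# Hypothetical topics ("colours"): each is a probability distribution over words.
topics = {
    "nhs": {"nhs": 0.5, "staff": 0.3, "worker": 0.2},
    "lockdown": {"lockdown": 0.6, "home": 0.3, "worker": 0.1},
}

# Hypothetical tweet ("canvas"): how much of each colour it uses.
tweet_mixture = {"nhs": 0.75, "lockdown": 0.25}


def word_probability(word, mixture, topics):
    """Probability of `word` in a tweet that mixes the topics with the given weights."""
    return sum(
        weight * topics[name].get(word, 0.0) for name, weight in mixture.items()
    )


for word in ["nhs", "worker", "home"]:
    print(word, word_probability(word, tweet_mixture, topics))
```

The word "worker" appears in both topics, so both contribute to its probability in this tweet, in the same way that two paints can both contribute to one patch of a canvas.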

Purpose of Analysis

Topic modelling is a good tool for data exploration, that is, for when you don’t know the breadth of material contained in some texts and you want a high-level view of what they contain. It can provide a set of fundamental topics that the texts (tweets in this case) address. With this in mind, our aim is to understand the fundamental topics discussed on Twitter in the United Kingdom, France and Italy.

Caveats and Data Preparation

There are several important caveats when considering this data:

In preparing the tweet data I took the following steps:

Visualisations

We can visualise topic models using a wonderful interactive package called pyLDAvis. This provides a way to visualise:

  1. How common topics are across documents by the size of circles.
  2. How topics overlap corresponding to overlapping circles.
  3. Composition of a topic by clicking on a topic and observing the most probable words for that topic.
  4. How common words in topics are distributed over topics, by hovering on a word on the right and observing how topic sizes change on the left.
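As a rough guide to what the circle sizes in item 1 represent, the sketch below computes the prevalence of each topic as its share of all words in the corpus, weighting each document's topic mixture by the document's length. This is my understanding of how such sizes are typically derived and should be read as an assumption, not as a copy of the package's internals. The function name and the numbers are made up.

```python
def topic_prevalence(doc_topic, doc_lengths):
    """Share of the corpus attributed to each topic, weighted by document length."""
    n_topics = len(doc_topic[0])
    totals = [0.0] * n_topics
    for weights, length in zip(doc_topic, doc_lengths):
        for k, weight in enumerate(weights):
            totals[k] += weight * length
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Hypothetical topic mixtures for three tweets and their lengths in words.
doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_lengths = [10, 20, 10]

prevalence = topic_prevalence(doc_topic, doc_lengths)
print(prevalence)
```

Under this assumption a topic that dominates a few long documents can be drawn larger than one spread thinly over many short ones.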

Alongside the pyldavis representation are scatter plots of 10,000 tweets. This visualises:

  1. Tweet words used in the topic model.
  2. Topics and their most probable words on the right. As there are many topics, some colours repeat. To identify a particular topic, either double click on it to hide tweets corresponding to other topics, or click on it once to make it disappear and again to make it reappear, which shows where the documents belonging to it are.
  3. The dominant topics of a tweet based on the average colour of its dominant topics shown in the legend. The dominant topics are also shown when hovering on a point.
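The colouring in item 3 can be sketched as follows: pick the dominant topics of a tweet and average their colours. How "dominant" is defined is not stated here, so the cut-off of 0.3, the function names and the palette below are assumptions made for illustration.

```python
def dominant_topics(weights, threshold=0.3):
    """Indices of topics whose weight is at least `threshold` (an assumed cut-off)."""
    picked = [k for k, w in enumerate(weights) if w >= threshold]
    # Fall back to the single strongest topic if none passes the cut-off.
    return picked or [max(range(len(weights)), key=weights.__getitem__)]


def average_colour(colours):
    """Channel-wise mean of a list of (red, green, blue) tuples."""
    n = len(colours)
    return tuple(round(sum(c[i] for c in colours) / n) for i in range(3))


# Hypothetical palette with one RGB colour per topic.
palette = [(200, 0, 0), (0, 0, 200), (0, 100, 0)]
# Hypothetical topic mixture for one tweet.
tweet_weights = [0.5, 0.4, 0.1]

top = dominant_topics(tweet_weights)
colour = average_colour([palette[k] for k in top])
print(top, colour)
```

Here the tweet is dominated by the first two topics, so its point is drawn in a blend of their two colours.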

In the following, we explore topic models for the United Kingdom, France and Italy to identify differences and similarities.

The United Kingdom

Here are the visualisations for the topic model and corresponding document embeddings for the United Kingdom.

Topic Model

Document Embeddings

Key Topics

Topic: 2. Death Toll.
Topic: 4. Stay safe mixed with mentions of the Independent newspaper.
Topic: 5, 6, 16, 18. All seem to be different aspects of the lockdown.
Topic: 7. Death rates.
Topic: 8. Different countries and need for PPE (personal protective equipment).
Topic: 9. Panic buying.
Topic: 12. The NHS.
Topic: 12. Business Markets.
Topic: 14. Ventilators.
Topic: 15. Easter weekend lockdown.
Topic: 17. Advice, information, support. Matt Hancock, the Secretary of State for Health and Social Care, is included in this topic. He was largely responsible for representing the government in daily briefings to the public.
Topic: 21. Wages.
Topic: 22. Supply chain.

People

Here is a summary of the key people mentioned in the topic model.

Person Role

Boris Johnson
Prime Minister of the United Kingdom

Matt Hancock
Secretary of State for Health and Social Care

Summary

The topic model is reassuring as it captures the breadth of topics one would expect about the coronavirus outbreak, though perhaps the lockdown topics should be merged into one or two topics.
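One simple way to look for topics that might be merged is to measure how much their most probable words overlap. The sketch below uses Jaccard similarity on top-word lists. The word lists, the topic numbers attached to them and the threshold of 0.5 are invented for illustration and are not output from the models in this report.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection of two word lists divided by the size of their union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def merge_candidates(top_words, threshold=0.5):
    """Pairs of topic ids whose top words overlap at least `threshold`."""
    return [
        (i, j)
        for i, j in combinations(sorted(top_words), 2)
        if jaccard(top_words[i], top_words[j]) >= threshold
    ]


# Made-up top words for three topics.
top_words = {
    5: ["lockdown", "home", "stay", "rule"],
    6: ["lockdown", "stay", "home", "police"],
    9: ["panic", "buying", "shop", "toilet"],
}

candidates = merge_candidates(top_words)
print(candidates)
```

Pairs flagged this way are only candidates; whether two topics really describe the same thing still needs a human reading of the words and example tweets.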

Some topics relate to world leaders such as Donald Trump (topic 19) and Boris Johnson (topic 24), who are important figureheads in decision making about managing the coronavirus.

Some topics are difficult to decipher. For example, topic 11 is about both herd immunity and the Wuhan animal markets. It is understandable why these might be mixed up, as animals can be in herds, but perhaps the topic model should be finer grained on this point. Topic 11 is inappropriately mixed in with words related to football as well.

There are also very specific topics related to particular events like:

Topic 1. Outbreak in Hubei province. The topic seems to be mixed in with prison and prisoners. Perhaps taken from tweets like:

Hubei Daily: 271 prisoners in Hubei are infected with #covid19 among which 230 are in Wuhan Women’s Prison and 41 are in Shayang Hanjin Prison.
Head of the women’s prison has already been sacked and an officer in Hanjin was given serious warning for omitting contact history. pic.twitter.com/bs1BuVQ7uE

— Xinqi Su 蘇昕琪 (@XinqiSu) February 21, 2020

Topic 3. This could be a topic relating to the Uyghurs, a Muslim ethnic group in China. For example:

CCP has no respect for religion, they burn down the church, and destroy Uyghur cemetery. CCP even detain and torture Uyghur in concentration camp just because they are Muslims, Islam is depicted as a “disease” by CCP, they deprive freedom of religion and abuse human rights.

— Paul Lai 😷🇭🇰 (@plgod2013) May 2, 2020

There are also topics that seem to be nonsensical, like topic 23. With some background knowledge we know that Bill Gates has been talking about the risk of a pandemic for a long time, but without that context it’s hard to know what this topic is about. Similarly, topic

France

Here are the visualisations for the topic model and corresponding document embeddings for France.

Topic Model

Document Embedding

Key Topics

Topic: 1. Documents required for leaving home during the lockdown (attestation deplacement).
Topic: 2. Health crisis (crise, santé).
Topic: 3. Money spent in financial easing (milliard, dette, euro).
Topic: 4. Agnès Buzyn former health minister.
Topic: 5. Stock market (bourse marché), economy, the fall (chute).
Topic: 6. Mostly concerning elderly care homes (EHPAD). (Mixed with the Donald Trump topic for some reason).
Topic: 9. Death toll.
Topic: 10. Jérôme Salomon, Director General of Health and an important French infectious disease physician.
Topic: 11. Decrees from President Macron and Édouard Philippe the prime minister.
Topic: 12. Apple’s covid tracking application.
Topic: 14. Didier Raoult and his claims about hydroxychloroquine as a treatment for covid.
Topic: 15. Seems to be the L’Obs news magazine.
Topic: 16. Death statistics. Similar to 9.
Topic: 17. China and Hong Kong relations.
Topic: 18. The United Kingdom. (Mixed with football topic.)
Topic: 19. South Korea.
Topic: 20. Christophe Castaner, the interior minister of France, and lockdown measures.

People

Here is a summary of the key people mentioned in the topic model.

Person Role

Didier Raoult
French physician and microbiologist famous for supporting the claim that hydroxychloroquine and azithromycin were effective treatments for COVID-19

Christophe Castaner
Minister of the Interior

Jérôme Salomon
French infectious diseases physician and civil servant

Agnès Buzyn
Former Minister of Solidarity and Health (May 2017 - Feb 2020)

Édouard Philippe
Prime Minister of France

Emmanuel Macron
President of France

Summary

France enacted a measure that required individuals to write an attestation detailing their purpose for leaving their home and when they would return. This measure gives it a distinct additional topic not found in the United Kingdom. The claims of Didier Raoult concerning hydroxychloroquine also make a particular talking point in France. There is also some emphasis on the situation in elderly care homes that is not as prevalently discussed in the United Kingdom. Otherwise the overall topics are similar to the United Kingdom.

There are unclear topics like topic 13, which seems to reference multiple heterogeneous concepts. There are also local news topics like topic 8, which seems to reference flooding of the River Rhine. Topic 7 is also strange as it includes @aiphanmarcel, a Twitter user who is not massively popular (19.2k followers).

Selon @DIVIZIO1, la vente de #masques est autorisée depuis le 23 mars en pharmacie mais ils n’en ont pas vendu pour couvrir l’Etat qui n’arrivait pas à en fournir aux soignants et s’organisait derrière leur dos avec les supermarchés
De mieux en mieux...https://t.co/inmhISNjey

— Marcel Aiphan (@AiphanMarcel) May 3, 2020

Italy

Here are the visualisations for the topic model and corresponding document embeddings for Italy.

Topic Model

Document Embedding

Key Topics

Topic: 1. Death Toll.
Topic: 2. Corona Virus (Incorrect lemmatisation of corona to coronare)
Topic: 3. Sports.
Topic: 4. Repubblica TV. (is.gd for compressing links)
Topic: 5. Work from home.
Topic: 6. President of the European Commission Ursula von der Leyen
Topic: 7. Possibly referencing L’Ospedale Maggiore di Lodi fundraiser.
Topic: 8. Border controls at sea. Conflated with the Diamond Princess cruise ship.
Topic: 9. Economic Crisis
Topic: 10. Stay at home & Health Emergency. Linked to Roberto Burioni.
Topic: 11. African immigration and politics.
Topic: 12. Health care Workers. (Medico = Doctor, Infermiere = Nurse)
Topic: 13. The death of Chilean writer Luis Sepúlveda.
Topic: 15. Possibly the US and New York. A complex topic.
Topic: 16. Affected regions of Italy.
Topic: 17. Health care residences.
Topic: 19. Possibly referring to Angela Merkel and the decree (decreto) for a draft (bozza) agreement to issue ‘corona bonds’.
Topic: 20. Vaccination.

People

Here is a summary of the key people mentioned in the topic model.

Person Role
Ursula von der Leyen
President of the European Commission

Roberto Burioni
Italian physician and Professor of Microbiology and Virology

Luis Sepúlveda
Famous Chilean writer who died from COVID-19

Summary

Italy has many of the same fundamental topics as France and the United Kingdom. One interesting focus in Italy is on particular affected regions within the country, whereas in the United Kingdom and France there is no mention of particular regions. We can also see that relations with the European Union are important, with the mention of Angela Merkel and Ursula von der Leyen. There are of course local issues like immigration from Africa and the death of Luis Sepúlveda.

We can also see very specific topics, such as topic 7, which may be about fundraising for L’Ospedale Maggiore di Lodi. Here is an example tweet that topic 7 possibly references:

Aiutiamo L’Ospedale Maggiore di Lodi - COVID-19 <> Donate now! Let's help the nurses and the doctors, the real HEROES of the Ospedale Maggiore - LODi 🇮🇹🙏💪https://t.co/WIO8pbhSPN

— Giuseppe Locatelli (@locatellicharts) March 10, 2020

Conclusions

Topic modelling captured many features of the coronavirus outbreak, though some topics were noisy and difficult to decipher. Although there were topics particular to each country, there were underlying trends in both what people are interested in and how people share on Twitter. Among these common topics are: