The whole world is trying to come to terms with what has happened during the COVID-19 outbreak and what the repercussions might be into the distant future. To this end, the Language Team at Sony CSL Paris has joined the efforts of the scientific community to better understand the global reaction to COVID-19 using our varied and extensive skills in computational linguistics and natural language processing.
Almost as soon as the coronavirus became a global talking point, various datasets were created to try to document and explore the phenomenon. In particular, COVID-19: The First Public Coronavirus Twitter Dataset began collecting data in the last week of January 2020 on tweets matching various COVID-19-related hashtags and particular Twitter accounts. Twitter serves as an informal basis for diffusing news and opinions. As such, it captures the share-worthy focus of its users across the whole world. There are, of course, biases in who chooses to use Twitter, who has access to the infrastructure to use Twitter and what Twitter users choose to share. However, it has the benefit of a very large global community and a very low effort-to-output ratio in terms of publishing content online. This dataset serves as the basis for our exploratory data analysis and will be explored in depth in subsequent reports.
Our fundamental aims are to explore:
Furthermore, given that we are an international team, we are interested in exploring these aims across different languages and regions, in particular France, Italy and the United Kingdom.
We should stress that what we present here is a preliminary exploration of the data and our initial findings. Its main purpose is to gain and share insights into COVID-19 that can lead to further research outcomes. We are more than happy to collaborate going forward, so feel free to contact us.
Our exploration of the dataset is divided into largely self-contained reports that are broadly ordered as follows:
With this in mind, feel free to jump into any report.