Exploring Emojis

Michael Anslow, Martina Galletti
SONY Computer Science Lab - Paris

Purpose of Analysis

On Twitter, people share their opinions on a variety of topics, discuss recent issues, and express their sentiment towards a particular event or object. However, automatically analyzing the opinions expressed in Tweets faces a fundamental problem: Tweets are not only informal but also very short. Although a Tweet can be up to 280 characters long, most do not reach Twitter's character limit. Every character in a Tweet therefore matters when analyzing its content, including Hashtags and Emojis. In particular, when the character allowance is limited, Emojis can help communicate a reaction to a particular topic or a general feeling about specific news.

However, research on Emojis is rare, given the lack of large datasets and the subjectivity of their interpretation. Still, some papers, such as that of Mengdi Li et al., report that Emojis can significantly improve results when used in semantic classification tasks. This is why, in our preliminary analysis, we decided to explore the use of Emojis in different countries. We used an open-source tool called Scattertext to visualize significant differences in Emoji use between countries. The results were quite interesting, and they will inform further research that refines these initial analyses to answer more specific questions.

Interactive charts of the distribution of Emojis for each pair of countries are shown below. The most characteristic Emojis for each country are listed on the right, together with the most frequent Emojis across the two categories. The Emojis are plotted on a two-dimensional graph in which each axis shows how widely an Emoji is used in one of the two countries. For example, in the first graph, the x-axis shows the frequency of each Emoji in the United Kingdom, while the y-axis shows its frequency in Italy. Emojis that discriminate one country from the other appear in the upper left and lower right corners: those in the upper left are frequent in the y-axis country and rare in the x-axis country, and the reverse holds for the lower right. Finally, the plots are interactive: clicking on an Emoji displays metadata such as the text of the Tweet in which the Emoji was used, the location indicated by the user, and the date and time of the Tweet.
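
To make the axes described above concrete, here is a minimal sketch in plain Python of how each Emoji could be given an (x, y) position from its frequency in two countries. It does not use Scattertext and is not its implementation: the emoji regular expression, the function names, and the toy Tweets are all assumptions made for illustration, and Scattertext may scale frequencies differently.

```python
import re
from collections import Counter

# Rough emoji matcher (an assumption, not Scattertext's own definition):
# a flag is a pair of regional-indicator symbols; otherwise match one
# code point from a few common emoji blocks.
EMOJI = re.compile(
    "[\U0001F1E6-\U0001F1FF]{2}"
    "|[\U0001F300-\U0001FAFF\u2600-\u27BF]"
)

def emoji_rates(tweets):
    """Return {emoji: share of all emoji occurrences} for a list of Tweets."""
    counts = Counter(e for text in tweets for e in EMOJI.findall(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {e: n / total for e, n in counts.items()}

def plot_coordinates(tweets_x, tweets_y):
    """Map each emoji to (rate in the x-axis country, rate in the y-axis country)."""
    rates_x = emoji_rates(tweets_x)
    rates_y = emoji_rates(tweets_y)
    all_emoji = set(rates_x) | set(rates_y)
    return {e: (rates_x.get(e, 0.0), rates_y.get(e, 0.0)) for e in all_emoji}

# Hypothetical toy data, not taken from the study.
uk_tweets = ["Charging my car ⚡🚗", "Trains delayed again 😡", "Love cycling 🚲🇬🇧"]
it_tweets = ["Cycling to work 🚲", "Go team 🇮🇹🇮🇹", "Solar panels ⚡🌞"]

coords = plot_coordinates(uk_tweets, it_tweets)
for emoji, (x, y) in sorted(coords.items(), key=lambda kv: kv[1]):
    print(emoji, round(x, 2), round(y, 2))
```

In this sketch, an Emoji with a high y value and a low x value would sit in the upper left of the plot, and one with a high x value and a low y value would sit in the lower right.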

Caveats and Data Processing

We are aware that multiple Emojis can appear together and that users' intentions can differ depending on the combination of Emojis used. Nevertheless, we chose to plot single Emojis because Scattertext doesn't allow combinations to be plotted. We wrote to the author of Scattertext, Jason Kessler, to find a solution. He replied that he will change the definition of Emojis in the future so that multiple Emojis can be considered as one.
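
The difference between counting single Emojis and counting combinations can be illustrated with a small sketch. This is only an assumption about how such a preprocessing step might look, written in plain Python; it is neither the approach used for the charts nor Scattertext's API, and the regular expression and function names are hypothetical.

```python
import re

# One emoji "unit" (an assumption): a flag made of two regional-indicator
# symbols, or one code point from common emoji blocks, optionally followed
# by the variation selector U+FE0F.
ONE = "(?:[\U0001F1E6-\U0001F1FF]{2}|[\U0001F300-\U0001FAFF\u2600-\u27BF]\uFE0F?)"

SINGLE = re.compile(ONE)
RUN = re.compile(f"(?:{ONE})+")

def single_emoji(text):
    """Return every emoji separately, as in the single-Emoji plots."""
    return SINGLE.findall(text)

def emoji_runs(text):
    """Return each run of adjacent emoji as one combined token."""
    return RUN.findall(text)

# Hypothetical example Tweet, not taken from the dataset.
tweet = "Petrol prices 😡⛽ but my new bike 🚲"
print(single_emoji(tweet))
print(emoji_runs(tweet))
```

Here the two adjacent Emojis after "Petrol prices" are returned separately by `single_emoji` but as one combined token by `emoji_runs`, which is the kind of unit that would be needed to study combinations.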

United Kingdom and Italy

Emoji summary

France and United Kingdom

Emoji summary

France and Italy

For each of the two countries, the flags are one of the most distinctive Emoji of the

Emoji summary

Conclusions

There are some important commonalities and differences across the United Kingdom, Italy, and France.