An article by Florencio Paucar Sedano.
I am glad to present on bigdatatales.com my thesis work on social media analysis of the presidential election in Peru using Twitter data. I present 7 polls covering different date ranges before the second round of the presidential election, to be held on 5 June 2016 between two candidates: Ms. Keiko Fujimori of Fuerza Popular and Mr. Pedro Pablo Kuczynski of Peruanos por el Kambio. I use the textual content of tweets collected through the Twitter API since 11 April 2016 and the Supervised Aggregated Sentiment Analysis (SASA) method, which produces its estimates while accounting for the humor, double meanings and sarcasm present in tweets. SASA is robust to this kind of noise because it focuses on estimating the aggregate distribution of opinions rather than on classifying each individual text. The algorithm underlying SASA is called ReadMe and was originally proposed by Daniel Hopkins and Gary King. Estimates obtained with SASA have predicted electoral results with an average mean absolute error of 2.5 points, which is an acceptable range for this type of estimation. This work builds on earlier analyses of electoral processes in different countries, such as the United States, France and Italy, carried out by Andrea Ceron, Luigi Curini and Stefano Maria Iacus.
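To make the idea of aggregate estimation more concrete, here is a minimal sketch in Python. It is not the ReadMe implementation used in the thesis. It only illustrates the underlying identity P(S) = P(S|D) P(D), where S is a word profile and D an opinion category: the profile frequencies within each category are taken from a hand-coded training set, the overall profile frequencies from the unlabeled tweets, and the category shares P(D) are recovered directly, without classifying any single tweet. The function name, the toy numbers and the plain least-squares solver are all assumptions made for illustration.

```python
import numpy as np


def estimate_category_shares(p_s_given_d, p_s):
    """Recover P(D) from P(S) = P(S|D) @ P(D) by ordinary least squares.

    p_s_given_d: array of shape (n_profiles, n_categories); each column is the
                 distribution of word profiles inside one category.
    p_s:         array of shape (n_profiles,); distribution of word profiles
                 in the unlabeled tweets.
    """
    shares, *_ = np.linalg.lstsq(p_s_given_d, p_s, rcond=None)
    # Simplification: clip negatives and renormalize instead of solving a
    # properly constrained regression.
    shares = np.clip(shares, 0.0, None)
    total = shares.sum()
    if total == 0.0:
        raise ValueError("no positive category share could be estimated")
    return shares / total


# Hypothetical toy data: 3 word profiles, 2 opinion categories.
p_s_given_d = np.array([
    [0.6, 0.1],
    [0.3, 0.2],
    [0.1, 0.7],
])
true_shares = np.array([0.4, 0.6])   # unknown in practice
p_s = p_s_given_d @ true_shares      # what would be observed in the tweets

estimated = estimate_category_shares(p_s_given_d, p_s)
print(estimated)
```

On real data the observed profile frequencies are noisy, so the recovered shares are only approximate, and a constrained solver would be a more careful choice than clipping.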
The polls show the preferences for the candidates for every week since 11 April 2016, when the campaign for the second round of the presidential election started. In the graph below we observe the percentage of preferences, taking into consideration only valid preferences in tweets, i.e. explicit statements of intent to vote for a certain candidate, or negative statements opposing a candidate that are connected to the electoral campaign of the rival candidate.
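As a small illustration of what restricting to valid preferences means for the percentages, the sketch below renormalizes estimated category shares over the valid categories only. The category names and the numbers are hypothetical and are not taken from the thesis data.

```python
def valid_preference_percentages(category_shares, valid_categories):
    """Rescale shares so that the valid categories alone sum to 100%."""
    valid_total = sum(category_shares[c] for c in valid_categories)
    if valid_total <= 0:
        raise ValueError("no valid preferences to normalize")
    return {c: 100.0 * category_shares[c] / valid_total for c in valid_categories}


# Hypothetical shares over all collected tweets; "other" covers tweets that
# express no valid preference (news, jokes, off-topic content).
shares = {"fujimori": 0.2, "kuczynski": 0.3, "other": 0.5}
percentages = valid_preference_percentages(shares, ["fujimori", "kuczynski"])
print(percentages)
```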
We observe that the preferences vary over time. Ms. Keiko Fujimori of Fuerza Popular starts with high preferences from 11 April to 15 April, then Mr. Pedro Pablo Kuczynski of Peruanos por el Kambio performs better from 16 April to 22 April. From 23 April to 13 May Ms. Keiko Fujimori dominated the electoral campaign, but unexpectedly Mr. Pedro Pablo Kuczynski regained majority support from 14 May to 20 May. Finally, in the second-to-last week (21 May to 27 May) the electoral campaign shows very similar preferences: 50.04% for Mr. Kuczynski and 49.9% for Ms. Fujimori. Therefore, the outcome of the Peruvian election is very uncertain, at least according to what people express on Twitter. Unfortunately, I cannot yet present the results of our methodology for the last week of the electoral campaign, due to Peruvian law (*). Tomorrow at 16:00 (local Peruvian time) I will present the electoral results according to the Twitter data analysis. Is Twitter really predictive of electoral results? Stay tuned!
(*) According to Peruvian law, it is forbidden to publish electoral polls during the week before Election Day.
D. Hopkins, G. King, A Method of Automated Nonparametric Content Analysis for Social Science (2010).
A. Ceron, L. Curini, S. Iacus, Using Social Media to Forecast Electoral Results: A Review of the State of the Art (2015).
Florencio Paucar Sedano is a Peruvian master's student in Business Informatics at the University of Pisa, Italy. He earned a bachelor's degree in Computer Engineering at the Pontifical Catholic University of Peru. Follow him on Twitter: @florenciopaucar