Twitter simultaneously allows people to be heard and to hear, easily and in real time, bringing with it potentially fascinating and groundbreaking insights for anybody trying to take the public’s pulse on a hot issue. So much benefit almost inevitably comes with challenges, namely how to process the huge amounts of data available every minute. I am never one to shy away from some heavy number crunching, so this post is the first in a series on Twitter.
The chart above represents more than 200,000 mentions of the Twitter account belonging to Mexico’s President Enrique Peña Nieto during the past 20 days. It is intended as a proxy for how popular the President of Mexico is on Twitter. Every tweet was analyzed and scored as either positive or negative. Then the number of positive and negative tweets in each hour was counted and divided by the total number of tweets in that hour. The results show that, on average, Peña Nieto has had a positive sentiment score (0.54), but there are severe hourly negative spikes.
If you would like to know more about how the calculations were done, continue reading.
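The hourly calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tweets are taken as already labeled (the scoring method itself is not shown here), the function name `hourly_positive_share` and the toy data are made up, and the overall figure is assumed to be the mean of the hourly shares.

```python
from collections import defaultdict
from datetime import datetime


def hourly_positive_share(scored_tweets):
    """Return {hour: share of positive tweets} from (timestamp, label) pairs.

    Hypothetical helper: labels are assumed to be "positive" or "negative".
    """
    counts = defaultdict(lambda: {"positive": 0, "total": 0})
    for timestamp, label in scored_tweets:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[hour]["total"] += 1
        if label == "positive":
            counts[hour]["positive"] += 1
    # Positive tweets in the hour divided by all tweets in that hour.
    return {hour: c["positive"] / c["total"] for hour, c in sorted(counts.items())}


# Toy, made-up input: NOT the real mentions data.
toy_tweets = [
    (datetime(2014, 1, 1, 10, 5), "positive"),
    (datetime(2014, 1, 1, 10, 40), "negative"),
    (datetime(2014, 1, 1, 11, 15), "positive"),
    (datetime(2014, 1, 1, 11, 20), "positive"),
    (datetime(2014, 1, 1, 11, 30), "positive"),
    (datetime(2014, 1, 1, 11, 59), "negative"),
]

shares = hourly_positive_share(toy_tweets)
# Assumption: the overall score is the mean of the hourly shares.
average_share = sum(shares.values()) / len(shares)
```

With real data, an hour with a share well below 0.5 would show up as one of the negative spikes in the chart.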
How safe is Mexico? That is a question people frequently ask me. There is even a website about it (see this blog post). So I decided it was time to go one step further and make an interactive map (click here for full screen) of poverty and crime in Mexico.
The objective was to see which municipalities have higher crime rates and to check visually whether municipalities with higher crime rates also have high poverty rates. Unfortunately, the most recent poverty data at the municipality level is from 2010, while the crime data at the same level is only available for 2011, 2012 and 2013. Even so, I was able to glean some interesting insights.
The interactive map's main takeaway is that high rates of selected crimes are concentrated in just a few municipalities. Moreover, there appears to be no direct link to high poverty rates.
To do the analysis I used R, QGIS and TileMill. All the code is freely available in my GitHub account. If you are interested in learning more about how I made the map, keep reading!
The map above shows forest change from 2000 to 2013 in the Mayan Riviera area of Mexico. The map tool was developed by Global Forest Watch. It allows the calculation of total forest loss and gain during that period.
This map shows the percentage of the population living in poverty per municipality during 2010 (click here for full screen). Clusters of municipalities where more than 80% of the population is classified as poor can be found in the Northwestern mountains, Southern mountains and near the border with Guatemala.
High poverty rates can be found in less accessible areas, usually at high altitudes where scarce water supply makes large scale agriculture difficult. Moreover, poverty has a higher incidence in municipalities with larger indigenous populations.
Although the poverty rate of large urban areas is around 20 to 39 percent, it is still significantly lower than that of rural areas.
Last summer, CONEVAL published new data at the state level. The new statistics show that Mexico’s poverty rate fell slightly between 2010 and 2012, dropping 0.6 percentage points, from 46.1 percent to 45.5 percent (check out this article by the Wilson Center).
It would be interesting to see what the new statistics look like at the municipality level.
This is a short but sweet post on how to create a GeoJSON file in R. Remember that you can render GeoJSON files in GitHub.
What if policy makers worked like entrepreneurs and engineers? Sometimes, public policy initiatives are very difficult to replicate due to the lack of management best practices, freely available data and code. For a moment, imagine working on a project or designing a public policy that followed these practices:
- Management best practices:
  - Let people know what you are working on at all times
  - Fewer generals and more soldiers: have a lean management structure with as few hierarchical levels as possible
  - Avoid information silos
  - Manage requirements and not activities
- Other best practices:
  - Have a version control system: ability to track who added what and seamlessly roll back to previous versions
  - Automated testing of assumptions
  - Simple deployment: run this script and see the results
  - Code reviews: public repos on GitHub with everything needed to replicate the project and add more contributors to it
This map shows the growth of the Kaggle community in North America, measured by the submissions Data Scientists make to Kaggle competitions. Among other striking facts is the small number of submissions made in Latin America, especially in Mexico, relative to population when compared with other countries. Where is the Data Science in Mexico? What can we do to foster it?
Check out the interactive racial dot map here and their impressive methodology here. They also have a repo with the code.
This post shows how to use the Google Maps API with R, making some tweaks to this function. Combine the first part with sapply or plyr and it becomes a very powerful tool in just a few lines of code. You can find a gist in RMarkdown with the code here or click below to continue reading.
So true about data science. No explanations required. Source: Drew Conway