Data Management Strategies for Effective Storage and Access of Global Climate Data.
PRJ701, Graduate Diploma in IT, NMIT, 2013
Welcome to the home page for my PRJ702 project. Over time this page will be updated with background information and a high-level description of the project. More specific information and progress postings may be found under the “Project 702” category in the right-hand menu.
The Climate Chip project is a not-for-profit research effort to analyse the effects of heat stress, and the associated health impacts, driven by climate change. The research team comprises six international experts, including NMIT senior lecturer Matthias Otto and former lecturer Ryan Clarke. The research findings are widely published and can also be viewed through an interactive webmap at climatechip.org.
For PRJ702, we aim to investigate the data analysis and data storage solutions that support the webmapping and hothaps software. This raises a number of critical challenges:
Database query performance
The webmapping tool at climatechip.org is interactive and requires “near instant” response times. Queries currently return in under 1 second, and this is the benchmark to maintain.
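One way to keep an eye on the 1 second benchmark is to time a representative query repeatedly and record the slowest run. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the real database, and the table name `heat_index`, its columns, and the sample data are all hypothetical.

```python
import sqlite3
import time

BUDGET_SECONDS = 1.0  # the response-time benchmark stated above


def worst_query_time(conn, sql, params=(), repeats=5):
    """Run the query several times; return the slowest wall-clock time in seconds."""
    worst = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        worst = max(worst, time.perf_counter() - start)
    return worst


# Hypothetical stand-in schema and data (not the real Climate Chip database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heat_index (cell_id INTEGER, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO heat_index VALUES (?, ?, ?)",
    [(cell, year, 20.0 + cell % 15) for cell in range(1000) for year in range(1980, 2010)],
)
conn.execute("CREATE INDEX idx_heat_index_cell ON heat_index (cell_id)")
conn.commit()

elapsed = worst_query_time(
    conn, "SELECT year, value FROM heat_index WHERE cell_id = ?", (42,)
)
within_budget = elapsed < BUDGET_SECONDS
print(f"slowest of 5 runs: {elapsed:.4f}s, within budget: {within_budget}")
```

Recording the slowest run rather than the average is a deliberate choice here, since an interactive tool is judged by its worst response rather than its typical one.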
The database is populated by a combination of experimental observations and heat stress indicators (calculated from the observed results). The database contains approximately 28 million rows, which is not massive, but nor is it insignificant. As with most research, the requirements change constantly as models are refined, new hypotheses are tested, and so on. Running large updates over the dataset is currently unreliable and may take hours or days. One of the critical challenges for PRJ702 will be to provide a standard class of data analysis tools and to benchmark these to provide a reliable upper bound on their running time.
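One possible approach to making large updates more predictable is to apply them in fixed-size batches, time each batch, and extrapolate from the slowest one. The sketch below assumes this batching strategy, which is our illustration rather than a method the project has settled on. It uses sqlite3 as a stand-in database, a hypothetical `observations` table, and a placeholder formula that is not a real heat stress index. The resulting figure is an empirical estimate, not a guaranteed bound.

```python
import math
import sqlite3
import time


def update_in_batches(conn, max_id, batch_size):
    """Recompute the indicator column one id range at a time; return per-batch times."""
    timings = []
    for low in range(1, max_id + 1, batch_size):
        high = low + batch_size - 1
        start = time.perf_counter()
        conn.execute(
            # Placeholder formula, for illustration only.
            "UPDATE observations SET indicator = temperature * 0.7 + humidity * 0.1 "
            "WHERE id BETWEEN ? AND ?",
            (low, high),
        )
        conn.commit()  # commit per batch so a failure loses at most one batch
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical stand-in table with 10,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    "id INTEGER PRIMARY KEY, temperature REAL, humidity REAL, indicator REAL)"
)
conn.executemany(
    "INSERT INTO observations (id, temperature, humidity) VALUES (?, ?, ?)",
    [(i, 20.0 + (i % 10), 50.0) for i in range(1, 10001)],
)
conn.commit()

batch_size = 1000
timings = update_in_batches(conn, max_id=10000, batch_size=batch_size)

# Extrapolate to the approximate size of the full dataset (28 million rows).
batches_needed = math.ceil(28_000_000 / batch_size)
estimated_seconds = max(timings) * batches_needed
print(f"{len(timings)} batches timed, slowest {max(timings):.4f}s")
print(f"rough estimate for {batches_needed} batches: {estimated_seconds:.1f}s")
```

Batch times measured on a small in-memory table will not transfer directly to the production database, so any such estimate would need to be calibrated against the real system.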
Integration of data analysis and data storage
Ideally, the process of data analysis and data storage will be as self-managed and automated as possible. PRJ702 will investigate methods to integrate the data analysis and data storage tools. Our initial hypothesis is that this may be achieved by interfacing with the database through a popular programming language and an appropriate API (e.g. connecting to MySQL from Python via the MySQLdb API). If there is time, we will aim to compare the effectiveness of this approach using relational and NoSQL data storage techniques.
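The integration idea can be sketched with Python's standard database interface (DB-API 2.0), which MySQLdb also follows. So that the example runs without a database server, it uses the built-in sqlite3 module. With MySQLdb the connection would instead come from `MySQLdb.connect(...)` and the parameter placeholder would be `%s` rather than `?`. The `observations` table, the function names, and the indicator formula are all hypothetical.

```python
import sqlite3


def heat_indicator(temperature, humidity):
    """Placeholder calculation for illustration; not a real heat stress index."""
    return round(0.7 * temperature + 0.1 * humidity, 2)


def refresh_indicators(conn, placeholder="?"):
    """Read observations, compute the indicator in Python, and write it back.

    `placeholder` is the driver's parameter marker: "?" for sqlite3,
    "%s" for MySQLdb.
    """
    cur = conn.cursor()
    cur.execute("SELECT id, temperature, humidity FROM observations")
    updates = [(heat_indicator(t, h), row_id) for row_id, t, h in cur.fetchall()]
    cur.executemany(
        f"UPDATE observations SET indicator = {placeholder} WHERE id = {placeholder}",
        updates,
    )
    conn.commit()
    return len(updates)


# Hypothetical stand-in data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    "id INTEGER PRIMARY KEY, temperature REAL, humidity REAL, indicator REAL)"
)
conn.executemany(
    "INSERT INTO observations (id, temperature, humidity) VALUES (?, ?, ?)",
    [(1, 30.0, 60.0), (2, 25.0, 80.0), (3, 35.0, 40.0)],
)
conn.commit()

rows_updated = refresh_indicators(conn)
results = conn.execute("SELECT id, indicator FROM observations ORDER BY id").fetchall()
print(rows_updated, results)
```

Keeping the calculation in an ordinary Python function means the same analysis code could be pointed at a different storage back end by swapping only the connection, which is relevant to the planned relational versus NoSQL comparison.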