The project has begun! Primarily, this will be a design research project focused on designing an effective data storage strategy for the Climate Chip project. Many, many thanks to Todd and Clare for their input into the direction of this project 🙂
The Climate Chip project is a not-for-profit project driven to provide information and resources about heat stress and the health impacts of global climate change. An interactive webmap is hosted at http://climatechip.org to display this information. I am lucky to be able to join Matthias and investigate the database that sits behind the webmap. At the moment the database is managed with MySQL and it works seamlessly with the webmap. However, there are some unresolved issues with it.
Inside the MySQL table, the data is organised in quite a funky and unique way (I will describe this fully in a later post). My initial impression is that this is the cause of some of the observed difficulties. But at the same time, it is this unique structure that makes it blazingly fast behind the webmap. It’s this trade-off between maintainability and performance that will be the main thrust of this research.
STAGE 1: Identification of the problem
This stage will focus on familiarising myself with the Climate Chip project and the existing database. The first step will be to collect user requirements, before installing, characterising and testing the existing database. We will aim to re-create the maintenance problems that have been previously observed, and benchmark the performance of large insert operations.
- Climate Chip project requirements
- Climate Chip database
- Live database (existing data model)
- Data model (existing data model)
- A suite of tests for benchmarking query performance
- Quantifiable performance measures
- Hypothesis based on the observations of benchmark tests
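As a rough idea of what one test in the benchmarking suite might look like, here is a minimal sketch that times a large batched insert. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the real MySQL server, and the table `grid_value`, its columns and the function `benchmark_bulk_insert` are hypothetical names, not the actual Climate Chip schema.

```python
import sqlite3
import time


def benchmark_bulk_insert(n_rows, batch_size=1000):
    """Time the insertion of n_rows synthetic rows; return (rows_inserted, seconds)."""
    # Assumption: in-memory SQLite is a stand-in for the real MySQL server.
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: the real Climate Chip data model is not shown here.
    conn.execute(
        "CREATE TABLE grid_value (cell_id INTEGER, month INTEGER, value REAL)"
    )
    # Synthetic rows, generated before the timer starts so only inserts are timed.
    rows = [(i, i % 12 + 1, float(i % 40)) for i in range(n_rows)]

    start = time.perf_counter()
    for offset in range(0, n_rows, batch_size):
        conn.executemany(
            "INSERT INTO grid_value VALUES (?, ?, ?)",
            rows[offset:offset + batch_size],
        )
    conn.commit()
    elapsed = time.perf_counter() - start

    # Confirm that every row actually arrived before reporting the timing.
    inserted = conn.execute("SELECT COUNT(*) FROM grid_value").fetchone()[0]
    conn.close()
    return inserted, elapsed


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        inserted, seconds = benchmark_bulk_insert(n)
        print(f"{inserted} rows in {seconds:.4f} s")
```

Running the same function at several row counts gives a simple, quantifiable measure (seconds per batch size) that could later be compared between the existing data model and any new design.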
STAGE 2: Solution Design
This is where the new research begins. We will begin by characterising spatial data: what are the unique attributes of spatial data? Which characteristics of this data can we use, and whose integrity do we have to be particularly careful to maintain? Next we will investigate strategies within the GIS industry for managing and storing geospatial data. We will investigate commonly used database products and form a hypothesis as to which product will be best for the management of the Climate Chip data.
- Describe the characteristics of geospatial data
- Investigate storage strategies within the GIS industry
- Research methods for normalising, indexing and partitioning datasets within a geodatabase
- Logical Data Model
- Physical Data Model
Stages 1 & 2 are the most critical, and if I can achieve what is set out above, then I think the project will meet all of my personal goals. However, it would be really great to have time to prototype and evaluate the proposed design solution.
STAGE 3: Design Prototype
- Build a functional prototype using the database product identified in Stage 2.
- Implement normalisation, indexing and partitioning as described in the physical model from Stage 2
- Benchmark the prototype using the test suite developed in Stage 1
- Evaluate the design and suggest improvements (evaluated against performance and the project requirements)
- Evaluation and discussion of design
- Recommendations for future development
I am working on my project proposal at the moment, and once that is approved I will be able to roll up my sleeves and get started. There is a lot of work to do here, in quite a small amount of time. There are going to be weeks where it will be difficult to dedicate the necessary time, so I will have to work solidly and consistently from the beginning to minimise pressure in the last few weeks. Looking forward to the challenge!