We have officially reached the halfway point in the project calendar. It seems like a good time to take stock of progress and make sure the final 6 – 8 weeks are on track. The initial milestones, as described in the post "redefining project direction", are below:
|Milestone|Deliverables|Due date|
|---|---|---|
|# 1|Requirements established, “problem” accurately identified, 2012 data loaded into database|August 23rd, 2013|
|# 2|Programming language chosen for Data Analysis and relevant APIs chosen for database interaction, database connection tested and test data loaded|September 6th, 2013|
|# 3|Data Analysis Class and relevant methods designed|September 26th, 2013|
|# 4|Data Analysis Class tested and experimental data loaded into database|October 11th, 2013|
|# 5|Data Analysis Class benchmarked and reliable upper bound established for each module|November 1st, 2013|
MILESTONE # 1: COMPLETE
This wasn’t as straightforward as I had originally imagined. The biggest challenge here was defining the project direction. Once I began to explore the climate chip project, I quickly realised that my initial plans weren’t quite the right fit. The first 2 – 3 weeks of the project were spent getting a better idea of the project's needs and devising a more suitable plan of attack.
MILESTONE # 2: COMPLETE
With a better idea of the requirements, this milestone was a lot more straightforward than the first. Python was chosen for its expressive power and short development time. During this phase I dived into the mathematical models for WBGT and UTCI in an attempt to identify the class structure I needed to implement.
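For illustration only, here is a minimal sketch of how a mathematical model can map onto a small class structure. It uses the standard WBGT weighted-sum definition (natural wet-bulb, globe, and air temperature). The `Observation` class and the function names are hypothetical and are not the project's actual design, which may well estimate these inputs from routine weather data rather than take them as measured values. UTCI is left out because its model is considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One set of temperature readings in degrees Celsius (hypothetical layout)."""

    natural_wet_bulb: float  # Tnwb
    globe: float             # Tg
    air: float               # Ta (dry bulb)


def wbgt_outdoor(obs: Observation) -> float:
    """WBGT with solar load: 0.7*Tnwb + 0.2*Tg + 0.1*Ta."""
    return 0.7 * obs.natural_wet_bulb + 0.2 * obs.globe + 0.1 * obs.air


def wbgt_indoor(obs: Observation) -> float:
    """WBGT without solar load: 0.7*Tnwb + 0.3*Tg."""
    return 0.7 * obs.natural_wet_bulb + 0.3 * obs.globe
```

Keeping the readings in one immutable record and the formulas in plain functions makes each calculation easy to test on its own before any database code is involved.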
MILESTONE # 3: 80 % COMPLETE
An awful lot of work has gone into implementing the necessary calculations and class structures. Progress has been good, and I have learned that trying to fully optimise everything up front is time-consuming. Since then, I have taken the approach of getting processes to work first and optimising them when I have extra time to devote to it.
BULK LOADING of new annual climate data is complete.
Still to complete in this phase:
- Improve WBGT performance
- Improve database write
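One common way to speed up database writes is to insert rows in batches, with one transaction per batch instead of one per row. The sketch below shows the idea using Python's built-in `sqlite3` module purely as a stand-in, since the project's actual database is not named here. The table `wbgt_daily`, its columns, and the function names are assumptions made for the example.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

# (station_id, date, wbgt): hypothetical columns for the example
Row = Tuple[str, str, float]


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_write(conn: sqlite3.Connection, rows: Iterable[Row],
               batch_size: int = 1000) -> int:
    """Insert rows batch by batch and return the number of rows written."""
    written = 0
    for batch in batches(rows, batch_size):
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany(
                "INSERT INTO wbgt_daily (station_id, date, wbgt) VALUES (?, ?, ?)",
                batch,
            )
        written += len(batch)
    return written


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wbgt_daily (station_id TEXT, date TEXT, wbgt REAL)"
    )
    return conn
```

The batch size is a tuning knob: larger batches mean fewer commits, at the cost of more memory and more work lost if a batch fails.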
MILESTONE # 4
This is a natural extension of MILESTONE # 3. I will consider this ‘complete’ when all of the processes from MILESTONE # 3 are automated and wrapped together in a simple CLI tool.
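As a rough sketch of what such a wrapper could look like, the example below uses `argparse` from the Python standard library with one subcommand per process. The program name `climate-analysis`, the `load` and `analyse` subcommands, and their options are all hypothetical. The handlers only print what they would do, since the real loading and analysis code is not shown in this post.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per process (names are hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="climate-analysis",
        description="Load annual climate data and compute heat indices.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    load = sub.add_parser("load", help="bulk load one year of climate data")
    load.add_argument("year", type=int)
    load.add_argument("--source", default="data/", help="directory of input files")

    analyse = sub.add_parser("analyse", help="compute an index for a loaded year")
    analyse.add_argument("year", type=int)
    analyse.add_argument("--index", choices=["wbgt", "utci"], default="wbgt")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Dispatch to a placeholder handler and return an exit status."""
    args = build_parser().parse_args(argv)
    if args.command == "load":
        print(f"would load {args.year} from {args.source}")
    else:
        print(f"would compute {args.index} for {args.year}")
    return 0
```

Calling `main(["load", "2013"])` exercises the tool from Python; hooking `main` up to a console entry point would make the same commands available from the shell.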
MILESTONE # 5
Benchmarking the Climate Analysis Package would be nice; it would satisfy my need to quantify everything. However, I need to keep in mind that this is a design research process, so I am changing this milestone.
NEW MILESTONE # 5: 21 October, 2013
One of the key outcomes for this project is a system for the analysis of scientific data that is easy to use and maintain. ‘Easy to use and maintain’ will be measured by:
- Time to adopt. Minimise the learning curve for first adopting the system. A lot of this will depend on the technology chosen, and we will try to find a combination of technologies that fits well with the climate chip team’s existing skill sets.
- Maintenance cost vs. performance cost. Because data is only loaded annually at the moment, the time to actually populate the database is less important than the time spent maintaining the system. Therefore, a highly tuned and optimised system that takes days or weeks to maintain each year is less suitable than a slower system that requires little maintenance.
With this in mind, I am planning to investigate alternatives to relational databases in the hope that I can find a NoSQL alternative that maps better to Matthias's existing programming skills. I will first compare the query performance of a NoSQL system with that of a fully relational one. If the NoSQL query performance is reasonable, I will then compare the two on maintainability.
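A query-performance comparison like this can be kept backend-agnostic by timing plain callables. The sketch below is one minimal way to do that with `time.perf_counter`; the function names are hypothetical, and each callable is assumed to run one representative query against one candidate system.

```python
import time
from typing import Callable, Dict, List


def best_time(query: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` runs."""
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        query()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(queries: Dict[str, Callable[[], object]],
            repeats: int = 5) -> Dict[str, float]:
    """Time each named query and return a mapping of name to best time."""
    return {name: best_time(query, repeats) for name, query in queries.items()}
```

Taking the minimum over several runs reduces noise from caching and other processes. The same query set would need to be run against both systems on the same data for the numbers to be comparable.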