It has been a great morning – and I have a project idea! I will blog more on this later this week; in the meantime, this post collects links to interesting resources I need to check out:
HASH TABLES
Anti-relational databases
geodatabase architecture: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//003n000000r4000000
the Big Picture: Big Data for a new wave of analytics http://www.youtube.com/watch?v=1qIZ4gQHssU
this was an interesting video. For me, the takeaway messages were:
- there are alternatives to relational dbs out there now.
- development of some of these alternatives is driven by a need for more flexible, fluid data models -> relational models are still appropriate **and exceptional!**, but not for all data needs
- the MarkLogic product relies heavily on in-memory indexing. This indexing provides a view on metadata across the whole data set
I also have some questions:
- Hadoop – I would like to find out just what this is.
- MapReduce – an analytical tool / algorithm. What does it do???
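On the MapReduce question, the usual toy illustration is a word count. This is only a single-machine sketch of the idea, not how Hadoop itself is implemented: a map step turns each input into (key, value) pairs, a shuffle step groups the pairs by key, and a reduce step collapses each group into one result. The function names here (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit one (word, 1) pair for every word in the document.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big tables", "big graphs"])
print(counts)
```

In a real cluster the map and reduce calls would run on different machines, which is the part a framework like Hadoop takes care of.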
The database revolution (exploring the trends in databasing) http://www.databaserevolution.com/
NoSQL data modeling techniques http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
there is some interesting stuff here:
a visual description of the evolution of NoSQL data structures (Key-values, big-table, full text, graphs)
=> need to investigate big tables and graphs. Matthias currently has his data stored as a b-tree (essentially a graph), and yet it is one giant denormalised table in MySQL (so does this relate to big tables?? I don’t know but will find out)
looking further, it looks like a big table has a KEY -> time stamp -> value structure. Perhaps very interesting for Matthias’ data
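To get a feel for that KEY -> time stamp -> value shape, here is a minimal in-memory sketch. It assumes the simplified structure described in these notes; real big-table stores add more layers (column families, for instance), and the `VersionedStore` class and the sensor data are purely hypothetical.

```python
from collections import defaultdict


class VersionedStore:
    """Toy store: each key maps to {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, key, timestamp, value):
        # Writing never overwrites history; it adds a new version.
        self._rows[key][timestamp] = value

    def get(self, key, at=None):
        # Return the newest value, or the newest one at or before `at`.
        versions = self._rows.get(key)
        if not versions:
            return None
        candidates = [t for t in versions if at is None or t <= at]
        if not candidates:
            return None
        return versions[max(candidates)]


store = VersionedStore()
store.put("sensor-1", 100, 20.5)
store.put("sensor-1", 200, 21.0)

latest = store.get("sensor-1")
earlier = store.get("sensor-1", at=150)
print(latest, earlier)
```

The appeal for time-stamped measurements is that every version of a value stays addressable by key and time.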
“it turned out that software applications are not so often interested in in-database aggregation and able to control, at least in many cases, integrity and validity themselves” – is this because software is awesome, or because software programmers like what they like, and are just not using dbs???? I think this might be an important question to form an opinion on!!!
- Relational modeling is typically driven by the structure of available data. The main design theme is “What answers do I have?”
- NoSQL data modeling is typically driven by application-specific access patterns, i.e. the types of queries to be supported. The main design theme is “What questions do I have?” – I’m not sure I agree with this. I think of relational dbs as an answer to management’s constant demands for complex reports, i.e. it is about answering the complex questions, as opposed to only allowing questions that fit the pre-defined answers.
“NoSQL data modeling often requires a deeper understanding of data structures and algorithms than relational database modeling does” – I do agree with this. Something like MongoDB has a whole lot of built-in functionality, but if you want something a little out of the ordinary, then you have to code it up yourself.
NoSQL == denormalised data http://dclure.org/logs/nosql-equals-denormalized-data/
SQL and denormalisation at Stack Exchange http://meta.stackoverflow.com/questions/120016/how-is-nosql-and-data-denormalization-used-on-stack-overflow-stack-exchange
MongoDB – incredibly fast writes and generally faster reads: http://blog.fogcreek.com/the-trello-tech-stack/
Optimising the db by denormalising: http://technet.microsoft.com/en-us/library/cc505841.aspx
responsible denormalisation: http://msdn.microsoft.com/en-us/library/aa224786(v=sql.80).aspx
DB normalisation tips: http://msdn.microsoft.com/en-us/library/office/aa139981(v=office.10).aspx
data normalisation: http://msdn.microsoft.com/en-us/library/aa291817(v=vs.71).aspx
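The trade-off these normalisation / denormalisation links circle around can be sketched with plain Python dictionaries. This is an illustration under my own assumptions, not taken from any of the linked articles, and the author/post data and function names are made up. Normalised data stores each fact once and joins at read time; denormalised data copies the fact into every record, so reads are simple but every copy has to be updated on a change.

```python
# Normalised: the author name is stored exactly once.
authors = {1: {"name": "Ada"}}
posts = [
    {"id": 10, "author_id": 1, "title": "Hash tables"},
    {"id": 11, "author_id": 1, "title": "Big tables"},
]


def normalised_titles_with_author(posts, authors):
    # Read side: every lookup needs a "join" against authors.
    return [(p["title"], authors[p["author_id"]]["name"]) for p in posts]


def denormalise(posts, authors):
    # Copy the author name into each post so reads need no join.
    return [{**p, "author_name": authors[p["author_id"]]["name"]} for p in posts]


denormalised_posts = denormalise(posts, authors)


def rename_author(author_id, new_name):
    # Normalised form: a single write.
    authors[author_id]["name"] = new_name
    # Denormalised form: every copy must be found and rewritten.
    updated = 0
    for p in denormalised_posts:
        if p["author_id"] == author_id:
            p["author_name"] = new_name
            updated += 1
    return updated


copies_touched = rename_author(1, "Ada L.")
print(copies_touched)
```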
Relational or not-relational, that is the question…:
(I am putting this one at the top, because it discusses many of the issues I think I am going to have to grapple with. Specifically, grinding through the data, row by row… SLOW!) http://quaero.csgi.com/blog/464-when_to_use_hadoop_instead_of_a_relational_database_management_system_rdbms
(links through to dbms2 article)
(this is heavily biased, but interesting ideas on impedance mismatch) http://www.intersystems.com/cache/whitepapers/hybrid.html
On data / big data / general descriptions, nothing too specific:
let the db handle it, or code it?
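To make the "db or code?" question concrete, here is a small sketch using Python's built-in `sqlite3` module. The `readings` table and its values are invented for illustration. Both options give the same totals; the difference is where the work happens and how many rows have to travel from the database to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)

# Option 1: let the db handle it. Only one row per site comes back.
db_totals = dict(
    conn.execute("SELECT site, SUM(value) FROM readings GROUP BY site")
)

# Option 2: code it. Every row is fetched and summed in the application.
code_totals = {}
for site, value in conn.execute("SELECT site, value FROM readings"):
    code_totals[site] = code_totals.get(site, 0.0) + value

conn.close()
print(db_totals, code_totals)
```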