Project 702

Resources, resources, resources…

It has been a great morning – and I have a project idea! I will blog more on this later this week; in the meantime, this post collects links to interesting resources I need to check out:

HASH TABLES

Anti-relational databases

geodatabase architecture:

the Big Picture: Big Data for a new wave of analytics

this was an interesting video. For me, the takeaway messages were:

  • there are alternatives to relational dbs out there now.
  • development of some of these alternatives is driven by a need for more flexible, fluid data models -> relational models are still appropriate **and exceptional!**, but not for all data needs
  • the MarkLogic product relies heavily on in-memory indexing. This indexing provides a view on metadata across the whole data set

I also have some questions:

  • Hadoop – I would like to find out just what this is.
  • MapReduce – an analytical tool / algorithm. What does it do???
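As a rough first answer to the MapReduce question: the pattern splits a job into a map step that emits key/value pairs, a grouping step that collects the values for each key, and a reduce step that combines them. Hadoop is an open-source framework that runs this pattern across a cluster of machines. The sketch below is a single-process toy word count, not Hadoop's API; the function names and sample sentences are made up for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # In a real cluster each document could be mapped on a different machine.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample input
counts = word_count(["big data big tables", "big graphs"])
```

The point of the split is that the map and reduce steps are independent per document and per key, so they can be spread over many machines.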

The database revolution (exploring the trends in databasing)

NoSQL data modeling techniques

there is some interesting stuff here:

a visual description of the evolution of NoSQL data structures (Key-values, big-table, full text, graphs)

=> need to investigate big tables and graphs. Matthias currently has his data stored as a B-tree (essentially a graph), and yet it is one giant denormalised table in MySQL (so does this relate to big tables?? I don’t know but will find out)

looking further, it looks like a big table has: KEY -> time stamp -> value structure. Perhaps very interesting for Matthias’ data
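To make the KEY -> time stamp -> value idea concrete, here is a minimal in-memory sketch using nested dictionaries. It only illustrates the shape of the data as described above, not how any real big-table store is implemented; the key name "sensor-1", the timestamps and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Outer dict: key -> inner dict; inner dict: timestamp -> value
table = defaultdict(dict)

def put(key, timestamp, value):
    """Store one version of a value under a key."""
    table[key][timestamp] = value

def get_latest(key):
    """Return the value with the highest timestamp, or None if the key is unknown."""
    versions = table.get(key)
    if not versions:
        return None
    return versions[max(versions)]

def get_as_of(key, timestamp):
    """Return the most recent value written at or before the given timestamp."""
    versions = table.get(key, {})
    eligible = [t for t in versions if t <= timestamp]
    return versions[max(eligible)] if eligible else None

# Hypothetical sample data
put("sensor-1", 100, 20.5)
put("sensor-1", 200, 21.0)
```

Here `get_latest("sensor-1")` reads the newest version, while `get_as_of("sensor-1", 150)` reads the value as it stood at an earlier time, which is the part that looks useful for time-stamped data.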

“it turned out that software applications are not so often interested in in-database aggregation and able to control, at least in many cases, integrity and validity themselves” – is this because software is awesome, or because software programmers like what they like, and are just not using dbs???? I think this might be an important question to form an opinion on!!!

  • Relational modeling is typically driven by the structure of available data. The main design theme is “What answers do I have?”
  • NoSQL data modeling is typically driven by application-specific access patterns, i.e. the types of queries to be supported. The main design theme is “What questions do I have?” – I’m not sure I agree with this. I think of relational dbs as an answer to management’s constant demands for complex reports, i.e. it is about answering the complex questions, cf. only allowing questions that fit the pre-defined answers.

“NoSQL data modeling often requires a deeper understanding of data structures and algorithms than relational database modeling does” – I do agree with this. Something like MongoDB has a whole lot of built-in functionality, but if you want something a little out of the box, then you have to code it up yourself.

NoSQL == denormalised data
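A small sketch of what that equation means in practice, assuming a made-up authors/posts schema. The normalised version uses Python's built-in sqlite3 module and needs a join to answer the question; the denormalised version stores a copy of the author name inside every post, document-store style (plain dicts stand in for the documents here, this is not MongoDB's API).

```python
import sqlite3

# Normalised: the author name is stored once and referenced by id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    INSERT INTO authors VALUES (1, 'Ada');
    INSERT INTO posts VALUES (10, 1, 'Project 702'), (11, 1, 'Resources');
""")
normalised = conn.execute("""
    SELECT p.title, a.name
    FROM posts p JOIN authors a ON a.id = p.author_id
    ORDER BY p.id
""").fetchall()

# Denormalised: every post carries its own copy of the author name,
# so reading needs no join, but renaming the author touches every post.
documents = [
    {"_id": 10, "title": "Project 702", "author_name": "Ada"},
    {"_id": 11, "title": "Resources", "author_name": "Ada"},
]
denormalised = [(d["title"], d["author_name"]) for d in documents]
```

Both versions answer the same question; the trade-off is a join at read time versus duplicated data that has to be kept consistent by the application.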

SQL and denormalisation at Stack Exchange

MongoDB – incredibly fast writes and generally faster reads:

Optimising the db by denormalising

responsible denormalisation:

On Normalisation:

DB normalisation tips:

data normalisation:

Relational or not-relational, that is the question…:

(I am putting this one at the top, because it discusses many of the issues I think I am going to have to grapple with. Specifically, grinding through the data, row by row… SLOW!)

(links through to dbms2 article)

(this is heavily biased, but interesting ideas on impedance mismatch)

On data / big data / general descriptions, nothing too specific:

let the db handle it, or code it?
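To illustrate the "row by row" versus "let the db handle it" choice, here is a minimal sketch with Python's built-in sqlite3 module. The readings table and its values are invented for the example, and on three rows there is no measurable speed difference; the sketch only shows the two styles side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 2.0), ("b", 5.0)],  # hypothetical sample rows
)

# Code it: pull every row into the application and aggregate in a loop.
totals_in_code = {}
for site, value in conn.execute("SELECT site, value FROM readings"):
    totals_in_code[site] = totals_in_code.get(site, 0.0) + value

# Let the db handle it: one set-based query, only the totals come back.
totals_in_db = dict(
    conn.execute("SELECT site, SUM(value) FROM readings GROUP BY site")
)
```

With the second form the database decides how to scan and group the rows, and only one row per site crosses into the application.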

