Project 702

Resources, resources, resources…

It has been a great morning – and I have a project idea! I will blog more on this later this week. In the meantime, this post collects links to interesting resources I need to check out:

HASH TABLES

Anti-relational databases

geodatabase architecture: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//003n000000r4000000

http://www.ibm.com/developerworks/data/library/techarticle/dm-1209hadoopbigdata/

the Big Picture: Big Data for a new wave of analytics http://www.youtube.com/watch?v=1qIZ4gQHssU

this was an interesting video. For me, the takeaway messages were:

  • there are alternatives to relational dbs out there now.
  • development of some of these alternatives is driven by a need for more flexible, fluid data models -> relational models are still appropriate **and exceptional!**, but not for all data needs
  • the MarkLogic product relies heavily on in-memory indexing. This indexing provides a view of the metadata across the whole data set

I also have some questions:

  • Hadoop – I would like to find out just what this is.
  • MapReduce – an analytical tool / algorithm. What does it do?
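As a first stab at the MapReduce question, here is a minimal single-machine sketch of the general pattern: a map step emits (key, value) pairs, the pairs are grouped by key, and a reduce step collapses each group to one result. It is plain Python, not Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are made up for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (a real framework does this between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input, just to exercise the three steps.
counts = word_count(["big data big tables", "big graphs"])
```

In a real cluster the map and reduce calls would run on many machines in parallel; here everything runs in one process, so this only shows the shape of the computation.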

The database revolution (exploring the trends in databasing) http://www.databaserevolution.com/

NoSQL data modeling techniques http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/

there is some interesting stuff here:

a visual description of the evolution of NoSQL data structures (Key-values, big-table, full text, graphs)

=> need to investigate big tables and graphs. Matthias currently has his data stored as a b-tree (essentially a graph), and yet it is one giant denormalised table in MySQL (so does this relate to big tables?? I don’t know but will find out)

looking further, it looks like a big table has a KEY -> time stamp -> value structure. Perhaps very interesting for Matthias’ data
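To make the KEY -> time stamp -> value idea concrete, here is a toy in-memory sketch. It is only my reading of the structure, not how any real big-table product is implemented; the class name `VersionedStore`, the key `"sensor-1"`, and the readings are all invented for illustration.

```python
from collections import defaultdict


class VersionedStore:
    """Toy store where each key maps to {timestamp: value}."""

    def __init__(self):
        self._data = defaultdict(dict)

    def put(self, key, timestamp, value):
        """Record a value for a key at a given timestamp."""
        self._data[key][timestamp] = value

    def get(self, key, at=None):
        """Return the newest value for key, or the newest at or before `at`."""
        versions = self._data.get(key, {})
        candidates = [t for t in versions if at is None or t <= at]
        if not candidates:
            return None
        return versions[max(candidates)]


# Hypothetical readings: two versions of the same key.
store = VersionedStore()
store.put("sensor-1", 100, 20.5)
store.put("sensor-1", 200, 21.0)

latest = store.get("sensor-1")           # newest version
earlier = store.get("sensor-1", at=150)  # version as of timestamp 150
```

The point of the sketch is that old values are never overwritten: every write adds a new timestamped version, and a read picks which version it wants.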

“it turned out that software applications are not so often interested in in-database aggregation and able to control, at least in many cases, integrity and validity themselves” – is this because software is awesome, or because software programmers like what they like, and are just not using dbs???? I think this might be an important question to form an opinion on!!!

  • Relational modeling is typically driven by the structure of available data. The main design theme is “What answers do I have?”
  • NoSQL data modeling is typically driven by application-specific access patterns, i.e. the types of queries to be supported. The main design theme is “What questions do I have?” – I’m not sure I agree with this. I think of relational dbs as an answer to management’s constant demands for complex reports, i.e. it is about answering the complex questions, cf. only allowing questions that fit the pre-defined answers.
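A small sketch of the contrast in the two bullets above, using Python's built-in `sqlite3` for the relational side and a plain dict standing in for a document-store record. The `author` and `post` tables, the names, and the titles are all hypothetical; this is an illustration of the two modeling styles, not a benchmark or a recommendation.

```python
import sqlite3

# Relational side: normalised tables, so new questions can be asked with a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id),
        title TEXT
    );
    INSERT INTO author VALUES (1, 'Ana');
    INSERT INTO post VALUES (10, 1, 'Hash tables'), (11, 1, 'Graphs');
""")

# An ad hoc report question: how many posts per author?
posts_per_author = conn.execute("""
    SELECT author.name, COUNT(post.id)
    FROM author JOIN post ON post.author_id = author.id
    GROUP BY author.name
""").fetchall()

# Document side: one denormalised record shaped for a single access pattern
# ("show this author with all of their posts").
author_doc = {
    "_id": 1,
    "name": "Ana",
    "posts": [{"title": "Hash tables"}, {"title": "Graphs"}],
}
titles = [post["title"] for post in author_doc["posts"]]
```

The document version answers its one question without a join, but a question it was not shaped for (say, all posts with a given title across authors) would need extra code or a second copy of the data.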

“NoSQL data modeling often requires a deeper understanding of data structures and algorithms than relational database modeling does” – I do agree with this. Something like MongoDB has a whole lot of built-in functionality, but if you want something a little out of the box, then you have to code it up yourself.

NoSQL == denormalised data http://dclure.org/logs/nosql-equals-denormalized-data/

NoSQL and denormalisation at Stack Exchange http://meta.stackoverflow.com/questions/120016/how-is-nosql-and-data-denormalization-used-on-stack-overflow-stack-exchange

MongoDB – incredibly fast writes and generally faster reads: http://blog.fogcreek.com/the-trello-tech-stack/

Optimising the db by denormalising: http://technet.microsoft.com/en-us/library/cc505841.aspx

responsible denormalisation: http://msdn.microsoft.com/en-us/library/aa224786(v=sql.80).aspx

On Normalisation:

http://msdn.microsoft.com/en-us/library/ms191178(v=sql.105).aspx

DB normalisation tips: http://msdn.microsoft.com/en-us/library/office/aa139981(v=office.10).aspx

data normalisation: http://msdn.microsoft.com/en-us/library/aa291817(v=vs.71).aspx

Relational or not-relational, that is the question…:

(I am putting this one at the top, because it discusses many of the issues I think I am going to have to grapple with. Specifically, grinding through the data, row by row… SLOW!) http://quaero.csgi.com/blog/464-when_to_use_hadoop_instead_of_a_relational_database_management_system_rdbms

http://readwrite.com/2011/06/09/when-you-should-still-use-a-re#awesm=~oawAMUXkHJoNZN

(links through to dbms2 article)

http://www.dbms2.com/2011/05/29/when-to-use-relational-database-management-system/

http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores

(this is heavily biased, but interesting ideas on impedance mismatch) http://www.intersystems.com/cache/whitepapers/hybrid.html

http://econsultancy.com/nz/blog/10654-five-legitimate-use-cases-for-nosql-databases

thoughtsondata.com

On data / big data / general descriptions, nothing too specific:

http://readwrite.com/2011/05/10/from-big-data-to-nosql-the-rea#awesm=~oawBt44QXBWR01

let the db handle it, or code it?

http://programmers.stackexchange.com/questions/171024/never-do-in-code-what-you-can-get-the-sql-server-to-do-well-for-you-is-this
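To make the "db or code" choice concrete, here is a minimal sketch that computes the same per-group average both ways, using Python's built-in `sqlite3`. The `reading` table and its values are invented for illustration, and the example says nothing about which option is faster on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (site TEXT, value REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],  # hypothetical sample rows
)

# Option 1: let the db handle it with a single aggregate query.
db_means = dict(
    conn.execute("SELECT site, AVG(value) FROM reading GROUP BY site")
)

# Option 2: code it, pulling every row out and grinding through them one by one.
totals, row_counts = {}, {}
for site, value in conn.execute("SELECT site, value FROM reading"):
    totals[site] = totals.get(site, 0.0) + value
    row_counts[site] = row_counts.get(site, 0) + 1
code_means = {site: totals[site] / row_counts[site] for site in totals}
```

Both give the same answer here; the difference is where the work happens and how many rows have to leave the database to get it.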
