As promised a few months ago, I am writing this to provide more details over the technologies discussed on the Microsoft Summit 2014 agenda. One of those technologies is NOSQL. David Chappell in his "NoSQL on Azure: An Introduction" presentation has brought a few interesting ideas over why you should use NOSQL, when you should use it and what happens with his older brother SQL.
As David brought up, there is no such thing as the post-SQL era, but rather it's the SQL+ era, as NOSQL has come to resolve some of the short comings of SQL, but in no way has the ability to replace it.
So SQL vs NOSQL:
As in any ring fight, let's present the players.
In the SQL team we have MySQL (currently the preferred storage server in web development), which can be found in multiple forms like MySQL (the current version supported by Oracle), MariaDB, Perconna and others. Mysql, especially in the web app, websites and mobile apps market is a very good player providing developers with very proficient relational storage engine, or should I say engines, through MyISAM, InnoDB and even less used ones like Memory. Archive or the destroyer of data the BLACKHOLE engine.
Another big player here is Microsft SQL Server, which provides a very good SQL setup, running on Windows or Windows Server editions. It has high availability and good storage capabilities. Data strings are a bit more complex than Mysql and is the highly used SQL server when we have websites and web apps built in ASP or ASPX.
PostgreSQL is versatile relational database player on the SQL part which is highly compliant and stable, but has the disadvantage of being slower than Mysql, thus taking the market when high amounts of data are involved, so usually it is used in web applications or other server side storage needs.
SQLite is a light player, but nevertheless a player, SQLite is used in portable applications with no user management as none is available and when no higher-levels of multi-client concurrency are required.
On the NOSQL side we have:
Apache CouchDB is a document storage server. Accessible through either it's interface called Futon or via an HTTP/HTTPS API. It's scalable and provides the documents in an easy to manage fashion. It's used mostly for small documents without a fixed schema that have a key:value representation.
Apache Cassandra is used to manage large active datasets by over 1500 companies like eBay, Github, GoDaddy, Netflix and others. Cassandra's data model offers the convenience of column indexes with the performance of log-structured update and powerful built-in caching. According to benchmarking results Cassandra is the biggest NOSQL player on the market, closely followed by HBase and MongoDB.
MongoDB is another player that handles expressive query language, secondary indexes, strong consistency with a high degree of stability and fast data access. MongoDB's query language provides varied field-level operators, data types and in-place updates. Also MongoDB's secondary indexes support includes geospatial indexes, TTL indexes and full text search indexing.
Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs.
So now that we know the players and what they do, we start to have an idea on what they are good for:
SQL is good for:
- Relational databases, foreign key indexes and relational data management (products linked to orders, liked to customers, linked addresses and more)
- Fixed schema (made scalable by EAV implementation), with restriction enforcements.
- Relational analytics with cause and effect directly determined by mathematical formulas.
NOSQL is good for
- Quick data access for high amounts of data with no schema
- BigData Analytics with probability analysis
- High scalability and dynamic secondary indexes
A number of web apps and e-commerce websites (including Magento with some configuration) are now built to maximize the advantages of both systems by using a relational database like a MySQL database to set the data modeling schema and a MongoDB or Redis instance for quick in memory value-key pairing for quick access to the information in production environments.
So when and why?
- Use SQL when you need the different information types to be related via very well defined indexes through 1 to 1, 1 to many, many to 1 or many to many relations. This cases are best for e-commerce portals, fixed schema mobile apps, websites which have mostly text base data that needs to be stored in a database.
- Use NOSQL when you need quick data access to a non-fixed schema document in web apps with dynamic models, mobile apps with dynamic models, document storage with different schema that hold not only text base data, but images, geo spatial information and more.
Please feel free to comment and tell me your favorites and why. SQL is dead, long live the SQL+.