Database Topic Overview
Jump to navigation
Jump to search
<slideshow style="nobleprog" headingmark="⌘" incmark="…" scaled="true" font="Trebuchet MS" >
- title
- Database topic overview
- author
- Alexander Patrakov
</slideshow>
SQL and NoSQL ⌘
- SQL = Structured Query Language
- Standard used by many databases
- We have a course: https://www.nobleprog.com/sql-fundamentals-training
- Can be delivered on any traditional database
- You practiced it for MySQL = you know 50% of PostgreSQL
- NoSQL = Not Only SQL
- Collective term for non-traditional databases
- Every NoSQL database is different
Let's try some SQL ⌘
- SQL databases contain tables
- How to create them and alter their structure?
- DDL = Data Definition Language
- How to Create, Read, Update, Delete rows?
- DML = Data Modification Language
- How to create them and alter their structure?
- Learn statements: CREATE TABLE, INSERT, SELECT, UPDATE, DELETE
- Try them on MySQL and PostgreSQL
- Or maybe online in sql.js?
For those who want to self-study SQL ⌘
- Use this online tool or another one
CREATE TABLE dept (id INTEGER PRIMARY KEY, dname VARCHAR(255) NOT NULL, loc VARCHAR(255) NOT NULL);
INSERT INTO dept VALUES(10,'ACCOUNTING','NEW YORK');
INSERT INTO dept VALUES(20,'RESEARCH','LONDON');
INSERT INTO dept VALUES(30,'SALES','PARIS');
INSERT INTO dept VALUES(40,'OPERATIONS','BERLIN');
CREATE TABLE salgrade(id INTEGER PRIMARY KEY, losal DECIMAL(10,2) NOT NULL, hisal DECIMAL(10,2) NOT NULL);
INSERT INTO salgrade VALUES(1,700,1200);
INSERT INTO salgrade VALUES(2,1201,1400);
INSERT INTO salgrade VALUES(3,1401,2000);
INSERT INTO salgrade VALUES(4,2001,3000);
INSERT INTO salgrade VALUES(5,3001,9999);
CREATE TABLE emp (id INTEGER PRIMARY KEY, ename VARCHAR(255) NOT NULL,
job VARCHAR(255) NOT NULL, mgr INTEGER REFERENCES emp(id),
hiredate DATE NOT NULL, sal DECIMAL(10,2) NOT NULL,
comm DECIMAL(10,2), deptno INTEGER NOT NULL REFERENCES dept(id));
INSERT INTO emp VALUES(7839,'BUSH','PRESIDENT',NULL,'1981-11-17',5000,NULL,10);
INSERT INTO emp VALUES(7698,'BLAIR','MANAGER',7839,'1981-05-01',2850,NULL,30);
INSERT INTO emp VALUES(7566,'PUTIN','MANAGER',7839,'1981-04-02',2975,NULL,20);
INSERT INTO emp VALUES(7902,'TOOSK','ANALYST',7566,'1981-12-03',3000,NULL,20);
INSERT INTO emp VALUES(7369,'THATCHER','CLERK',7902,'1980-12-17',800,NULL,20);
INSERT INTO emp VALUES(7499,'BAROSSO','SALESMAN',7698,'1981-02-20',1600,300,30);
INSERT INTO emp VALUES(7521,'WALTON','SALESMAN',7698,'1981-02-22',1250,500,30);
INSERT INTO emp VALUES(7654,'CHIRACK','SALESMAN',7698,'1981-09-28',1250,1400,30);
INSERT INTO emp VALUES(7782,'MERKEL','MANAGER',7839,'1981-06-09',2450,NULL,10);
INSERT INTO emp VALUES(7788,'CARNEGIE','ANALYST',7566,'1982-12-09',3000,NULL,20);
INSERT INTO emp VALUES(7844,'GATES','SALESMAN',7698,'1981-09-08',1500,0,30);
INSERT INTO emp VALUES(7876,'FORD','CLERK',7788,'1983-01-12',1100,NULL,20);
INSERT INTO emp VALUES(7900,'BUFFETT','CLERK',7698,'1981-12-03',950,NULL,30);
INSERT INTO emp VALUES(7934,'ELISON','CLERK',7782,'1982-01-23',1300,NULL,10);
- Click "Expand" to see the queries for populating the initial structure and content
- Follow SQL Programming slides and exercises
More SQL and DBA topics ⌘
- Relational algebra
- How to express 1:0-1, 1:N, or N:M relations using FOREIGN KEYs
- How to combine matching rows using a JOIN
- Normal forms and integrity
- Taught only as a part of Oracle courses, but really belong to all SQL courses
- Indexing
- Grouping and aggregation
- Concurrency and transactions
- Security
Two major applications ⌘
- Web applications
- Queries usually touch only a tiny fraction of data
- Performance is needed
- Achieved by a good schema, properly placed indexes, and well-written queries
- SQL is usually simple
- Data warehousing, ad-hoc analytics, reporting
- Real-time performance is not needed
- Queries sometimes go through the entire database for aggregation
- More advanced SQL features are used
Available SQL-based databases ⌘
- MySQL and its variants (MariaDB, MySQL Cluster, Percona Server)
- PostgreSQL
- SQLite
- No separate training, but mentioned as a subtopic for Android and iOS courses
- MS SQL Server
- MS Access
- Oracle
- Firebird
- Specialized, less common: Impala, SequoiaDB, Trafodion
Developer topics ⌘
- How to issue SQL statements from various programming languages
- Non-standard SQL extensions
- For some databases - how to write such extensions
- Query optimization
- Internals
ORMs ⌘
- There is a conceptual mismatch between software written in object-oriented style and relational databases
- There is a need to convert between object fields/properties and database columns
- Trivial for strings and integers, non-trivial for arrays and references to other objects
- One should be able to create a graph of object references and persist it to the database
- Solved (to some degree) by Object-Relational Managers (ORMs)
- There is a need to convert between object fields/properties and database columns
- Available ORMs:
- Java: Hibernate
- PHP: Doctrine (taught as a part of Symfony web framework, deserves more)
- Python:
- Web framework neutral: SQLAlchemy, Peewee (both have no trainings, can be added to a Flask training)
- Web framework specific: Django ORM, Web2py DAL (both taught as a part of the corresponding web frameworks)
- C#: ADO.NET (in particular, LINQ to SQL)
- Reality: some web developers don't know SQL, because ORM provides good-enough facilities for accessing the database
Sysadmin topics ⌘
- Implicit assumptions:
- Sysadmins already know the basics of SQL
- May or may not be true in the real world
- Sysadmins only need the basics of SQL + sysadmin-only SQL statements
- Sysadmins already know the basics of SQL
- Installation & configuration
- Data safety against power loss
- Typical errors and how to fix them
- Access control
- Replication
- Performance monitoring and tuning
Typical problems during courses ⌘
- Licensing
- Some companies prohibit the use of evaluation copies for training purposes - please check every time!
- Evaluation versions may be too restricted in functionality
- For demonstrating replication, multiple VMs per participant = multiple Windows licenses
- Only NobleProg Poland has that, and we can also try Amazon
- Insufficient server resources
- Huge problem for Introduction to SQL Server 2012 Integration Services (SSIS)
- Missing prerequisites
- Problem for MS SQL Server 2016
- Finding trainers that know the internals
- I suspect that PostgreSQL Administration and Development was never delivered in full
- Still present on some frontends, please evaluate and maybe delete
- I suspect that PostgreSQL Administration and Development was never delivered in full
What about NoSQL? ⌘
- Each NoSQL database is unique
- The assumption that the admin already knows the basics no longer holds
- The admin is still not interested how to interact with this thing from Java or Python
- That (and time) is in fact the only difference between Cassandra for Developers and Cassandra Administration
- For Redis, the outline is the same for sysadmins and developers
- Of course delivered differently - using redis-cli (for admins) or Python or C# or ... (for developers)
- Common points do exist
- No joins, no complex queries, result: more performance
- Usually no ACID - replaced with eventual consistency, also for performance
- Probably not good for ad-hoc querying by analysts
Why NoSQL? ⌘
- High availability
- You have to replicate data
- Distributed transactions (needed for ACID) are hard but solved problem
- PAXOS, Raft
- So no actual need to go NoSQL just because of that
- Big data
- If your data doesn't fit on a single server, you have to shard it
- Joining sharded data is fundamentally hard
- Performance
- For simple queries, traditional relational databases spend more time in query parser and planner than actually executing the plan!
- Solution: limit operations to very simple ones, which don't require planning
- Convenience
- Each NoSQL database was created in order to solve a particular problem
- If this is the same as your problem, you win
- E.g. graph databases are a win if you have many N:N relationships - queries are easier to write than in SQL
- It's trendy
- Mostly applies to MongoDB + Node.js, see also "MEAN stack"
- "I know nothing about storing data at all, and Mongo looks simple, so I don't have to understand anything" (source)
- Not a valid reason, see this blog post from 2013
NewSQL movement ⌘
- SQL is a powerful query language
- For database authors, using it means that users already know 50% of their database
- There is no direct conflict between SQL and distributed systems
- Yes, joins are fundamentally slow, but users know, and they need joins, so...
- See a case study from Timescale DB
- "SQL has become the narrow waist for data analysis"
Topics to cover in NoSQL courses ⌘
- Common:
- Data model
- Query language
- Use cases
- Developer:
- Language bindings
- Consistency guarantees
- Performance tips related to specific ways of writing queries
- Admin:
- Replication and sharding
- Performance tuning without rewriting the queries
Available NoSQL databases ⌘
- People sometimes classify them into column-oriented stores, document stores, key-value stores, and so on
- Still there is a lot of variability inside each class
- People sometimes attempt to classify them as CA, CP, or AP
- Reference to Brewer's CAP theorem
- This classification should be viewed as secondary
- The primary classification should be according to their purpose
- Cassandra (good for time-series data, for analytics and aggregation, has SQL-like syntax without joins)
- Redis (data structure server, very low level, all data in memory, very fast, often used for caches and message queues)
- HBase (built on top of Hadoop, so good for Big Data)
- MongoDB (stores and searches JSON documents)
- Solr and Elasticsearch (full text search oriented)
- Missing trainings:
A good overview of the ecosystem (but is this possible to sell? possible at all?)- This outline is close but with strong emphasis on Big Data, and still never delivered
- Tarantool (similar to Redis but better replication and scripting, used in Mail.ru)
- Neo4j (graph database, good for representing connections between arbitrary objects)
Visual guide to NoSQL systems ⌘
Key-Value stores ⌘
- Very simple operations
- SET key value
- GET key
- DELETE key
- Canonical example: memcached
- A bit more complicated example: Redis
- Positioned as a data structure server, not key-value store
- Supports sharding and master-slave replication
- Fast because of simple operations
Cassandra ⌘
- Data model:
- Tables (formerly called "column families"), like in relational databases
- There is schema, but it's cheap to add new columns
- Querying is possible only by keys, not by arbitrary fields
- No joins
- May be possible to emulate, inefficiently, client-side or with MapReduce jobs
- See also Spark
- Consistency model:
- Distributed system, high availability across multiple data centers, no single point of failure
- No ACID
- Lightweight transactions
- Last Write Wins
- Internals:
- Log-structured merge tree, memtables and SSTables, tombstones, immutable disk files
- Use cases:
- Storing and aggregating big data
- Easy to add nodes to the cluster
- Performs best when data is never deleted
- Case study: keeping message history of Discord - why they migrated from MongoDB
- See also this overview
MongoDB ⌘
- Stores JSON documents
- Has a JavaScript query API
- Integrates well with Node.js
- No schema
- So migrations are not 100% necessary
- It's common to store old data in old format, and reinterpret in the application on the fly
- Strongly consistent by default
- Scales horizontally
- but selects Consistency over Availability
Trainer feedback on our Cassandra courses ⌘
- Trainer: Renhart Gittens
- https://www.nobleprog.co.uk/cassandra-developers-course
- No feedback
- https://www.nobleprog.co.uk/cassandra-administration-course
- This is basically the developers course with the Datamodeling labs and Cassandra drivers section omitted to fit it into 2 days. I suppose more emphasis would be placed on the admin components over the 2 days.
- https://www.nobleprog.co.uk/training/fundamentals-cassandra-db
- Should be updated to refer to Cassandra 3.0. This 3-day course is probably superseded by the cassandra-administration-course which drops the reference to Cassandra 2.0 but the extra day gives more opportunity to explore Cassandra internals
- https://www.nobleprog.co.uk/apache-cassandra-2x-core-internals-course
- This course is dated - refers to version 2.x and the use of rackspace and installing into the cloud? While Apache Cassandra 2.2 is still supported its end of life cannot be very far away.
Other troubles with Cassandra courses ⌘
- Only one trouble: system requirements
- Needs 4 GB of RAM by default
- Downscaling is possible, but the trainer is against it
- Would harm the course content by making it impossible to illustrate various database decisions
- Downscaling is possible, but the trainer is against it
- The server in USA has only 32 GB of RAM, so we can't accept more than 6-7 delegates
- Needs 4 GB of RAM by default