Introduction
Terms defined: database, database management system (DBMS), NoSQL database, relational database management system (RDBMS), table
What This Is
- Notes and working examples that instructors can use to perform a lesson
- Do not expect novices with no prior SQL experience to be able to learn from them
- Musical analogy
- This is the chord changes and melody
- We expect instructors to create an arrangement and/or improvise while delivering
- See Teaching Tech Together for background
Scope
- intended audience
- Rachel has a master's degree in cell biology and now works in a research hospital doing cell assays.
- She learned a bit of R in an undergrad biostatistics course and has been through the Carpentries lesson on the Unix shell.
- Rachel is thinking about becoming a data scientist and would like to understand how data is stored and managed.
- Her work schedule is unpredictable and highly variable, so she needs to be able to learn a bit at a time.
- prerequisites
- basic Unix command line:
cd
,ls
,*
wildcard - basic tabular data analysis: filtering rows, aggregating within groups
- basic Unix command line:
- learning outcomes
- Explain the difference between a database and a database manager.
- Write SQL to select, filter, sort, group, and aggregate data.
- Define tables and insert, update, and delete records.
- Describe different types of join and write queries that use them to combine data.
- Use windowing functions to operate on adjacent rows.
- Explain what transactions are and write queries that roll back when constraints are violated.
- Explain what triggers are and write SQL to create them.
- Manipulate JSON data using SQL.
- Interact with a database using Python directly, from a Jupyter notebook, and via an ORM.
Setup
- Download the latest release
- Unzip the file in a temporary directory to create:
./db/*.db
: the SQLite databases used in the examples./src/*.*
: SQL queries, Python scripts, and other source code./out/*.*
: expected output for examples
Background Concepts
- A database is a collection of data that can be searched and retrieved
- A database management system (DBMS) is a program that manages a particular kind of database
- Each DBMS stores data in its own way
- SQLite stores each database in a single file
- PostgreSQL spreads information across many files for higher performance
- DBMS can be a library embedded in other programs (SQLite) or a server (PostgreSQL)
- A relational database management system (RDBMS) stores data in tables
and uses SQL for queries
- Unfortunately, every RDBMS has its own dialect of SQL
- There are also NoSQL databases like MongoDB that don't use tables
Connecting to Database
sqlite3 db/penguins.db
- Not actually a query: starts an interactive session with the database in
db/penguins.db
- Alternative: provide a single query on the command line
sqlite3 database "query"
- Or put query in file and run
sqlite3 database < filename
- Note: the
penguins
database contains two tablespenguins
is all the Palmer Penguins datalittle_penguins
is a subset used in our first few queries to keep output readable
To disconnect from an interactive database session, type Control-D or
.quit
on a line of its own. You may need to type a semi-colon;
to close any unfinished query before SQLite will recognize your attempt to escape.
Acknowledgments
This tutorial would not have been possible without:
- Andi Albrecht's
sqlparse
module - Dimitri Fontaine's The Art of PostgreSQL
- David Rozenshtein's The Essence of SQL (now sadly out of print)
I would also like to thank the following people for spotting issues, making suggestions, or submitting changes:
- Yanina Bellini Saibene
- Phillip Cloud
- Zoe Daniels
- Conor Flynn
- Andy Goldberg
- Jay Graves
- Sam Hames
- Adam Hawkes
- Robert Kern
- Konstantinos Kitsios
- Olivier Leroy
- Kevin Marshall
- Roy Pardee
- Manos Pitsidianakis
- Daniel Possenriede
- Adam Rosien
- Thomas Sandmann
- Simon Willison