CS50X Week 07 Lecture Notes
SQL
Database-centric language.
flat-file database
CSV
Language Poll of Class
import csv
#'r' is default for open
#using 'with' will automatically close the file.
with open("favorites.csv", "r") as file:
reader = csv.reader(file)
#next will skip the header
next(reader)
for row in reader:
print(row[1])
Continue with DictReader
import csv
#'r' is default for open
#using 'with' will automatically close the file.
with open("favorites.csv", "r") as file:
reader = csv.DcitReader(file)
for row in reader:
#this now prevents issues if columns move
#favorite = row["language"]
More elegant way to iterate
import csv
with open("favorites.csv") as file:
reader = csv.DcitReader(file)
#naming the variables, setting the counters
scratch, c, python = 0, 0, 0
for row in reader:
#serach by key
favorite = row["language"]
#check key value
if favorite == "Scratch":
scratch += 1
elif favorite == "C":
c += 1
elif favorite == "Python":
python += 1
print(f"Scratch: {scratch}")
print(f"C: {c}")
print(f"Python: {python}")
Refactored to be more pythonic
import csv
with open("favorites.csv") as file:
reader = csv.DcitReader(file)
#empty dic created
counts = {}
#smae loop
for row in reader:
favorite = row["language"]
#catch expceted condition and add to count
if favorite in counts:
counts[favorite] =+ 1
#create count to begin
else:
counts[favorite] = 1
for favorite in counts:
print(f"{favorite}: {counts[favorite]}")
You can sort the dictionary easily with sorted()
for favorite in sorted(counts):
print(f"{favorite}: {counts[favorite]}")
Sort by key’s value in this case numerically
for favorite in sorted(counts, key=counts.get):
print(f"{favorite}: {counts[favorite]}")
To change it to descending order use reverse
for favorite in sorted(counts, key=counts.get, reverse=True):
print(f"{favorite}: {counts[favorite]}")
Collections Library
import csv
from collections import Counter
with open("favorites.csv") as file:
reader = csv.DcitReader(file)
#objcet with built in counting functionality
counts = Counter()
for row in reader:
favorite = row["language"]
counts[favorite] += 1
#comes with built in ability to return pairs key value
for favorite, count in counts.most_common():
print(f"{favorite}: {counts}")
can change to “problem” easily
import csv
from collections import Counter
with open("favorites.csv") as file:
reader = csv.DcitReader(file)
#objcet with built in counting functionality
counts = Counter()
for row in reader:
#this is all that changes since the program is written so modularly.
favorite = row["problem"]
counts[favorite] += 1
#comes with built in ability to return pairs key value
for favorite, count in counts.most_common():
print(f"{favorite}: {counts}")
Relational Database
C.R.U.D.
- Create, insert
- Select
- Update
- Delete, drop
CREATE TABLE table (column type, …);
Sqlite3
Popular database option for smaller programs.
$ sqllite3 favorites.db
sqlite>
.mdoe csv
.import favorites.csv favorites
.quit
#shows the database information
.schema
# '*' is a whildcard (meaning it stands in for the column key)
SELECT * FROM favorites;
#makes sure that only 10 resutls come back.
SELECT language FROM favorites LIMIT 10;
Built in phrases
- AVG
- COUNT
- DISTINCT
- LOWER
- MAX
- MIN
- UPPER
- …
returns the number of rows
SELECT COUNT(\*) FROM favorites;
returns every value in column languages
SELECT DISTINCT(language) FROM favorites;
returns the number of DISTINCT values in column languages
SELECT COUNT(DISTINCT(language)) FROM favorites;
More keywords
- WHERE
- LIKE
- ORDER BY
- LIMIT
- GROUP BY
returns the count of only C within coloumn language
SELECT COUNT(*) FROM favorites WHERE language = 'C';
returns the count of only C within coloumn language AND also match Hello, World within coloumn problem
SELECT COUNT(\*) FROM favorites WHERE language = 'C' AND problem = 'Hello, World';
returns a table of languages and their associated count
SELECT language, COUNT(*) FROM favorites GROUP BY language;
returns a table of languages and their associated count; now ordered by accending order.
SELECT language, COUNT(_) FROM favorites GROUP BY language ORDER BY COUNT(_);
returns a table of languages and their associated count; now ordered by descending order.
SELECT language, COUNT(*) FROM favorites GROUP BY language ORDER BY COUNT(*) DESC;
Can name a condition to save repeats
SELECT language, COUNT(\*) AS n FROM favorites GROUP BY language ORDER BY n DESC;
Add data into the database
INSERT INTO table (column, ...) VALUES(value, ...);
NULL == lack of data for field.
IMDb
Movie/TV Show database with a GUI as the website
Best Practice
when storing data associate a string header with an id number. That way ids can be associated across multiple pieces of data and not risk a string being misspelled.
Data Types
- BLOB
- Binary Large Object
- INTEGER
- NUMBERIC
- dates and times
- REAL
- floats or decimal points
- TEXT
- strings
PRIMARY KEY = unique id FOREIGN KEY = use of PRIMARY KEY in a new database table.
Search data matched between two tables
SELECT \* FROM shows WHERE id IN
(SELECT show_id FROM ratings WHERE rating >= 6.0);
JOIN
This will combine the two tables by defining the shared value “shows.id” and then outputs the first 10 options with a rating >= 6.0, and only show the title and rating.
SELECT title, rating FROM shows JOIN ratings ON shows.id = ratings.show.id WHERE rating >= 6.0 LIMIT 10;
Many to Many
requires a 3rd table to link the unique data tables “Person” “Shows”
INDEX
INDEX will create a B-Tree; and while it takes time to process the table the first time; there after it will result in far faster searches between tables.
CREATE INDEX person_index on stars (person_id);
Tradeoffs: will cost more memory, and has to be updated anytime the table is modified.
Python to use SQL
form cs50 import SQL
db = SQL("sqlite:///favorites.db")
favorite = input("Favorite: ")
rows = db.execute("SELECT COUNT(*) AS n FROM favorites WHERE problem = ?", favorite)
row = rows[0]
print(row["n"])
Race Conditions
where you attempt to update a database multiple times within a short period of time, and there is a conflict.
- BEGIN TRANSACTION
- COMMIT
- ROLLBACK
These help control and prevent conflicts where race conditions are a known possibility.
db.execute("BEGIN TRANSACTION")
rows = db.execute("SELECT likes FROM posts WHERE id = ?", id);
likes = rows[0]["likes"]
db.execute("UPDATE posts SET likes = ? WHERE id = ?", likes + 1, id);
db.execute("COMMIT")
SQL Injection ATTACK
Safe solution is to use ‘?’ as a placeholder when a variable needs to be entered.