More with SQL SELECT
Contents:
1. The ORDER BY
clause
2. Joins
3. Options for WHERE
clause
4. Table aliases
5. Subqueries
6. The HAVING
clause
7. Using NULL
8. Aggregate functions
9. Combining results
We'll examine SQL SELECT queries based on the following three relations for storing information about student grades stored in courses.
students(id, name)
grades(id, num, grade)
sections(num, title, credits, instructor_id)
1. The ORDER BY
clause
A SELECT
query can optionally have at its end an
ORDER BY
clause that sorts the output. The following
shows all students' names in alphabetical order, for example.
SELECT name FROM students ORDER BY name
Or if we want it in reverse order, we would do the following:
SELECT name FROM students ORDER BY name DESC
By default this does case-sensitive ordering (so that Decker would follow DeLeon, since the ASCII value for c is greater than the ASCII value for L.) If we want it a case-insensitive ordering (so that Decker comes before DeLeon), we can sort of the all-capitals version of each name:
SELECT name FROM students ORDER BY UPPER(name) DESC
2. Joins
Suppose we want to show total credits earned by each student (excluding those who never enrolled, and F's don't count for credit):
SELECT name, SUM(credits)
FROM students JOIN grades ON students.id = grades.id
JOIN sections ON grades.num = sections.num
WHERE grade <> 'F'
GROUP BY students.id
Both of these, though, would exclude students who never
completed any courses successfully. To include these students,
we need LEFT JOIN
.
SELECT name, SUM(credits)
FROM students LEFT JOIN
(grades JOIN sections ON grades.num = sections.num)
ON students.id = grades.id
WHERE grade <> 'F'
GROUP BY students.id
3. Options for WHERE
clause
Let's suppose we also want to exclude from our sum courses where the student earns a W (for withdrawing from the course). The simple way is to use:
WHERE grade <> 'F' AND grade <> 'W'
But we could also use NOT IN
:
WHERE grade NOT IN ('F', 'W')
There's also a LIKE
operator that provides basic
regular expressions: Suppose we want to list all CSCI
courses.
SELECT num, title FROM sections WHERE num LIKE 'CSCI%'
A percent sign stands for any sequence of characters, and an underscore stands for any single character. All other characters stand for themselves, though comparison is case-insensitive.
Another somewhat unusual operator is BETWEEN
.
This lists courses whose credit value is between 0.01 and 0.99
(inclusive).
SELECT num, title FROM sections WHERE credits BETWEEN 0.01 AND 0.99
4. Table aliases
Suppose we want to list all students who have taken a course from an instructor named Burch. In this case, we want to join students to grades to sections to students. Since students is listed twice, we need to provide a distinct name to one of them; in this case, we'll use ins to name the second instance of students.
SELECT students.name
FROM students JOIN grades ON students.id = grades.id
JOIN sections ON grades.num = sections.num
JOIN students AS ins ON instructor_id = ins.id
WHERE ins.name = 'Burch'
5. Subqueries
We are allowed to nest one SELECT
query within
another. We can use this to get an alternative way of
listing all students who have taken a course taught by
Burch.
SELECT students.name
FROM students JOIN grades ON students.id = grades.id
WHERE num IN (SELECT num
FROM sections JOIN students ON instructor_id = id
WHERE name = 'Burch')
Or we can include a subquery in the FROM
clause:
SELECT students.name
FROM students JOIN grades ON students.id = grades.id
JOIN (SELECT num
FROM sections JOIN students ON instructor_id = id
WHERE name = 'Burch') AS bsecs
ON sections.num = bsecs.num
6. The HAVING
clause
Suppose we want to show all students who never enrolled in a course. We could do this using subqueries, but an alternative appoach would do a left join of students with grades, grouped by student IDs, and find those students who aren't associated with any grades.
SELECT name
FROM students LEFT JOIN grades ON students.id = grades.id
GROUP BY students.id
HAVING COUNT(grades.id) = 0
(This uses the COUNT
aggregate function, which like
SUM
tallies up a column throughout each group. In the
case of COUNT
, though, it counts up the values
appearing in the group.)
The HAVING
clause is like the WHERE
clause,
except that it is performed after breaking the output into
groups. This allows us to use aggregate functions, which would
not have any meaningful effect in a WHERE
clause.
By the way, including both a WHERE
clause
(before GROUP BY
) and a HAVING
clause
(after GROUP BY
) is typical.
7. Using NULL
What if we want to exclude courses that the student has not
yet completed? We might require that all such courses be marked
with a temporary grade like IP (in progress); but we might also
choose to use a NULL
value for such grades, to be
filled in later once the course is complete.
Suppose we use NULL
. You might be tempted to write
“WHERE grade <> NULL
”. This will not work because
SQL uses three-values logic, whose values are true,
false, and unknown
Nearly anything involving
NULL
results in unknown — even
NULL = NULL
is considered unknown!
And NULL <> NULL
is also unknown!
The reasoning is that NULL
could mean
anything, so a database shouldn't consider it as being equal to
anything or unequal to anything.
A WHERE
clause only selects a row when its value is
true; a value of unknown is treated like
false.
So if we write “WHERE grade <> NULL
”,
in fact we would get no results at all.
To get around this, you would use the special operator
“IS NULL
” or “IS NOT NULL
”.
SELECT name, SUM(credits)
FROM students LEFT JOIN
(grades JOIN sections ON grades.num = sections.num)
ON students.id = grades.id
WHERE grade NOT IN ('F', 'W') AND grade IS NOT NULL
GROUP BY students.id
8. Aggregate functions
We've seen the use of SUM
to add up a column within
each group, and we briefly saw the COUNT
.
These are aggregate functions, which combine cells from a
group. SQL defines several.
AVG(x) |
Yields average of non-NULL values of x in group |
COUNT(x) |
Yields number of non-NULL values of x in group |
COUNT(*) |
Yields number of rows in group |
MAX(x) |
Yields largest value of x in group (ignoring NULL) |
MIN(x) |
Yields largest value of x in group (ignoring NULL) |
SUM(x) |
Yields total of non-NULL values of x in group |
With the exception of COUNT
with an asterisk in the
parentheses, all NULL
values are ignored.
The result of these aggregate functions is never NULL
unless all values in the group are NULL
(COUNT
yields 0 in this case, but the others yield NULL
).
9. Combining results
You can combine the results of two queries using the UNION
operator. Following lists the names of all students who
have earned an A in a CSCI course or who
have taken a course from Burch.
SELECT name
FROM students JOIN grades ON students.id = grades.id
WHERE grade = 'A' AND num LIKE 'CSCI%'
UNION
SELECT name
FROM students JOIN grades ON students.id = grades.id
WHERE num IN (SELECT num
FROM sections JOIN students ON instructor_id = id
WHERE name = 'Burch')
If you use UNION ALL
, then the DBMS will not attempt
to remove duplicates.
You can use INTERSECT
to find the intersection of
two sets. This lists the names of all students have have both
earned an A in a CSCI course and have taken a course
from Burch.
SELECT name
FROM students JOIN grades ON students.id = grades.id
WHERE grade = 'A' AND num LIKE 'CSCI%'
INTERSECT
SELECT name
FROM students JOIN grades ON students.id = grades.id
WHERE num IN (SELECT num
FROM sections JOIN students ON instructor_id = id
WHERE name = 'Burch')
And finally you can use EXCEPT
to list those that
appear in one set but not another. Following lists all students
who received in an A in a CSCI course, but who have not
taken a course from Burch.
SELECT name
FROM students JOIN grades ON students.id = grades.id
WHERE grade = 'A' AND num LIKE 'CSCI%'
EXCEPT
SELECT name
FROM students JOIN grades ON students.id = grades.id
WHERE num IN (SELECT num
FROM sections JOIN students ON instructor_id = id
WHERE name = 'Burch')