Databases and SQL

Sorting and Removing Duplicates

Learning Objectives

Write queries that display results in a particular order.
Write queries that eliminate duplicate values from data.

Data is often redundant, so queries often return redundant information. For example, if we select the quantitites that have been measured from the survey table, we get this:

select quant from Survey;

quant
rad
sal
rad
sal
rad
sal
temp
rad
sal
temp
rad
temp
sal
rad
sal
temp
sal
rad
sal
sal
rad

We can eliminate the redundant output to make the result more readable by adding the distinct keyword to our query:

select distinct quant from Survey;

quant
rad
sal
temp

If we select more than one column --- for example, both the survey site ID and the quantity measured --- then the distinct pairs of values are returned:

select distinct taken, quant from Survey;

taken	quant
619	rad
619	sal
622	rad
622	sal
734	rad
734	sal
734	temp
735	rad
735	sal
735	temp
751	rad
751	temp
751	sal
752	rad
752	sal
752	temp
837	rad
837	sal
844	rad

Notice in both cases that duplicates are removed even if they didn't appear to be adjacent in the database. Again, it's important to remember that rows aren't actually ordered: they're just displayed that way.

As we mentioned earlier, database records are not stored in any particular order. This means that query results aren't necessarily sorted, and even if they are, we often want to sort them in a different way, e.g., by the name of the project instead of by the name of the scientist. We can do this in SQL by adding an order by clause to our query:

select * from Person order by ident;

ident	personal	family
danfort	Frank	Danforth
dyer	William	Dyer
lake	Anderson	Lake
pb	Frank	Pabodie
roe	Valentina	Roerich

By default, results are sorted in ascending order (i.e., from least to greatest). We can sort in the opposite order using desc (for "descending"):

select * from person order by ident desc;

ident	personal	family
roe	Valentina	Roerich
pb	Frank	Pabodie
lake	Anderson	Lake
dyer	William	Dyer
danfort	Frank	Danforth

(And if we want to make it clear that we're sorting in ascending order, we can use asc instead of desc.)

We can also sort on several fields at once. For example, this query sorts results first in ascending order by taken, and then in descending order by person within each group of equal taken values:

select taken, person from Survey order by taken asc, person desc;

taken	person
619	dyer
619	dyer
622	dyer
622	dyer
734	pb
734	pb
734	lake
735	pb
735	-null-
735	-null-
751	pb
751	pb
751	lake
752	roe
752	lake
752	lake
752	lake
837	roe
837	lake
837	lake
844	roe

This is easier to understand if we also remove duplicates:

select distinct taken, person from Survey order by taken asc, person desc;

taken	person
619	dyer
622	dyer
734	pb
734	lake
735	pb
735	-null-
751	pb
751	lake
752	roe
752	lake
837	roe
837	lake
844	roe

FIXME

Write a query that selects distinct dates from the Site table.

FIXME

Write a query that returns the distinct dates in the Visited table.

FIXME

Write a query that displays the full names of the scientists in the Person table, ordered by family name.