Photo by Caspar Camille Rubin on Unsplash
Photo by Caspar Camille Rubin on Unsplash

Early in my career, I found myself learning SQL on the fly. Specifically, I was given access to a database and told, “use the data here to find the answers.” Almost every experienced analyst I know has a similar story. They are thrown into the proverbial deep end that is query structure, something that their college or graduate education did not spend a lot of time on.

SQL, like so many things in the analyst world, takes minutes to learn, but can take an entire career to master. In this post, I’ll focus on the basic structure of a SQL query and provide some techniques to help understand it. While not  exhaustive, this post provides a foundation on which to build your SQL skill set.

Start at the Bottom

SQL has only two mandatory clauses: SELECT and FROM. The most common optional clause is the WHERE clause, but this last part may vary for you, since different databases require different techniques. You may already be familiar with the following basic query structure:

SELECT customer_name, segment, sales
FROM superstore_sample
WHERE state = ‘Alabama’

In SQL, arguably the two most important clauses, FROM and WHERE, are pushed toward the end of the statement. These clauses are critically important because they create context. It doesn’t make sense to look at the SELECT clause if you don’t know where those columns come FROM. Considering this, we should read the clauses in this order: FROM, WHERE, SELECT. Added bonus: this is also the order in which the machine executes your SQL.

Say It in a (Weird) Sentence

As mentioned above, a good way to test your understanding of a query is saying it in English. Let’s  convert our query into a one-sentence summary: 

We need superstore sample records from Alabama only, for each category and segment combination returning the sum of sales for each, sorted by segment.

Here’s how this sentence breaks down:

SQL SnippetEnglish TranslationWhat’s Actually Happening
FROM superstore_sampleWe need superstore sample recordsPull from the superstore_sample table only.
WHERE state = ‘Alabama’from Alabama onlyNarrow from all records to just those from Alabama.
SELECT customer_name, segment, salesspecifically the customer name, segment, and sales from each.Return the specified columns only.

 This might sound awkward in everyday speech, but narrowing your context from the start, and translating that process into English, is good practice for any code that you write. You can, of course, also phrase this query as follows:

We need the customer name, segment, and sales from the superstore sample, focused only on Alabama.

This feels less clunky and generally more conversational, but it has two disadvantages:

  1. It ignores the order in which the machine actually works.
  2. Important context is left dangling at the end of the sentence, requiring you to revert your thinking to context after we’ve gotten to what feels like the most important part.

A More Complex Example of Translating SQL

We can extend this technique to a more complex query such as:

SELECT category, segment, SUM(sales) AS sales
FROM superstore_sample
WHERE state = ‘Alabama’
GROUP BY category, segment
ORDER BY segment

This query is a bit more complex, but we can still transform it into a single sentence:

We need superstore sample records from Alabama only, for each category and segment combination returning the sum of sales for each, sorted by segment.

A bit of a mouthful, but we got through it. Let’s break it down below:

SQL SnippetEnglish translationWhat’s Actually Happening
FROM superstore_sampleWe need superstore sample recordsNarrow from all possible tables to one.
WHERE state = ‘Alabama’from Alabama onlyNarrow from all records to just those from Alabama.
GROUP BY category, segmentfor each category and segment combinationAggregate records along category and segment.
SELECT category, segment, SUM(sales)returning the sum of sales for eachDisplay the category, segment, and sum of sales for each combination.
ORDER BY segmentsorted by segment.Order the results by segment in alphabetical order.

Admittedly, this breaks the “start at the bottom” convention a little, but retains the order in which the clauses actually executes as follows:

  1. FROM
  2. WHERE
  3. GROUP BY 
  4. SELECT
  5. ORDER BY

SELECT main_takeaway FROM blog_post

It isn’t necessary to do this exercise every time, but taking time to understand what the SQL engine is doing can help you optimize even the most complex of queries. As a bonus, it provides a sentence that can be placed directly in an email to a stakeholder to help explain in plain English how data is retrieved and processed to create a particular dataset.

Written By


John Carney

John Carney, Senior Analyst, Data Operations guides clients through the process of turning data access into data-driven insight. His primary specialty involves blending skills in data science, data engineering, and data visualization to generate reliable and action-focused solutions. With this well-rounded set of skills, John often acts as a bridge between the Data Operations and Decision Science teams at Evolytics. A believer in the power of data to empower all, John occasionally speaks at conferences on topics such as building cross-platform pipeline automation and enabling data science team collaboration through system standardization.