Level Up Your SQL: A Practical Guide to Intermediate SQL

Intermediate sqlSqlProgramming

Posted On: 2025-September-09

10 Minutes Read

Author: jack frost

So, you've mastered the basics of SQL – SELECT, FROM, WHERE, JOIN – and you're ready to take the next step. That's fantastic! This article is your guide to **Intermediate SQL**, designed to equip you with the skills you need to impress employers, tackle complex data challenges, and truly unlock the power of databases. We'll cover essential topics, including window functions and more, with a focus on practical application and clear explanations.

Window Functions: Analyzing Data Across Rows

Window functions are a game-changer in SQL. They allow you to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions (like SUM, AVG, COUNT), which collapse rows into a single result, window functions retain the original row's granularity while providing additional insights.

Understanding the Difference

Imagine you want to calculate the moving average of sales over the past three months. An aggregate function could only give you the average sales across *all* months. A window function, however, would let you calculate the average for each month, considering the preceding two months.

Key Window Functions

  • ROW_NUMBER(): Assigns a unique sequential integer to each row within a partition.
  • RANK(): Assigns a rank to each row within a partition, with gaps in the ranking if there are ties.
  • DENSE_RANK(): Similar to RANK(), but assigns consecutive ranks without gaps, even if there are ties.
  • LAG(column, n, default): Accesses data from a previous row (n rows back) within the partition. You can specify a default value if there's no row n rows back.
  • LEAD(column, n, default): Accesses data from a subsequent row (n rows forward) within the partition.  You can specify a default value if there's no row n rows forward.

Practical Examples of Window Functions

Let's say you have a table called `sales` with columns `sale_date`, `salesperson`, and `amount`.

Calculating Running Totals

To calculate the running total of sales by salesperson:

SELECT
   sale_date,
   salesperson,
   amount,
   SUM(amount) OVER (PARTITION BY salesperson ORDER BY sale_date) AS running_total
FROM
   sales;
 

This query partitions the data by `salesperson` and orders it by `sale_date`. The `SUM(amount) OVER (...)` calculates the cumulative sum of the `amount` for each salesperson as the `sale_date` progresses.

Identifying Top Performers

To identify the top 3 salespeople each month:

WITH RankedSales AS (
   SELECT
       EXTRACT(MONTH FROM sale_date) AS sale_month,
       salesperson,
       SUM(amount) AS total_sales,
       RANK() OVER (PARTITION BY EXTRACT(MONTH FROM sale_date) ORDER BY SUM(amount) DESC) AS sales_rank
   FROM
       sales
   GROUP BY
       EXTRACT(MONTH FROM sale_date),
       salesperson
)
SELECT
   sale_month,
   salesperson,
   total_sales
FROM
   RankedSales
WHERE
   sales_rank <= 3;
 

This example uses a CTE (Common Table Expression, explained later) to first rank salespeople by total sales within each month, and then selects only the top 3 from each month.

Comparing Values Across Rows

To calculate the difference in sales between consecutive days for each salesperson:

SELECT
   sale_date,
   salesperson,
   amount,
   amount - LAG(amount, 1, 0) OVER (PARTITION BY salesperson ORDER BY sale_date) AS sales_difference
FROM
   sales;

This query uses `LAG()` to access the previous day's sales amount. The `1` specifies that we want to look back one row, and the `0` provides a default value for the first day, where there's no previous day to compare to.

Subqueries: Queries Within Queries

Subqueries are queries nested inside another query. They're powerful for filtering and retrieving data based on the results of another query.

Correlated vs. Non-Correlated Subqueries

  • Non-Correlated Subqueries: These are independent queries that can be executed on their own. The outer query uses the result of the subquery.
  • Correlated Subqueries: These subqueries depend on the outer query's data. They are executed once for each row in the outer query.

Use Cases with Examples

Non-Correlated Subquery

Find all customers who placed orders with amounts greater than the average order amount:

SELECT
   customer_id,
   order_id,
   amount
FROM
   orders
WHERE
   amount > (SELECT AVG(amount) FROM orders);
 

The subquery calculates the average order amount only once, and the outer query uses this result to filter the orders.

Correlated Subquery

Find all orders placed by customers who have placed at least two orders:

SELECT
   customer_id,
   order_id,
   amount
FROM
   orders AS o1
WHERE EXISTS (
   SELECT 1
   FROM orders AS o2
   WHERE o2.customer_id = o1.customer_id
   GROUP BY o2.customer_id
   HAVING COUNT(*) >= 2
);

The subquery is executed for each row in the outer `orders` table (aliased as `o1`). It checks if the customer associated with the current row has at least two orders in the `orders` table (aliased as `o2`).

Performance Considerations

Correlated subqueries can be slow, especially with large datasets, because they are executed for each row of the outer query.  Optimizing correlated subqueries often involves rewriting them using JOINs or CTEs. Non-correlated subqueries are generally faster because they are executed only once.

Common Table Expressions (CTEs): Simplifying Complex Queries

CTEs are temporary, named result sets that you can define within a single query.  They make complex queries more readable and maintainable by breaking them down into smaller, logical units.

Benefits of Using CTEs

  • Improved Readability: Break down complex logic into smaller, more manageable steps.
  • Increased Maintainability: Easier to understand and modify queries.
  • Recursive Queries: Allow you to query hierarchical data structures (e.g., organizational charts).

Examples of Using CTEs

Simplifying Complex Queries

Let's revisit the example of identifying top performers each month. Using a CTE, we make the code more readable:

WITH MonthlySales AS (
   SELECT
       EXTRACT(MONTH FROM sale_date) AS sale_month,
       salesperson,
       SUM(amount) AS total_sales
   FROM
       sales
   GROUP BY
       EXTRACT(MONTH FROM sale_date),
       salesperson
),
RankedSales AS (
   SELECT
       sale_month,
       salesperson,
       total_sales,
       RANK() OVER (PARTITION BY sale_month ORDER BY total_sales DESC) AS sales_rank
   FROM
       MonthlySales
)
SELECT
   sale_month,
   salesperson,
   total_sales
FROM
   RankedSales
WHERE
   sales_rank <= 3;
 

This example breaks the query into two CTEs: `MonthlySales` calculates the total sales for each salesperson each month, and `RankedSales` ranks the salespeople within each month based on their sales. The final SELECT statement retrieves the top 3 salespeople.

Recursive CTEs

Recursive CTEs are used to traverse hierarchical data. Imagine an `employees` table with `employee_id`, `employee_name`, and `manager_id` columns.  A recursive CTE can be used to find all employees reporting to a specific manager, directly or indirectly.

WITH RECURSIVE EmployeeHierarchy AS (
   SELECT
       employee_id,
       employee_name,
       manager_id,
       1 AS level
   FROM
       employees
   WHERE
       manager_id IS NULL -- Start with the top-level manager
   UNION ALL
   SELECT
       e.employee_id,
       e.employee_name,
       e.manager_id,
       eh.level + 1
   FROM
       employees AS e
   JOIN
       EmployeeHierarchy AS eh ON e.manager_id = eh.employee_id
)
SELECT
   employee_id,
   employee_name,
   level
FROM
   EmployeeHierarchy
ORDER BY
   level;
 

This CTE starts with employees who have no manager (the top-level managers).  Then, it recursively joins the `employees` table to the `EmployeeHierarchy` CTE to find employees reporting to those managers, and so on, until all employees have been added to the hierarchy.

Advanced JOIN Techniques

Beyond the basics of INNER, LEFT, and RIGHT JOIN, there are other JOIN types that offer more flexibility in retrieving data from multiple tables.

FULL OUTER JOIN and CROSS JOIN

  • FULL OUTER JOIN: Returns all rows from both tables. If there's no match, NULL values are returned for the unmatched columns.
  • CROSS JOIN: Returns the Cartesian product of the two tables, meaning every row from the first table is combined with every row from the second table. Use with caution, as it can generate very large result sets.

Self-Joins

A self-join joins a table to itself. This is useful when you need to compare rows within the same table, such as finding employees who earn more than their manager.

Example: FULL OUTER JOIN

 

Suppose you have a `customers` table and an `orders` table. To get all customers and all orders, regardless of whether they have a match, you would use a FULL OUTER JOIN:

SELECT
   c.customer_id,
   c.customer_name,
   o.order_id,
   o.order_date
FROM
   customers AS c
FULL OUTER JOIN
   orders AS o ON c.customer_id = o.customer_id;
 

Example: CROSS JOIN

To generate all possible combinations of products and colors, you could use a CROSS JOIN. Let's assume you have `products` and `colors` tables.

SELECT
   p.product_name,
   c.color_name
FROM
   products AS p
CROSS JOIN
   colors AS c;
 

Example: Self-Join

 

To find employees who earn more than their manager (assuming an `employees` table with `employee_id`, `employee_name`, `salary`, and `manager_id` columns):

SELECT
   e.employee_name AS employee,
   m.employee_name AS manager
FROM
   employees AS e
JOIN
   employees AS m ON e.manager_id = m.employee_id
WHERE
   e.salary > m.salary;
 

Pivoting and Unpivoting: Transforming Data

Pivoting transforms rows into columns, while unpivoting does the opposite, transforming columns into rows. These techniques are useful for reshaping data to make it easier to analyze and present.

Unfortunately, standard SQL doesn't have universally supported PIVOT and UNPIVOT commands. The implementation depends on the database system (e.g., SQL Server, Oracle). However, the logic is generally achieved using CASE statements or other conditional aggregation techniques.

Example: Pivoting with CASE Statements

Suppose you have a `sales` table with `product_category`, `month`, and `sales_amount` columns. To pivot the data to show sales amounts for each month as columns, you could use a CASE statement:

SELECT
   product_category,
   SUM(CASE WHEN month = 'January' THEN sales_amount ELSE 0 END) AS January,
   SUM(CASE WHEN month = 'February' THEN sales_amount ELSE 0 END) AS February,
   SUM(CASE WHEN month = 'March' THEN sales_amount ELSE 0 END) AS March
FROM
   sales
GROUP BY
   product_category;
 

Example: Unpivoting with UNION ALL

To unpivot the pivoted data back into its original format, you can use UNION ALL:

SELECT
   product_category,
   'January' AS month,
   January AS sales_amount
FROM
   PivotedSales
UNION ALL
SELECT
   product_category,
   'February' AS month,
   February AS sales_amount
FROM
   PivotedSales
UNION ALL
SELECT
   product_category,
   'March' AS month,
   March AS sales_amount
FROM
   PivotedSales;
 

Transactions and Concurrency Control

Transactions are a sequence of database operations treated as a single logical unit of work. If any operation fails, the entire transaction is rolled back, ensuring data consistency.

ACID Properties

  • Atomicity: All operations in a transaction are treated as a single unit. Either all succeed, or all fail.
  • Consistency: A transaction must maintain the database's integrity constraints.
  • Isolation: Transactions should be isolated from each other to prevent interference.
  • Durability: Once a transaction is committed, its changes are permanent.

 

Using Transactions

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

COMMIT; -- Or ROLLBACK;
 

If an error occurs between the UPDATE statements, you would use `ROLLBACK` to undo the changes made by the first UPDATE, maintaining data integrity.

Concurrency Control

Concurrency control mechanisms, like locking, prevent multiple transactions from interfering with each other.  Databases typically use various types of locks (e.g., shared locks, exclusive locks) to manage access to data.

Mastering **Intermediate SQL** is crucial for any aspiring data professional. You've now gained valuable insights into window functions, subqueries, CTEs, advanced JOINs, pivoting/unpivoting, and transactions. These skills will allow you to write more efficient, maintainable, and powerful SQL queries. Keep practicing, keep experimenting, and you'll be well on your way to becoming a SQL expert. Remember that even **intermediate SQL, i.e., window functions and more**, builds upon fundamental knowledge, so regular practice will solidify your understanding.

 

copyright © 2026. thehyperanalytics.com