Optimizing SQL Queries to Retrieve Maximum Salary per Department

Subquery Solution for Selecting Max Salary per Department in a Single Table

A common reporting task is to aggregate a table by group and then retrieve the rows behind each aggregate. In this case, we want the maximum salary for each department in the EMPLOYEES table, together with the employee who earns it.

Problem Statement

The query below computes the maximum salary per department in a derived table (grouping by department_id and applying MAX), then joins that result back to EMPLOYEES to retrieve the employees who earn it.

SELECT 
  e.department_id, 
  t.employee_id as id,
  t.first_name || ' ' || t.last_name as name,
  e.maxsalary 
FROM (
  SELECT 
    department_id, 
    MAX(salary) as maxsalary 
  FROM 
    EMPLOYEES
  GROUP BY 
    department_id 
) e
INNER JOIN 
  EMPLOYEES t 
ON 
  t.department_id = e.department_id and t.salary = e.maxsalary
ORDER BY e.department_id;

This query is valid, but one behavior is worth noting: it returns every employee whose salary equals the department maximum. If several employees in a department share the highest salary, all of them appear in the result, so the output is not guaranteed to contain exactly one row per department.
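To see how the join behaves when salaries tie, the following sketch runs the query above against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the choice of SQLite are assumptions made for illustration; they do not come from the original data set.

```python
import sqlite3

# Hypothetical sample data: two employees tie for the top salary in department 1.
rows = [
    (1, 1, 30000, "A", "B"),
    (2, 1, 30000, "C", "D"),
    (3, 2, 50000, "E", "F"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEES (employee_id INTEGER PRIMARY KEY, department_id INTEGER, "
    "salary INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?, ?, ?, ?)", rows)

# Same aggregate-and-join query as above; employee_id is added to the
# ORDER BY only to make the output order deterministic.
query = """
SELECT e.department_id, t.employee_id AS id,
       t.first_name || ' ' || t.last_name AS name, e.maxsalary
FROM (SELECT department_id, MAX(salary) AS maxsalary
      FROM EMPLOYEES GROUP BY department_id) e
INNER JOIN EMPLOYEES t
  ON t.department_id = e.department_id AND t.salary = e.maxsalary
ORDER BY e.department_id, t.employee_id
"""
tied_result = conn.execute(query).fetchall()
for row in tied_result:
    print(row)
```

Department 1 contributes two rows here, one for each employee earning the maximum.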

Correct Solution

The query can also be written to select the salary from the joined employee row (t.salary) instead of from the derived table. The join condition guarantees that the two values are equal, so the result is the same:

SELECT 
  e.department_id, 
  t.employee_id as id,
  t.first_name || ' ' || t.last_name as name,
  t.salary as maxsalary 
FROM (
  SELECT 
    department_id, 
    MAX(salary) as maxsalary 
  FROM 
    EMPLOYEES
  GROUP BY 
    department_id 
) e
INNER JOIN 
  EMPLOYEES t 
ON 
  t.department_id = e.department_id AND t.salary = e.maxsalary;

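One edge case remains. MAX(salary) returns NULL for a department in which every salary is NULL. Because the comparison t.salary = e.maxsalary is never true when either side is NULL, such departments are silently omitted from the result. For the same reason, employees whose department_id is NULL never match the join condition.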
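The NULL case can be checked with a small sketch, again using Python's sqlite3 module and an in-memory database. Department 3 is a hypothetical department in which no salary has been recorded; the data is invented for illustration.

```python
import sqlite3

# Hypothetical data: department 3 exists, but none of its salaries are recorded.
rows = [
    (1, 1, 30000, "A", "B"),
    (2, 2, 50000, "C", "D"),
    (3, 3, None, "E", "F"),
    (4, 3, None, "G", "H"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEES (employee_id INTEGER PRIMARY KEY, department_id INTEGER, "
    "salary INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?, ?, ?, ?)", rows)

# The aggregate alone still reports department 3, with a NULL maximum.
per_department = conn.execute(
    "SELECT department_id, MAX(salary) FROM EMPLOYEES "
    "GROUP BY department_id ORDER BY department_id"
).fetchall()

# Joining back on t.salary = e.maxsalary drops it, because NULL = NULL is not true.
joined = conn.execute("""
    SELECT e.department_id, t.employee_id, e.maxsalary
    FROM (SELECT department_id, MAX(salary) AS maxsalary
          FROM EMPLOYEES GROUP BY department_id) e
    INNER JOIN EMPLOYEES t
      ON t.department_id = e.department_id AND t.salary = e.maxsalary
    ORDER BY e.department_id
""").fetchall()

print(per_department)
print(joined)
```

The grouped result lists three departments, while the joined result lists only the two that have a non-NULL maximum.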

Revised Solution

An alternative is to use a window function such as ROW_NUMBER() or RANK() to rank employees within each department by salary, and then keep only the rows ranked 1. The two functions differ on ties: ROW_NUMBER() always returns exactly one row per department (which of the tied rows is arbitrary unless the ORDER BY includes a tie-breaker), whereas RANK() returns every employee who shares the maximum salary.

SELECT 
  e.department_id, 
  e.employee_id as id,
  e.first_name || ' ' || e.last_name as name,
  e.salary as maxsalary 
FROM (
  -- Rank employees within each department, highest salary first
  SELECT 
    department_id, 
    employee_id, 
    first_name, 
    last_name, 
    salary, 
    ROW_NUMBER() OVER(PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
  FROM 
    EMPLOYEES
) e
-- salary_rank exists only in the derived table, so filter on it there;
-- no join back to EMPLOYEES is needed
WHERE 
  e.salary_rank = 1;

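Whether this is faster than the aggregate-and-join form depends on the database engine, the available indexes, and the data distribution, because the window function must rank every row in each partition before the filter is applied. Compare the execution plans on your own data before choosing one form over the other.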
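The difference between ROW_NUMBER() and RANK() on tied salaries can be seen in a short sketch. It uses Python's sqlite3 module with invented rows, and it assumes the bundled SQLite is version 3.25 or later, which is when window functions were added. The helper top_earners is a hypothetical name introduced only for this example.

```python
import sqlite3

# Hypothetical data: employees 1 and 2 tie for the top salary in department 1.
rows = [
    (1, 1, 30000),
    (2, 1, 30000),
    (3, 1, 10000),
    (4, 2, 50000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEES (employee_id INTEGER PRIMARY KEY, "
    "department_id INTEGER, salary INTEGER)"
)
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?, ?)", rows)


def top_earners(rank_expression):
    """Return (department_id, employee_id, salary) rows whose rank is 1.

    rank_expression is a fixed SQL fragment written in this script,
    never user input, so interpolating it into the query is safe here.
    """
    query = f"""
        SELECT department_id, employee_id, salary
        FROM (
            SELECT department_id, employee_id, salary,
                   {rank_expression} AS salary_rank
            FROM EMPLOYEES
        ) ranked
        WHERE salary_rank = 1
        ORDER BY department_id, employee_id
    """
    return conn.execute(query).fetchall()


# employee_id acts as a tie-breaker so that ROW_NUMBER() is deterministic.
with_row_number = top_earners(
    "ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC, employee_id)"
)
with_rank = top_earners(
    "RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)"
)

print(with_row_number)
print(with_rank)
```

With ROW_NUMBER() each department yields one row; with RANK() department 1 yields both tied employees.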

More Efficient Solution

The aggregate-and-join form shown earlier needs only MAX(salary) per department, which engines can often compute cheaply when a suitable index exists, for example on (department_id, salary). Here it is again, with the aggregate aliased as salary:

SELECT 
  e.department_id, 
  t.employee_id as id,
  t.first_name || ' ' || t.last_name as name,
  e.salary as maxsalary 
FROM (
  SELECT 
    department_id, 
    MAX(salary) as salary 
  FROM 
    EMPLOYEES
  GROUP BY 
    department_id 
) e
INNER JOIN 
  EMPLOYEES t 
ON 
  t.department_id = e.department_id AND t.salary = e.salary;

Optimized Solution

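JOIN and INNER JOIN are synonyms in standard SQL, so swapping one keyword for the other changes neither the result nor the execution plan. The query below makes a different change: it joins the per-department maximum to a derived table ranked with ROW_NUMBER() and keeps only the rows ranked 1. It reads EMPLOYEES twice, so treat it as a variant to benchmark rather than a guaranteed improvement.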

SELECT 
  e.department_id, 
  t.employee_id as id,
  t.first_name || ' ' || t.last_name as name,
  t.salary as maxsalary 
FROM (
  SELECT 
    department_id, 
    MAX(salary) as salary 
  FROM 
    EMPLOYEES
  GROUP BY 
    department_id 
) e
JOIN (
  SELECT 
    department_id, 
    employee_id, 
    first_name, 
    last_name, 
    salary, 
    ROW_NUMBER() OVER(PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
  FROM 
    EMPLOYEES
) t
ON 
  t.department_id = e.department_id AND t.salary_rank = 1;

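Because ROW_NUMBER() assigns distinct numbers even to tied salaries, this query returns exactly one employee per department. When several employees share the maximum salary, the one returned is arbitrary unless a tie-breaker such as employee_id is added to the ORDER BY clause. Use RANK() instead if all of them should be returned.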

Real-World Example

Let’s consider an example where we have an EMPLOYEES table with the following data:

+-------------+---------------+--------+------------+-----------+
| EMPLOYEE_ID | DEPARTMENT_ID | SALARY | FIRST_NAME | LAST_NAME |
+=============+===============+========+============+===========+
|           1 |             1 |  10000 | A          | B         |
|           2 |             1 |  20000 | C          | D         |
|           3 |             1 |  30000 | E          | F         |
|           4 |             2 |  40000 | G          | H         |
|           5 |             2 |  50000 | I          | J         |
+-------------+---------------+--------+------------+-----------+

When we run the optimized solution, we get the following rows. The column names follow the aliases in the query, and because the query has no ORDER BY clause, the row order is not guaranteed.

+---------------+----+------+-----------+
| DEPARTMENT_ID | ID | NAME | MAXSALARY |
+===============+====+======+===========+
|             1 |  3 | E F  |     30000 |
|             2 |  5 | I J  |     50000 |
+---------------+----+------+-----------+

In this example, the employee with EMPLOYEE_ID = 3 is the highest paid in department 1 with a salary of $30,000, and the employee with EMPLOYEE_ID = 5 is the highest paid in department 2 with a salary of $50,000.
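The example can be reproduced with a short sketch that loads the five rows from the table above into an in-memory SQLite database through Python's sqlite3 module and runs the optimized query. SQLite 3.25 or later is assumed for window-function support, and an ORDER BY clause is added here only so that the output order is predictable.

```python
import sqlite3

# The five rows from the example table above.
rows = [
    (1, 1, 10000, "A", "B"),
    (2, 1, 20000, "C", "D"),
    (3, 1, 30000, "E", "F"),
    (4, 2, 40000, "G", "H"),
    (5, 2, 50000, "I", "J"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEES (employee_id INTEGER PRIMARY KEY, department_id INTEGER, "
    "salary INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?, ?, ?, ?)", rows)

# The optimized query from the article, plus ORDER BY for a stable row order.
result = conn.execute("""
    SELECT e.department_id, t.employee_id AS id,
           t.first_name || ' ' || t.last_name AS name, t.salary AS maxsalary
    FROM (SELECT department_id, MAX(salary) AS salary
          FROM EMPLOYEES GROUP BY department_id) e
    JOIN (SELECT department_id, employee_id, first_name, last_name, salary,
                 ROW_NUMBER() OVER (PARTITION BY department_id
                                    ORDER BY salary DESC) AS salary_rank
          FROM EMPLOYEES) t
      ON t.department_id = e.department_id AND t.salary_rank = 1
    ORDER BY e.department_id
""").fetchall()

for row in result:
    print(row)
```

Since no two employees in this data set share a department maximum, the tie-handling differences between the approaches do not show up here.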


Last modified on 2024-05-30