SQL Joins - CSU2101 | Shoolini University

SQL Joins

This document provides an in-depth explanation of the topics covered in class and is designed to be your go-to reference as you continue to build your knowledge and skills.

Introduction to SQL Joins

What: Query based on multiple tables is called Joins.

How: Joins combine rows from two or more tables based on related columns.

Why: Learning joins allows you to write complex queries easily and efficiently.

Real World Analogy: Think of joins like matching pieces from two different jigsaw puzzles based on shared shapes—they help complete the bigger picture.

Types of Joins

SQL has primarily five join types:

  1. Inner Join (Equi Join)
  2. Outer Join (Left, Right, Full)
  3. Self Join
  4. Cross Join (Cartesian Join)
  5. Natural Join
graph TD
Joins["Joins = Combine rows from multiple tables based on related columns"]
Joins --> Inner["Inner Join: Only matching rows"]
Inner --> IQ["SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno;"]

Joins --> Outer["Outer Join: All rows + NULL for unmatched"]
Outer --> Left["Left Join: All from left"]
Left --> LQ["SELECT * FROM emp LEFT JOIN dept ON emp.deptno = dept.deptno;"]
Outer --> Right["Right Join: All from right"]
Right --> RQ["SELECT * FROM emp RIGHT JOIN dept ON emp.deptno = dept.deptno;"]
Outer --> Full["Full Join: All from both"]
Full --> FQ["SELECT * FROM emp FULL JOIN dept ON emp.deptno = dept.deptno;"]

Joins --> Self["Self Join: Table joined to itself"]
Self --> SQ["SELECT E.ename, M.ename FROM emp E JOIN emp M ON E.mgr = M.empno;"]

Joins --> Cross["Cross Join: All combinations"]
Cross --> CQ["SELECT * FROM T1 CROSS JOIN T2;"]

Joins --> Natural["Natural Join: Auto match same column names"]
Natural --> NQ["SELECT * FROM emp NATURAL JOIN dept;"]
                

Let us first know why we need joins before learning about each type.

Why do we need Joins?

There are limitations which simple SELECT statements have. The following table also serves as the structure and relationship of the tables involved in some of the future examples:

dept Table - Columns: deptno (PK), dname, loc — defines department info.

emp Table - Columns: empno (PK), ename, mgr, hiredate, sal, comm, deptno (FK) — holds employee records.

Relation:

Analogy

Basic Queries Without Joins

These only work with a single table:

SELECT * FROM emp WHERE comm IS NOT NULL;
SELECT COUNT(*) FROM dept WHERE loc = 'LA';

Queries That Require Joins (Can't Be Done Without Them)

These queries pull related data across tables and require JOINs:

1. Inner Join (Equi Join)

Definition: Returns rows with matching values in both tables.

Real-World Scenerio: Imagine matching employees with the departments they actually work in - like checking who is assigned to which office branch. Only valid employee-department pairs are shown.

Example:

emp table

empno ename deptno
101 Alice 10
102 Bob 20
103 Carol 30
104 Dave 40

dept table

deptno dname loc
10 Sales LA
20 HR NY
30 Marketing SF

How to Fetch

SELECT emp.ename, dept.dname
FROM emp INNER JOIN dept
ON emp.deptno = dept.deptno;

Alternate syntax:

SELECT emp.ename, dept.dname
FROM emp, dept
WHERE emp.deptno = dept.deptno;
Output
ename dname
Alice Sales
Bob HR
Carol Marketing

In this case, it returns rows where deptno exists in both emp and dept tables.

👉 Dave is excluded because his deptno (40) has no match in dept.

Note: If either table is missing matching data (deptno), rows will not appear in the result.

Workflow:

graph LR;
emp[emp table] -- deptno match --> dept[dept table];
Result[Matched Rows Only] --> Display;

2. Outer Join

Definition: Returns all rows from one table and matching rows from the other. Unmatched rows show NULL for the missing side.

Real-World Scenerio: Imagine you're listing all employees, whether they have a department assigned or not. If no match is found, you still show the employee—but leave the department blank (NULL). Like checking attendance even if someone doesn't have an assigned seat.

For example:

emp table

ename deptno
Alice 10
Bob 20
Carol 30
Dave 40

dept table

deptno dname
10 Sales
20 HR
30 Marketing
Types of Outer Joins:

🔹 Left Outer Join All rows from left table (first table) plus matching rows from right table. In our case, all employees with their department names if available.

SELECT emp.ename, dept.dname
FROM emp LEFT OUTER JOIN dept
ON emp.deptno = dept.deptno;
Output
ename dname
Alice Sales
Bob HR
Carol Marketing
Dave NULL

Workflow (Left Outer Join):

graph LR;
emp[All Employees] --> Result;
dept[Matched Departments] -- join on deptno --> Result;

🔸 Right Outer Join All rows from right table (second table) plus matching rows from left table. In our case, all departments with employee names if assigned.

SELECT emp.ename, dept.dname
FROM emp RIGHT OUTER JOIN dept
ON emp.deptno = dept.deptno;
Output
ename dname
Alice Sales
Bob HR
Carol Marketing
NULL Admin

Full Outer Join Combines results from both tables completely. In our case, all employees and all departments, including unmatched from both sides.

SELECT emp.ename, dept.dname
FROM emp FULL OUTER JOIN dept
ON emp.deptno = dept.deptno;
Output
ename dname
Alice Sales
Bob HR
Carol Marketing
Dave NULL
NULL Admin

📝 Use outer joins when you must not miss any data—even if there's no corresponding match.

3. Self Join

Definition: Joining a table to itself using aliases.

A self join is when a table is joined to itself using aliases, treating it as two separate tables logically.

Real-World Scenerio: Imagine a classroom where every student is assigned a Class Representative (CR) - who is also a student. You're matching a student with another student from the same list. That’s a self join!

Student Table:

Sid Sname cr_id
1 A 1
2 B 1
3 C 3
4 D 4
5 E 4

Query:

Identify each student along with their class representative:

SELECT S1.Sname AS Student_Name, S2.Sname AS CR_Name
FROM Student S1, Student S2
WHERE S1.cr_id = S2.Sid;
Output
Student_Name CR_Name
A A
B A
C C
D D
E D

Here:

Key Point:

Aliases (S1, S2) are mandatory in self joins to treat the table as two logical entities.

Workflow:

graph LR;
Student--as S1-->S1[Student];
Student--as S2-->S2[CR];
S1--Join cr_id = Sid-->S2;

📝 Self joins are super useful for representing hierarchies like employees → managers, students → CRs, or tasks → parent tasks.

4. Cross Join (Cartesian Join)

Definition: Combines each row from one table with every row from another.

Returns all possible combinations between rows of two tables. Does not require any condition.

Real-World Scenerio: Imagine pairing every shirt with every pair of pants to see all possible outfit combinations.

Practical Example Data:

T1 (Size) T2 (Color)
S Red
M Blue
SELECT Size, Color FROM T1 CROSS JOIN T2;
Output
Size Color
S Red
S Blue
M Red
M Blue

✅ When to Use:

❌ When to Avoid:

Workflow:

graph LR;
T1[All Rows] --X--> T2[All Rows] --> Result[Combinations];

Caution: Results can quickly become massive; use carefully.

5. Natural Join

Definition: Joins tables automatically on identical column names.

NATURAL JOIN automatically joins tables using columns with identical names and compatible data types.

Real-World Scenerio: Imagine auto-matching two Excel sheets where both have a column named "Department ID." You don’t tell it how to match—it just finds the common column and links rows.

Example:

Tables with only deptno in common:

SELECT * FROM emp NATURAL JOIN dept;
Output (if matching deptno)
empno ename deptno dname loc
101 Alice 10 Sales LA
102 Bob 20 HR NY

Behavior Based on Common Columns

Common Columns Behavior
Exactly 1 Acts like INNER JOIN
None Acts like CROSS JOIN
More than 1 ❌ Error (Ambiguous)

Limitations & Warnings

Workflow

graph LR;
Table1[emp] -- auto-match on common columns --> Table2[dept];

✅ When to Use:

❌ When to Avoid:

Practical Tips & Insights

Aliasing is Clarity

Performance Optimization

Error Handling & Debugging

Natural Join: Use With Caution

💡 TL;DR: Be explicit. Be readable. Be fast. Test with care. Avoid magic. Use joins like a surgeon, not a gambler.

Common Pitfalls to Avoid

Missing Aliases in Self-Joins

Misuse of Natural Join

Overuse of Cross Joins

Ignoring Performance at Scale

💡 Pro Tip: Joins are powerful—but like any tool, misuse can cost you time, accuracy, and performance. Test smart, write clean.

📝 Test yourself

Test your understanding and practical grasp of joins with the following tasks. Create your own tables or use the emp and dept tables provided.

Objective: Apply each join type confidently in real-world-like scenarios.

🔹 Inner Join Task: List all employee names along with their department names. Expected Output: Only employees who have a valid department.

🔹 Left Outer Join Task: List all employees, along with department names if available. If not, show “Unassigned”. Hint: Use COALESCE() to replace NULL.

🔹 Right Outer Join Task: Show all departments with the names of employees working in them. Show “No Employee” if no one is assigned.

🔹 Full Outer Join Task: Display all employees and all departments—even if there is no match on either side.

🔹 Self Join Task: Show each employee with their manager's name. Hint: The manager is also in the emp table.

🔹 Cross Join Task: Create a pairing of every department with every employee. Hint: Be cautious of the number of rows generated.

🔹 Natural Join Task: Join emp and dept using a natural join. Then explain the output, and what column was automatically used for joining.

Bonus Challenge

Create your own real-world schema (like students, courses, enrollments) and write:

💡 Tip: Write queries, show sample outputs, and include 1 to 2 lines of explanation per query.