What is normalization and why is it important?
Answer
Database Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves decomposing tables into smaller, related tables and defining relationships between them.
Why Normalization is Important
Benefits:
- Eliminates Data Redundancy - Reduces duplicate data storage
- Improves Data Integrity - Prevents inconsistent data
- Reduces Storage Space - More efficient use of disk space
- Easier Maintenance - Updates need to be made in fewer places
- Prevents Anomalies - Avoids insertion, update, and deletion anomalies
Normal Forms
First Normal Form (1NF)
Rules:
- Each column contains atomic (indivisible) values
- Each column contains values of the same type
- Each column has a unique name
- Order of rows and columns doesn’t matter
-- Violates 1NF (multiple values in one column)
CREATE TABLE students_bad (
student_id INT,
name VARCHAR(100),
subjects VARCHAR(200) -- "Math, Science, English"
);
-- Follows 1NF
CREATE TABLE students (
student_id INT,
name VARCHAR(100)
);
CREATE TABLE student_subjects (
student_id INT,
subject VARCHAR(50)
);
Second Normal Form (2NF)
Rules:
- Must be in 1NF
- All non-key attributes must be fully functionally dependent on the primary key
- Eliminates partial dependencies
-- Violates 2NF (partial dependency)
CREATE TABLE order_details_bad (
order_id INT,
product_id INT,
product_name VARCHAR(100), -- Depends only on product_id
quantity INT,
PRIMARY KEY (order_id, product_id)
);
-- Follows 2NF
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100)
);
CREATE TABLE order_details (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Third Normal Form (3NF)
Rules:
- Must be in 2NF
- No transitive dependencies (non-key attributes should not depend on other non-key attributes)
-- Violates 3NF (transitive dependency)
CREATE TABLE employees_bad (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
dept_id INT,
dept_name VARCHAR(100), -- Depends on dept_id, not emp_id
salary DECIMAL(10,2)
);
-- Follows 3NF
CREATE TABLE departments (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(100)
);
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
dept_id INT,
salary DECIMAL(10,2),
FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);
Boyce-Codd Normal Form (BCNF)
Rules:
- Must be in 3NF
- Every determinant must be a candidate key
- Stricter version of 3NF
-- Example: Student-Subject-Professor relationship
-- Assumption: Each subject is taught by only one professor
-- Violates BCNF
CREATE TABLE teaching_bad (
student_id INT,
subject VARCHAR(50),
professor VARCHAR(100),
PRIMARY KEY (student_id, subject)
-- Professor determines subject, but professor is not a candidate key
);
-- Follows BCNF
CREATE TABLE professors (
professor_id INT PRIMARY KEY,
professor_name VARCHAR(100)
);
CREATE TABLE subjects (
subject_id INT PRIMARY KEY,
subject_name VARCHAR(50),
professor_id INT,
FOREIGN KEY (professor_id) REFERENCES professors(professor_id)
);
CREATE TABLE enrollments (
student_id INT,
subject_id INT,
PRIMARY KEY (student_id, subject_id),
FOREIGN KEY (subject_id) REFERENCES subjects(subject_id)
);
Practical Example: E-commerce Database
Before Normalization (Unnormalized):
CREATE TABLE orders_unnormalized (
order_id INT,
customer_name VARCHAR(100),
customer_email VARCHAR(100),
customer_address VARCHAR(200),
product_names VARCHAR(500), -- "Laptop, Mouse, Keyboard"
product_prices VARCHAR(100), -- "999.99, 25.99, 79.99"
quantities VARCHAR(50), -- "1, 2, 1"
order_date DATE
);
After Normalization (3NF):
-- Customers table
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
email VARCHAR(100),
address VARCHAR(200)
);
-- Products table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10,2)
);
-- Orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
-- Order items table
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Data Anomalies Prevention
Insertion Anomaly:
- Problem: Cannot add a new department without hiring an employee
- Solution: Separate departments and employees tables
Update Anomaly:
- Problem: Changing department name requires updating multiple rows
- Solution: Store department name only in departments table
Deletion Anomaly:
- Problem: Deleting last employee removes department information
- Solution: Keep department data separate from employee data
When NOT to Normalize
Sometimes denormalization is acceptable for:
- Performance reasons - Fewer JOINs in read-heavy applications
- Data warehousing - Analytical queries benefit from denormalized data
- Reporting systems - Simplified queries for reports
Interview Tips
- Know the normal forms: Understand 1NF, 2NF, 3NF, and BCNF
- Identify violations: Be able to spot normalization violations
- Understand trade-offs: Know when normalization might hurt performance
- Practice examples: Work through normalization exercises
- Explain benefits: Articulate why normalization is important
Common Interview Questions
- “What problems does normalization solve?”
- “Can you normalize this table structure?”
- “When might you choose not to normalize?”
- “What’s the difference between 3NF and BCNF?”
Test Your Knowledge
Take a quick quiz to test your understanding of this topic.