Categories > Database & SQL



An overall view of SQL Queries and Normalization in Database

Posted on November 8th, 2015 By Danial Javanmardi

All SQL queries in this article listed here.

From time to time, I forgot some of the database concepts and SQL syntax, although, I have worked with SQL for a long time and even taught it to other individuals or a class of students. My initial aim of this article was to prepare a comprehensive list of SQL to refer to them. Hence, you cannot consider this article as an SQL or Database tutorial. It is a quick, effective review that illustrates mostly through examples. Though, there are some of the concepts such as Normalization and Indexing are explained with more details. I described the following concepts:

Database Normalization

1st Normalization Form

No repeating value and no repeating group, for example, consider a product table contains [productID, productColor, productPrice]. In this case, if we have a record that store two colors at the same time in the productColor such as “red, green”. It breaks the first normalization rule, and to make it normalized, we need to create a separate table for color. In the end, we have color table contains [productID, productColor] and product table contains [productID, productPrice].

2nd Normalization Form

It must be 1st normalization form, and besides, all non-key attributes must fully depend on the key attribute. For instance, we have a purchase table contains [customerID, storeID, storeLocation], with the primary key (combination primary key) of [customerID, storeID]. In this case, storeLocation depends on storeID, this break 2nd normalization form. Consider somebody change a storeID to something else, and he does not change the storeLocation, this brings inconsistency in data. To make it 2nd Normalization form. We make a new table for location. Then, we have purchase table contains [customerID, storeID] and location table contains [storeID, storeLocation]. In this case, if somebody changes storeID, it does cause inconsistency in our data.

3rd Normalization Form

It must be 2nd normalization form, and also, all non-key attributes must not have any dependency on any other non-key attributes. Technically, this is 2nd normalization form that apply to the non-key attributes. For example, we have an order table contains [ID, productID, quantity, unitPrice, totalPrice], in this case, the totalPrice that results from quantity multiples unitPrice does not follow this rule. We could remove totalPrice from the table and calculate it on the run-time, such as a writing Store Procedure for it. In another example, we have competition table contains [ID, name, year, winnerName, winnerBirthDay]. You could see that winnerName and winnerBirthday are depends on each other. To resolve it, we create a new table called it winner contains [ID, name, birthday] and change the competition table to contain [ID, name, year, winnerID].

Others Concerns for Normalization

Indeed, there are a lot of time that performance concerns, normalization might degrade the performance or bring complexity in the development of an application. For example, employee table contains [name, address, phone1, phone2], this is not 1st normalization form, but we see, there are a lot of tables such as this in real world application. Another example, address table contains [ID, address, city, province, postalcode], this is not 3rd normalization form, as by having a postal code you could extract city and province. Though, it could be 3rd normalization form if there are a postalcode with two different provinces and two difference cities.

SQL – Data Manipulation Language (DML)

List all the students who their last name are James.

select * 
from student
where lastname = 'James';

 

List all the students who their last name start with James.

select * 
from student
where lastname LIKE 'James%';

Character % means whatever that might come after without any length limitation, while, character _ instead means only a character. Additionally, like is a very inefficient command in the database.

List all the students who marks are from 60 to 90.

select * 
from student
where marks Between 60 And 90;


List all the students that their last names are not entered.

select * 
from student
where lastname IS NULL;

 

List all the students who live in Toronto or Vancouver

select * 
from student 
where city ='toronto' or city ='vancouver'

Another way to write the above SQL command

select * 
from student 
where city IN ('toronto', 'vancouver');

 

List all the students, order them by their first names, and if first names are identical, then order them by last names.

select * 
from student 
order by firstname, lastname

 

List the students’ ID in the takes table with no duplication in the student ID. (there might be a student that takes multiple courses)

select Distinct(studentID)
from take

 

Aggregation functions

count(ID), sum(price), max(price), min(price), avg(price) are some of aggregation functions in the SQL.

List the total price of all products under category with ID 241.

select sum(price)
from product 
where productCategory = 241

 

List the average price of all products under category with ID 241.

select avg(price)
from product 
where productCategory = 241

 

The aggregation function only returns one single value, if you want to combine aggregation with other fields to return multiple values, you need to use group by.

List the average price of each category.

select avg(price), productCategory 
from product 
group by productCategory

 

List the average price of each category that the average price is higher than 100.

select avg(price) as avgPrice, productCategory 
from product 
group by productCategory
having avgPrice > 100

 

Use Join in Relational Database

There are different kinds of joins in SQL, by default join is set to be inner join by most DataBase Management System (DBMS). An inner join only returns the records that found a match both of the tables involving in the inner join. As an illustration, if we have a person and contact tables, and there is a person that does not have a record in contact table, the inner join would not return that person.

select * 
from person
Inner Join contact On person.personID = contact.personID 

 

The following figure illustrates all kinds of joins that could happen when joining two tables over each other.

All Kinds of Joins in SQL

Figure 1: All Kinds of Joins in SQL

List of students that take the course by the title of Database. (There are student, take and course tables.)

select * 
from (( student
Inner Join take On student.studentID = take.studentID )
Inner Join course On takes.courseID = course.courseID )
where course.courseTitle = ‘Database’

Or could be written without brackets as follow

select * 
from student
Inner Join take On student.studentID = take.studentID
Inner Join course On takes.courseID = course.courseID 
where course.courseTitle = ‘Database’

 

Insert, Update, and Delete

To create a new record in the student table:

Insert into student
(studentFirstName, studentLastName, studentAge)
values ('Danial','Javanmardi2', 21);

 

To update a record in the student table:

Update student
set studentEmail = 'info@mandanemedia.com', studentLastName = 'Javanmardi2'
where studentID = 11

 

To delete a record in the student table:

Delete from student
where studentID = 11

 

SQL – Data Definition Language (DDL)

Create student table

Create student
(studentID integer Primary Key,
studentFirstName varchar(30) NOT NULL,
studentLasttName varchar(30) NOT NULL,
age integer);

 

Change the attribute of student table

Alert table student
Add email varchar(150);


Add relation between the student and take. In this situation, we need to set the foreign key in the take table. Take table is between course and student table, it list studentID and courseID. Each student can take multiple courses.

Alert Table take
Add Constraint `FK_myKeyTitle` 
Foreign Key (studentID) 
References student (studentID) 
On Delete No Action On Update Cascade;


Set character set to utf8, collation to utf8_unicode_ci, and Engine to InnoDB of a table after you created the table.

ALTER TABLE `student`
ENGINE=InnoDB,
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_unicode_ci;

 

to delete a student table

Drop table student;

 

To delete all records in the student table

Truncate table student;

SQL – Data Control Language (DCL)

Giving permission to list records in the student table

Grant select 
On universityDatabase.student 
To danial

 

Getting back the permission was give to list records in the student table

Revoke select 
On universityDatabase.student 
from danial

 

Explicitly prevent a user from receiving special permission

Deny delete
On universityDatabase.student 
To webBrowser

Indexing in Database

There has been a time you faced with a large table, in this case, only executing a select query; it might take 20sec. Then, indexing plays a critical role for you. However, you must consider indexing in the early stage of your database design.

The concept is very simple; I remember my first programming language book that was around 1100 pages (It was C Programming Language :D ). There are a lot of indexing in each book, for example, there is keyword indexing at the end of a programming language book that tell you which page has mentioned a particular keyword such as class.

There are two kinds of indexing that you could define on a table. They are clustered and non-clustered indexing. You can only have one clustered indexing in a table while you can have multiple non-clustered indexing on a table. By default, DBMS set the primary key to be the clustered. For instance, InnoDB Storage Engine considers the primary key of each table as clustered index in MySQL. In most of DBMS, you have the option to explicitly specify a different field of your table to be the clustered indexing.

Clustered indexing means how physically your data are stored in your table. Going back to the concept of a book to explain clustered and non-clustered indexing. The clustered index is the Page Number in a book, as the physical order or a book is the page number. Non-clustered indexing is a list of figures, chapters, or keywords at the end of the book.

In an illustration, consider student table at a university with more than 1,000,000 records, as the university has operated since 1970. For simplicity, we use the student table that is defined in DDL.

Create student table

Create student
(studentID integer Primary Key,
studentFirstName varchar(30) NOT NULL,
studentLasttName varchar(30) NOT NULL,
age integer);

 

Create clustered indexing on studentD

Create Clustered Index myfirstIndexing_name
On student (studentID ASC)

 

Create Non-clustered indexing on studentLastName

Create Nonclustered Index mySecondIndexing_name
On student (studentLasttName  ASC)

 

On this way, if we list all the students with a specific last name on the where condition, the students list instantly appear to us. There are pros and cons to consider a field for indexing. It is true that make the execution of select(Data Retrieval) statement very fast, while when we make an insert into the table, it takes longer time. As the DBMS needs to create indexing for this new record.

Store Procedure

We have a chunk of SQL queries that we call them frequently. We do not want to spend a lot of time each time to rewrite them again, we store them in a place and give it a name. Then quickly call it by the name, and it does the job for us. Store procedure in SQL plays the same role as Function or Method in programming languages such as C++, Java, etc.

Additionally, Store Procedure is very handily most of the SQL statement that we have seen until now in this article were simple, while, in real worlds, you need to deal with much more complex SQL queries.

Create a Store Procedure to list the average price of each category in the product.

Create Procedure avgPriceForEachProductCategory
select avg(price), productCategory 
from product 
group by productCategory;
End;

 

Example from the 3rd form of normalization in this article.

Create a Store Procedure to list the total price as well as other attributes in the order table [ID, productID, quantity, unitPrice].

Create Procedure orderWithTotalPrice
select ID, productID, quantity, unitPrice, quantity*unitPrice as totalPrice 
from order;
End;

 

Create a Store Procedure to list the students, their course, and the taken semester who got mark less than giveX for a particular course name.

Create Procedure studentwithMarkLessThanX
@givenX integer,
@sampleCourseName  varchar(30),
As
Begin
select studentID, studentFirstName, studentLastName, courseName, takenSemester
from student
inner join take on student.studentID = take.studentID
inner join course on takes.courseID = course.courseID
where take.mark > @sampleX and course.courseName = @sampleCourseName;
End;

 

The last simple step is how to call a Store Procedure:

call orderWithTotalPrice

 

Note: the syntax of store procedure in different DBMS might be slightly different.

Besides, regarding security, all of the stored procedure save in the database, this gives us advantages. Banks usually limited the user(or the developer of a specific application ) to call Store Procedure and do not allow them to write any other SQL command out of the Store Procedure. In this situation, the confidentiality of data remains much more secure, as an example it prevents any SQL injection attack on the spot :D

Nested SQL queries

There are times that you need multiple nested queries to get the list that you want.

List all employees who their salaries are bigger than the Jack’s salary.

select employeeName
from employee e1
where salary >
     ( select salary
from employee e2
       where employeeName = 'Jack' )

Note: the field in the second select statement is the important part. It must exactly match with the same field in the where statement of the first select statement. Besides, in this example, I consider there is only one employee with the name of Jack in the company. Otherwise, we need to use All in the where statement of the first select statement. I illustrated All in the following example.

 

List all of the employees who their salaries are bigger than the salary of Jack, Steven, and Alex.

To simplify it, if we know the salary of each of them, we can write it as below.

select employeeName
from employee
where salary > All (6500, 60000, 75000);

And, how about if we do not aware their salaries. We need to revise the SQL commands to be a nested ones as below:

select e1.employeeName
from employee e1
where salary > All 
     ( select salary
from employee e2
       where e2.employeeName='Jack' or e2.employeeName='Steven' or e2.employeeName='Alex')

 

Increase the salaries of all of the employees by 20% who have worked in the company since 2012.

In this case, we need to break the question into smaller chunks, so list all the employees who have worked in the company since 2012. It is a simple select query as below.

select *
from employee
where employmentYear > 2011

Also, we need to write query to update the salary of employee by 20%

update employee
set salary = salary*0.25

Finally, we combine these two queries into a nested query as below:

update employee
set salary = salary * 0.25
where employeeID in 
( select employeeID 
  from employee e2
  where employmentYear > 2011 )

 

Copy all of the records from the employee to employee_backup table.

We have an employee table, and we create a new empty, identical table called employee_backup( with same structure).

insert into employee_backup
select * from employee
where employeeID in 
 		( select employeeID 
  from employee 
  where employeeType = "contract");

Note: since employee and emloyee_backup have exactly identical structure, we no need to list the attribute name and their order in the insert statement and respectively in the select statement.

Difference between Any and All

The difference between Any(Some) and All in the nested query illustrated by example as below.

select employeeName
from employee
where salary > All (6500, 60000, 75000);

This equals to below statement:

select employeeName
from employee
where salary > 6500 and salary > 60000 and salary > 75000;

 

While for any instead of and we use or, as below.

select employeeName
from employee
where salary > Any (6500, 60000, 75000);

This equals to below statement:

select employeeName
from employee
where salary > 6500 or salary > 60000 or salary > 75000;

Quick link to SQL queries in this article:

  1. Simple Select
  2. Like
  3. IS NULL
  4. OR & IN
  5. Order By
  6. Distinct
  7. Sum & Avg
  8. Group By
  9. Group By & Having
  10. Inner Join
  11. Multiple Inner Join
  12. Insert
  13. Update
  14. Delete
  15. Create Table
  16. Alert Table
  17. Drop Table
  18. Truncate Table
  19. Grant
  20. Revoke
  21. Deny
  22. Create Clustered Index
  23. Create Nonclustered Index
  24. Create Procedure
  25. Create Procedure with Variable
  26. Nested Queries same table
  27. Nested Queries with All
  28. Nested Queries and Update
  29. Nested Queries and Insert
  30. All(AND)
  31. Any(OR)