- An overall view of SQL Queries and Normalization in Database
- How simple can create a Data-Driven Application by using ASP.NET MVC?
- What are the challenging characteristics to establish a Virtual Team?
- How to design an object-oriented game? (Case Study: Save My Water – available on iTunes)
- A stupid Robot appears without Artificial Neural Network!
Categories > Database & SQL
An overall view of SQL Queries and Normalization in Database
Posted on November 8th, 2015 By Danial Javanmardi
All SQL queries in this article listed here.
From time to time, I forgot some of the database concepts and SQL syntax, although, I have worked with SQL for a long time and even taught it to other individuals or a class of students. My initial aim of this article was to prepare a comprehensive list of SQL to refer to them. Hence, you cannot consider this article as an SQL or Database tutorial. It is a quick, effective review that illustrates mostly through examples. Though, there are some of the concepts such as Normalization and Indexing are explained with more details. I described the following concepts:
- Database Normalization
- SQL – Data Manipulation Language (DML)
- SQL – Data Definition Language (DDL)
- SQL – Data Control Language (DCL)
- Indexing in Database
- Store Procedure
- Nested SQL queries
Database Normalization
1st Normalization Form
No repeating value and no repeating group, for example, consider a product table contains [productID, productColor, productPrice]. In this case, if we have a record that store two colors at the same time in the productColor such as “red, green”. It breaks the first normalization rule, and to make it normalized, we need to create a separate table for color. In the end, we have color table contains [productID, productColor] and product table contains [productID, productPrice].
2nd Normalization Form
It must be 1st normalization form, and besides, all non-key attributes must fully depend on the key attribute. For instance, we have a purchase table contains [customerID, storeID, storeLocation], with the primary key (combination primary key) of [customerID, storeID]. In this case, storeLocation depends on storeID, this break 2nd normalization form. Consider somebody change a storeID to something else, and he does not change the storeLocation, this brings inconsistency in data. To make it 2nd Normalization form. We make a new table for location. Then, we have purchase table contains [customerID, storeID] and location table contains [storeID, storeLocation]. In this case, if somebody changes storeID, it does cause inconsistency in our data.
3rd Normalization Form
It must be 2nd normalization form, and also, all non-key attributes must not have any dependency on any other non-key attributes. Technically, this is 2nd normalization form that apply to the non-key attributes. For example, we have an order table contains [ID, productID, quantity, unitPrice, totalPrice], in this case, the totalPrice that results from quantity multiples unitPrice does not follow this rule. We could remove totalPrice from the table and calculate it on the run-time, such as a writing Store Procedure for it. In another example, we have competition table contains [ID, name, year, winnerName, winnerBirthDay]. You could see that winnerName and winnerBirthday are depends on each other. To resolve it, we create a new table called it winner contains [ID, name, birthday] and change the competition table to contain [ID, name, year, winnerID].
Others Concerns for Normalization
Indeed, there are a lot of time that performance concerns, normalization might degrade the performance or bring complexity in the development of an application. For example, employee table contains [name, address, phone1, phone2], this is not 1st normalization form, but we see, there are a lot of tables such as this in real world application. Another example, address table contains [ID, address, city, province, postalcode], this is not 3rd normalization form, as by having a postal code you could extract city and province. Though, it could be 3rd normalization form if there are a postalcode with two different provinces and two difference cities.
SQL – Data Manipulation Language (DML)
List all the students who their last name are James.
select * from student where lastname = 'James';
List all the students who their last name start with James.
select * from student where lastname LIKE 'James%';
Character % means whatever that might come after without any length limitation, while, character _ instead means only a character. Additionally, like is a very inefficient command in the database.
List all the students who marks are from 60 to 90.
select * from student where marks Between 60 And 90;
List all the students that their last names are not entered.
select * from student where lastname IS NULL;
List all the students who live in Toronto or Vancouver
select * from student where city ='toronto' or city ='vancouver'
Another way to write the above SQL command
select * from student where city IN ('toronto', 'vancouver');
List all the students, order them by their first names, and if first names are identical, then order them by last names.
select * from student order by firstname, lastname
List the students’ ID in the takes table with no duplication in the student ID. (there might be a student that takes multiple courses)
select Distinct(studentID) from take
Aggregation functions
count(ID), sum(price), max(price), min(price), avg(price) are some of aggregation functions in the SQL.
List the total price of all products under category with ID 241.
select sum(price) from product where productCategory = 241
List the average price of all products under category with ID 241.
select avg(price) from product where productCategory = 241
The aggregation function only returns one single value, if you want to combine aggregation with other fields to return multiple values, you need to use group by.
List the average price of each category.
select avg(price), productCategory from product group by productCategory
List the average price of each category that the average price is higher than 100.
select avg(price) as avgPrice, productCategory from product group by productCategory having avgPrice > 100
Use Join in Relational Database
There are different kinds of joins in SQL, by default join is set to be inner join by most DataBase Management System (DBMS). An inner join only returns the records that found a match both of the tables involving in the inner join. As an illustration, if we have a person and contact tables, and there is a person that does not have a record in contact table, the inner join would not return that person.
select * from person Inner Join contact On person.personID = contact.personID
The following figure illustrates all kinds of joins that could happen when joining two tables over each other.
List of students that take the course by the title of Database. (There are student, take and course tables.)
select * from (( student Inner Join take On student.studentID = take.studentID ) Inner Join course On takes.courseID = course.courseID ) where course.courseTitle = ‘Database’
Or could be written without brackets as follow
select * from student Inner Join take On student.studentID = take.studentID Inner Join course On takes.courseID = course.courseID where course.courseTitle = ‘Database’
Insert, Update, and Delete
To create a new record in the student table:
Insert into student (studentFirstName, studentLastName, studentAge) values ('Danial','Javanmardi2', 21);
To update a record in the student table:
Update student set studentEmail = 'info@mandanemedia.com', studentLastName = 'Javanmardi2' where studentID = 11
To delete a record in the student table:
Delete from student where studentID = 11
SQL – Data Definition Language (DDL)
Create student table
Create student (studentID integer Primary Key, studentFirstName varchar(30) NOT NULL, studentLasttName varchar(30) NOT NULL, age integer);
Change the attribute of student table
Alert table student Add email varchar(150);
Add relation between the student and take. In this situation, we need to set the foreign key in the take table. Take table is between course and student table, it list studentID and courseID. Each student can take multiple courses.
Alert Table take Add Constraint `FK_myKeyTitle` Foreign Key (studentID) References student (studentID) On Delete No Action On Update Cascade;
Set character set to utf8, collation to utf8_unicode_ci, and Engine to InnoDB of a table after you created the table.
ALTER TABLE `student` ENGINE=InnoDB, DEFAULT CHARACTER SET=utf8 COLLATE=utf8_unicode_ci;
to delete a student table
Drop table student;
To delete all records in the student table
Truncate table student;
SQL – Data Control Language (DCL)
Giving permission to list records in the student table
Grant select On universityDatabase.student To danial
Getting back the permission was give to list records in the student table
Revoke select On universityDatabase.student from danial
Explicitly prevent a user from receiving special permission
Deny delete On universityDatabase.student To webBrowser
Indexing in Database
There has been a time you faced with a large table, in this case, only executing a select query; it might take 20sec. Then, indexing plays a critical role for you. However, you must consider indexing in the early stage of your database design.
The concept is very simple; I remember my first programming language book that was around 1100 pages (It was C Programming Language ). There are a lot of indexing in each book, for example, there is keyword indexing at the end of a programming language book that tell you which page has mentioned a particular keyword such as class.
There are two kinds of indexing that you could define on a table. They are clustered and non-clustered indexing. You can only have one clustered indexing in a table while you can have multiple non-clustered indexing on a table. By default, DBMS set the primary key to be the clustered. For instance, InnoDB Storage Engine considers the primary key of each table as clustered index in MySQL. In most of DBMS, you have the option to explicitly specify a different field of your table to be the clustered indexing.
Clustered indexing means how physically your data are stored in your table. Going back to the concept of a book to explain clustered and non-clustered indexing. The clustered index is the Page Number in a book, as the physical order or a book is the page number. Non-clustered indexing is a list of figures, chapters, or keywords at the end of the book.
In an illustration, consider student table at a university with more than 1,000,000 records, as the university has operated since 1970. For simplicity, we use the student table that is defined in DDL.
Create student table
Create student (studentID integer Primary Key, studentFirstName varchar(30) NOT NULL, studentLasttName varchar(30) NOT NULL, age integer);
Create clustered indexing on studentD
Create Clustered Index myfirstIndexing_name On student (studentID ASC)
Create Non-clustered indexing on studentLastName
Create Nonclustered Index mySecondIndexing_name On student (studentLasttName ASC)
On this way, if we list all the students with a specific last name on the where condition, the students list instantly appear to us. There are pros and cons to consider a field for indexing. It is true that make the execution of select(Data Retrieval) statement very fast, while when we make an insert into the table, it takes longer time. As the DBMS needs to create indexing for this new record.
Store Procedure
We have a chunk of SQL queries that we call them frequently. We do not want to spend a lot of time each time to rewrite them again, we store them in a place and give it a name. Then quickly call it by the name, and it does the job for us. Store procedure in SQL plays the same role as Function or Method in programming languages such as C++, Java, etc.
Additionally, Store Procedure is very handily most of the SQL statement that we have seen until now in this article were simple, while, in real worlds, you need to deal with much more complex SQL queries.
Create a Store Procedure to list the average price of each category in the product.
Create Procedure avgPriceForEachProductCategory select avg(price), productCategory from product group by productCategory; End;
Example from the 3rd form of normalization in this article.
Create a Store Procedure to list the total price as well as other attributes in the order table [ID, productID, quantity, unitPrice].
Create Procedure orderWithTotalPrice select ID, productID, quantity, unitPrice, quantity*unitPrice as totalPrice from order; End;
Create a Store Procedure to list the students, their course, and the taken semester who got mark less than giveX for a particular course name.
Create Procedure studentwithMarkLessThanX @givenX integer, @sampleCourseName varchar(30), As Begin select studentID, studentFirstName, studentLastName, courseName, takenSemester from student inner join take on student.studentID = take.studentID inner join course on takes.courseID = course.courseID where take.mark > @sampleX and course.courseName = @sampleCourseName; End;
The last simple step is how to call a Store Procedure:
call orderWithTotalPrice
Note: the syntax of store procedure in different DBMS might be slightly different.
Besides, regarding security, all of the stored procedure save in the database, this gives us advantages. Banks usually limited the user(or the developer of a specific application ) to call Store Procedure and do not allow them to write any other SQL command out of the Store Procedure. In this situation, the confidentiality of data remains much more secure, as an example it prevents any SQL injection attack on the spot
Nested SQL queries
There are times that you need multiple nested queries to get the list that you want.
List all employees who their salaries are bigger than the Jack’s salary.
select employeeName from employee e1 where salary > ( select salary from employee e2 where employeeName = 'Jack' )
Note: the field in the second select statement is the important part. It must exactly match with the same field in the where statement of the first select statement. Besides, in this example, I consider there is only one employee with the name of Jack in the company. Otherwise, we need to use All in the where statement of the first select statement. I illustrated All in the following example.
List all of the employees who their salaries are bigger than the salary of Jack, Steven, and Alex.
To simplify it, if we know the salary of each of them, we can write it as below.
select employeeName from employee where salary > All (6500, 60000, 75000);
And, how about if we do not aware their salaries. We need to revise the SQL commands to be a nested ones as below:
select e1.employeeName from employee e1 where salary > All ( select salary from employee e2 where e2.employeeName='Jack' or e2.employeeName='Steven' or e2.employeeName='Alex')
Increase the salaries of all of the employees by 20% who have worked in the company since 2012.
In this case, we need to break the question into smaller chunks, so list all the employees who have worked in the company since 2012. It is a simple select query as below.
select * from employee where employmentYear > 2011
Also, we need to write query to update the salary of employee by 20%
update employee set salary = salary*0.25
Finally, we combine these two queries into a nested query as below:
update employee set salary = salary * 0.25 where employeeID in ( select employeeID from employee e2 where employmentYear > 2011 )
Copy all of the records from the employee to employee_backup table.
We have an employee table, and we create a new empty, identical table called employee_backup( with same structure).
insert into employee_backup select * from employee where employeeID in ( select employeeID from employee where employeeType = "contract");
Note: since employee and emloyee_backup have exactly identical structure, we no need to list the attribute name and their order in the insert statement and respectively in the select statement.
Difference between Any and All
The difference between Any(Some) and All in the nested query illustrated by example as below.
select employeeName from employee where salary > All (6500, 60000, 75000);
This equals to below statement:
select employeeName from employee where salary > 6500 and salary > 60000 and salary > 75000;
While for any instead of and we use or, as below.
select employeeName from employee where salary > Any (6500, 60000, 75000);
This equals to below statement:
select employeeName from employee where salary > 6500 or salary > 60000 or salary > 75000;
Quick link to SQL queries in this article:
- Simple Select
- Like
- IS NULL
- OR & IN
- Order By
- Distinct
- Sum & Avg
- Group By
- Group By & Having
- Inner Join
- Multiple Inner Join
- Insert
- Update
- Delete
- Create Table
- Alert Table
- Drop Table
- Truncate Table
- Grant
- Revoke
- Deny
- Create Clustered Index
- Create Nonclustered Index
- Create Procedure
- Create Procedure with Variable
- Nested Queries same table
- Nested Queries with All
- Nested Queries and Update
- Nested Queries and Insert
- All(AND)
- Any(OR)