/* ISC 561 Two categories of syntax: DDL and DML We will cover DML first and then DDL SQL is a 4GL (how versus what) Basic structure of SQL SELECT (DML) SELECT attribute list FROM table/view/(SQL expression) list WHERE condition list GROUP BY attribute list HAVING condition list ORDER BY attribute list [ASC | DESC] The order of the statements is important. It must be in the above order. P1. The result of a SELECT query is a relation P2. A relation is a logical construct Ergo. In an SELECT query, anywhere you can put a constant, string literal, enumerated list, view, or table... you can put an SELECT query! We will briefly review the above syntax and then move on to subqueries. We will explore the use of uncorrelated and correlated subqueries in each clause (SELECT, FROM, WHERE...) Part of my objective this semester is to convince you that when building applications that interact with a database, before writing code, consider if a query will satisfy your requirements. All things being equal, a query will exhibit better performance have better fault isolation have lower development cost be more scalable, portable, and reusable be more secure than programming logic or a combination of programming logic and embedded SQL. */ /* In a practical sense, the SELECT clause is a "print" or "write" statement. SELECT "prints" a collection of rows and columns. Logically, however, this result can be treated as a relation, i.e., as if it were an actual table in the database. SELECT can print constants, the result of a calculation, the result of system functions, or data from tables. */ --For example: SELECT 'This is a string', 200, GETDATE(), 1+1, 'This is a string' + ' plus another string.' --And... SELECT 'This is a string' AS 'String', CAST(200 AS NVARCHAR) AS 'Number', GETDATE() AS 'SystemTime', 1+1 AS 'Calculation' --And so why not put a query in the SELECT clause? SELECT 'this is an uncorrelated subquery' AS OutputStr SELECT (SELECT 'this is an uncorrelated subquery' AS OutputStr) AS 'SubQueryColum', 'This is a string' AS 'String', CAST(200 AS NVARCHAR) AS 'Number', GETDATE() AS 'SystemTime', 1+1 AS 'Calculation' /* Subqueries in the SELECT clause A brief discussion about performance and a rule of thumb: If a join will serve, avoid subqueries. DavidMercerQueryPerformance.sql However, there are cases where a result cannot be obtained easily (or ever?) with an inner or outer join. *****UNCORRELATED***** --Generic syntax SELECT attribute list, (subquery1) AS Result1, (subquery2) AS Result2... FROM Table1, Table2, Tablen... WHERE Table1.Pk=Table2.Fk... AND condition list SELECT non-aggregate attribute list, (SELECT aggregate attribute FROM Table1, Table2 WHERE Table1.Pk=Table2.Fk AND conditionlist) AS AggregateResult1, (SELECT aggregate attribute FROM Table1, Table2 WHERE Table1.Pk=Table2.Fk AND conditionlist) AS AggregateResult2 FROM Table1, Table2, Tablen... WHERE Table1.Pk=Table2.Fk... AND condition list In the uncorrelated case, the inner query is executed once and then "printed" as an attribute value in every row of the outer query result. The inner query is completely independent of the outer query. Note that in query analyzer, we can highlight and execute the inner query independently of the outer query. This is not possible with the correlated subquery. */ USE Northwind --List the number of products for category 1 --First, write an outer query for the Categories table SELECT Categories.CategoryId, CategoryName FROM Categories WHERE CategoryID = 1 --Review contents of Products table SELECT ProductID, ProductName, CategoryID --* FROM Products ORDER BY CategoryID --Build a query to compute the count for Category 1 SELECT COUNT(*) AS ProductCountCategory1 FROM Products WHERE CategoryID = 1 --Place the COUNT query in the SELECT statement of the Categories query SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) AS ProductCountCategory1 FROM Products WHERE CategoryID = 1) AS NumberOfProducts FROM Categories WHERE CategoryID = 1 --How does this compare to a JOIN/GROUP BY solution? SELECT Categories.CategoryID, CategoryName, COUNT(*) AS NumberOfProducts FROM Categories, Products WHERE Categories.CategoryID=Products.CategoryId AND Products.CategoryID = 1 GROUP BY Categories.CategoryID--, CategoryName SELECT CategoryName, Categories.CategoryID, COUNT(*) AS NumberOfProducts FROM Categories, Products WHERE Categories.CategoryID=Products.CategoryId --AND --Products.CategoryID = 1 GROUP BY Categories.CategoryID, CategoryName /* Use a subquery to include marginal totals. For order 10248, list the OrderID, CustomerID, OrderDate, Freight and the average freight for the entire database. */ --Build the outer query for order 10248 SELECT OrderID, CustomerID, OrderDate, Freight FROM Orders WHERE OrderID = 10248 --Compute the marginal total SELECT AVG(Freight) FROM Orders --Insert the marginal total as a subquery SELECT OrderID, CustomerID, OrderDate, Freight, (SELECT AVG(Freight) FROM Orders) AS AverageFreight FROM Orders WHERE OrderID = 10248 --If you don't qualify the WHERE clause, AverageFreight is --displayed for all orders or a given range of orders. /* *****CORRELATED***** SELECT non-aggregate attribute list, (SELECT aggregate attribute FROM Table2 WHERE Table1.Pk=Table2.Fk) AS AggregateResult1, (SELECT aggregate attribute FROM Table2 WHERE Table1.Pk=Table2.Fk) as AggregateResult2 FROM Table1, Table2, Tablen... WHERE Table1.Pk=Table2.Fk... AND condition list SELECT non-aggregate attribute list, (SELECT aggregate attribute FROM Table2 WHERE Table1.Pk=Table2.Fk) AS AggregateResult1, (SELECT aggregate attribute FROM Table2 WHERE Table1.Pk=Table2.Fk) AS AggregateResult2 FROM Table1 INNER JOIN Table2 ON Table1.Pk=Table2.Fk... AND condition list In the correlated case, the inner query is executed once for every row of the outer query result. The inner query depends on the current instance or value of the Pk in the outer query. Note that the FROM statement of the inner query does not include Table1 but still includes a Pk=Fk comparison. The Pk value is "passed in" as a parameter and used to produce the results for the current row of the outer query result. If we highlight the inner query in query analyzer and attempt to execute it independently, we get an error message. */ /* Let's say that marketing really liked our previous report (see below). */ SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) AS ProductCountCategory1 FROM Products WHERE CategoryID = 1) AS NumberOfProducts FROM Categories WHERE CategoryID = 1 /* But they wanted to know if we can produce the report for each category and include the product count for each individual category. */ --This version is still uncorrelated SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) AS ProductCountCategory1 FROM Products WHERE CategoryID = 1) AS NumberOfProducts FROM Categories --WHERE CategoryID = 1 /* When buiding correlated subqueries, I first build a stub for the subquery. */ SELECT Categories.CategoryId, CategoryName,('VALUE') AS Numberofproducts FROM Categories --The () part will be replaced with a correlated subquery. --Note the WHERE clause has been changed... SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) AS ProductCountCategory1 FROM Products WHERE Products.CategoryID=Categories.CategoryID) AS NumberOfProducts FROM Categories --conceptually it is as if we are executing the outer query multiple times --As so... SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) FROM Products WHERE Products.CategoryID=1) AS NumberOfProducts FROM Categories WHERE CategoryID = 1 SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) FROM Products WHERE Products.CategoryID=2) AS NumberOfProducts FROM Categories WHERE CategoryID = 2 SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) FROM Products WHERE Products.CategoryID=3) AS NumberOfProducts FROM Categories WHERE CategoryID = 3 --Which are UNIONed together SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) FROM Products WHERE Products.CategoryID=1) AS NumberOfProducts FROM Categories WHERE CategoryID = 1 UNION SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) FROM Products WHERE Products.CategoryID=2) AS NumberOfProducts FROM Categories WHERE CategoryID = 2 UNION SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) FROM Products WHERE Products.CategoryID=3) AS NumberOfProducts FROM Categories WHERE CategoryID = 3 --Or as a JOIN/GROUP BY SELECT Categories.CategoryID, CategoryName, COUNT(*) AS NumberOfProducts FROM Categories, Products WHERE Categories.CategoryID=Products.CategoryId --AND Products.CategoryID = 1 GROUP BY Categories.CategoryID, CategoryName SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) AS ProductCountCategory1 FROM Products WHERE Categories.CategoryID=Products.CategoryID) AS NumberOfProducts FROM Categories /* Using a correlated subquery in the SELECT to transpose (pivot or cross query) row data to columns. This technique works by including a row identifier in the condition list of the subquery. For example, in the query illustrated below, the row identifier is the Qtr {1, 2, 3, 4} Human Resources needs a report that lists all employees and the number of orders placed for each quarter in the year 1997. For example... Q1 Q2 Q3 Q4 1 Nancy Davolio 10 10 18 17 2 Andrew Fuller 9 10 11 11 3 Janet Leverling 19 16 10 26 4 Margaret Peacock 18 18 22 23 5 Steven Buchanan 3 4 6 5 6 Michael Suyama 6 8 5 14 7 Robert King 6 12 13 5 8 Laura Callahan 19 9 14 12 9 Anne Dodsworth 2 6 4 7 The row-wise data looks like this: EmployeeID Name Quarter NumberOfOrders 1 Nancy Davolio 1 10 1 Nancy Davolio 2 10 1 Nancy Davolio 3 18 1 Nancy Davolio 4 17 2 Andrew Fuller 1 9 2 Andrew Fuller 2 10 2 Andrew Fuller 3 11 2 Andrew Fuller 4 11 3 Janet Leverling 1 19 3 Janet Leverling 2 16 3 Janet Leverling 3 10 3 Janet Leverling 4 26 4 Margaret Peacock 1 18 4 Margaret Peacock 2 18 4 Margaret Peacock 3 22 4 Margaret Peacock 4 23 4 Steven Buchanan 1 3 5 Steven Buchanan 2 4 5 Steven Buchanan 3 6 5 Steven Buchanan 4 5 6 Michael Suyama 1 6 6 Michael Suyama 2 8 6 Michael Suyama 3 5 6 Michael Suyama 4 14 7 Robert King 1 6 7 Robert King 2 12 7 Robert King 3 13 7 Robert King 4 5 8 Laura Callahan 1 19 8 Laura Callahan 2 9 8 Laura Callahan 3 14 8 Laura Callahan 4 12 9 Anne Dodsworth 1 2 9 Anne Dodsworth 2 6 9 Anne Dodsworth 3 4 9 Anne Dodsworth 4 7 */ SELECT EmployeeID,DATEPART(QUARTER, OrderDate) AS Qtr, COUNT(*) AS OrderCount FROM Orders GROUP BY EmployeeID, DATEPART(QUARTER, OrderDate) ORDER BY EmployeeID, Qtr--DATEPART(QUARTER, OrderDate) /* SELECT Employees.EmployeeID, FirstName + ' ' + LastName AS Name, COUNT(Orders.OrderID) AS OrderCount FROM Orders, Employees WHERE Orders.EmployeeID=Employees.EmployeeID GROUP BY Employees.EmployeeID,Orders.OrderID, FirstName, LastName */ --Build the SELECT for non-aggregage attributes SELECT EmployeeID, FirstName + ' '+ LastName AS EmployeeName, ('Value') AS Qt1, ('Value') AS Qt2, ('Value') AS Qt3, ('Value') AS Qt4 FROM Employees /* Begin building the subquery: count orders by employee for the entire database */ SELECT COUNT(*) AS NumberOfOrders FROM Orders WHERE EmployeeID = 1 --Filter rows before grouping for year 1997 SELECT COUNT(*) AS NumberOfOrders FROM Orders WHERE EmployeeID = 1 AND DATEPART(YEAR, OrderDate) =1997 /* Find the first quarter data. This query produces the first subquery column in the outer query (Q1). Use DATENAME(datepart, date) */ SELECT COUNT(*) AS NumberOfOrders FROM Orders WHERE EmployeeID = 1 AND DATEPART(YEAR, OrderDate) =1997 AND DATEPART(quarter, OrderDate) = 1 SELECT * FROM ORDERS /* Find the Q1 result for employee 1. This query produces the value for cell (1,1) in the outer query. */ --This version is still uncorrelated. It will print 10 for all rows SELECT EmployeeID, FirstName + ' '+ LastName AS EmployeeName, (SELECT COUNT(*) AS NumberOfOrders FROM Orders WHERE EmployeeID = 1 AND DATEPART(YEAR, OrderDate) =1997 AND DATEPART(quarter, OrderDate) = 1) AS Qt1, ('Value') AS Qt2, ('Value') AS Qt3, ('Value') AS Qt4 FROM Employees --Replace the EmployeeID = 1 with a reference to the outer query. SELECT EmployeeID, FirstName + ' '+ LastName AS EmployeeName, (SELECT COUNT(*) AS NumberOfOrders FROM Orders WHERE EmployeeID = Employees.EmployeeID AND DATEPART(YEAR, OrderDate) =1997 AND DATEPART(quarter, OrderDate) = 1) AS Qt1, ('Value') AS Qt2, ('Value') AS Qt3, ('Value') AS Qt4 FROM Employees SELECT EmployeeID, FirstName + ' '+ LastName AS EmployeeName, (SELECT COUNT(*) AS NumberOfOrders FROM Orders WHERE EmployeeID = Employees.EmployeeID AND DATEPART(YEAR, OrderDate) =1997 AND DATEPART(quarter, OrderDate) = 1) AS Qt1, (SELECT COUNT(*) AS NumberOfOrders FROM Orders WHERE EmployeeID = Employees.EmployeeID AND DATEPART(YEAR, OrderDate) =1997 AND DATEPART(quarter, OrderDate) = 2) AS Qt2, (SELECT COUNT(*) AS NumberOfOrders FROM Orders WHERE EmployeeID = Employees.EmployeeID AND DATEPART(YEAR, OrderDate) =1997 AND DATEPART(quarter, OrderDate) = 3) AS Qt3, (SELECT COUNT(*) AS NumberOfOrders FROM Orders WHERE EmployeeID = Employees.EmployeeID AND DATEPART(YEAR, OrderDate) =1997 AND DATEPART(quarter, OrderDate) = 4) AS Qt4 FROM Employees /* A common application of this technique for me is transposing ROLES/PERMISSION tables into a single row user profile record. For example, tblUser ||---< tblProfile >---|| tblRole I create a single row profile that contains a bit value indicating whether the current user has that permission. For example, the row for user 1222 might be: UserID Admin Staff Faculty Student Visitor 1222 0 0 0 1 1 where this user has both the Student and Visitor roles. Having a single row simplifies presentation tier programming considerable because you can write statements such as: btnSaveAs.Visible = dtPersonProfile.Rows(0)("Admin") Where the button is turned on or off based on the bit value in the PersonProfile datatable. A good analogy in the Northwind Database would be: Employees ||---< EmployeeTerritories >---|| Territories If we wrote the query for the three territories Westboro (01581), Wilton (06897), and Neward (19713) */ --Create the outer query SELECT EmployeeID, Lastname, (0) AS WestBoro, (0) AS Wilton, (0) AS Neward FROM Employees WHERE EmployeeID = 1 /* Create the inner query for a single case Use the ISNULL() function to convert NULL values to 0 (false) If you run the inner query independently, you will see that it does not return a value because employee 1 does not have a row in the intersection table EmployeeTerritories. If the row exists, the SELECT prints a 1, if it is NULL, it prints a 0. */ SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 01581 AND EmployeeID = 1),0) AS WestBoro --Add subquery to outer query and make correlated SELECT EmployeeID, Lastname, (SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 01581 AND EmployeeTerritories.EmployeeID = Employees.EmployeeID),0)) AS WestBoro, (0) AS Wilton, (0) AS Neward FROM Employees WHERE EmployeeID = 1 --Now copy and paste for the other two... SELECT EmployeeID, Lastname, (SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 01581 AND EmployeeTerritories.EmployeeID = Employees.EmployeeID),0)) AS WestBoro, (SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 06897 AND EmployeeTerritories.EmployeeID = Employees.EmployeeID),0)) AS Bedford, (SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 19713 AND EmployeeTerritories.EmployeeID = Employees.EmployeeID),0)) AS Georetwo FROM Employees WHERE EmployeeID = 1 /* And so in this analogy, if territories were user roles, employee 1 would have two of the three roles and we could use this row to set role-based permissions. btnSaveAs.Visible = dtPersonProfile.Rows(0)("WestBoro") In the above case, the control btnSaveAs would be invisible. And of course, because it is correlated, we can exclude the WHERE clause and return the result for all employees. */ SELECT EmployeeID, Lastname, (SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 01581 AND EmployeeTerritories.EmployeeID = Employees.EmployeeID),0)) AS WestBoro, (SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 06897 AND EmployeeTerritories.EmployeeID = Employees.EmployeeID),0)) AS Bedford, (SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 19713 AND EmployeeTerritories.EmployeeID = Employees.EmployeeID),0)) AS Georetwo FROM Employees --WHERE EmployeeID = 1 ORDER BY EmployeeID SELECT TerritoryID, TerritoryDescription, RegionID FROM Territories WHERE TerritoryID IN(01581, 06897,19713) SELECT EmployeeID, TerritoryID FROM EmployeeTerritories ORDER BY EmployeeID /* SQL Server 2005 has a PIVOT operator, but it seems a bit disappointing as it is still necessary to "hard-code" values for the attribute you wish to transpose. The basic format is: SELECT Category attribute, list of values for transposed attribute FROM (SELECT...) AS TableToPivot PIVOT (COUNT(AttributeToCount) FOR TransposedAttribute IN (enumerated list of values for transposed attribute) AS PivotedTable sooo... */ SELECT EmployeeID, [01581] AS WestBoro, [06897] AS Bedford, [19713] AS Georetwo FROM (SELECT EmployeeID, TerritoryID FROM EmployeeTerritories WHERE EmployeeID=1) AS PivotTable PIVOT (COUNT(TerritoryID) FOR TerritoryID IN ([01581], [06897], [19713])) AS PVT ORDER BY EmployeeID SELECT EmployeeID, Lastname, (SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 01581 AND EmployeeTerritories.EmployeeID = Employees.EmployeeID),0)) AS WestBoro, (SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 06897 AND EmployeeTerritories.EmployeeID = Employees.EmployeeID),0)) AS Bedford, (SELECT ISNULL((SELECT 1 -- 1=true 0=false FROM EmployeeTerritories WHERE TerritoryID = 19713 AND EmployeeTerritories.EmployeeID = Employees.EmployeeID),0)) AS Georetwo FROM Employees WHERE EmployeeID = 1 ORDER BY EmployeeID