/* ISC 561 Subqueries in the FROM clause Uncorrelated SELECT outer query attribute list, subquery attribute list FROM table1, table2..., (SELECT attribute1, aggregate FROM table3 WHERE condition list GROUP BY attribute1) AS InnerQuery WHERE table1.Pk=table2.Fk AND table1.Pk=InnerQuery.Fk In this solution, the GROUP BY results are computed as a separate unccorelated subquery and then joined to the outer tables. This solution is essentially the same thing as defining InnerQuery as either a view or materialized view. InnerQuery is a temporary or ad hoc view. If we want InnerQuery to persist, we would store it as a view. This is essentially an instance of reuse. We are reusing a subquery and treating it as if it were an actual relation (which we can because the result of a query is a relation!) To my knowledge, it will not work to do a correlated subquery in the FROM clause. */ /* Let's revisit the Categories/Products query from the SELECT section. List the number of products for category 1 Rewrite this correlated subquery as uncorrelated in the FROM clause */ USE Northwind SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) FROM Products WHERE Categories.CategoryID=Products.CategoryID) AS NumberOfProducts FROM Categories /* In the correlated version given above, for every row in the Categories table, we computed a COUNT of the products for that category. The COUNT was computed n times where n is the number of rows in Categories. Another approach is to first compute the COUNTs as a relation and then JOIN it to the Categories table. */ USE Northwind --Compute the COUNT of products for every category SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) FROM Products WHERE Categories.CategoryID=Products.CategoryID) AS NumberOfProducts FROM Categories SELECT Products.CategoryID, COUNT(*) As NumberOfProducts FROM Products GROUP BY Products.CategoryID /* When I construct subqueries in the FROM clause, I usually verify and test the innerquery first and then begin to construct the outer query "around" it. Step 1: construct inner queries Step 2: insert inner queries in an outer query (with stubs) Step 3: incrementally add attributes to the outer query */ SELECT InnerQuery.CategoryID, InnerQuery.NumberOfProducts FROM (SELECT Products.CategoryID, COUNT(*) As NumberOfProducts FROM Products GROUP BY Products.CategoryID) AS InnerQuery /* In the above, I am treating the inner subquery as if it were an actual relation. I use "AS" (alias) to give the relation a temporary name. Now, I can continue on with building my result set... I will JOIN InnerQuery with the Categories table */ SELECT Categories.*, InnerQuery.CategoryID, InnerQuery.NumberOfProducts FROM (SELECT Products.CategoryID, COUNT(*) As NumberOfProducts FROM Products GROUP BY Products.CategoryID) AS InnerQuery, Categories WHERE Categories.CategoryID=InnerQuery.CategoryID --Now I include only those attributes required in the output SELECT Categories.CategoryID, Categories.CategoryName, --Categories.* InnerQuery.NumberOfProducts FROM (SELECT Products.CategoryID, COUNT(*) As NumberOfProducts FROM Products GROUP BY Products.CategoryID) AS InnerQuery, Categories WHERE Categories.CategoryID=InnerQuery.CategoryID /* If we find that we want to reuse "InnerQuery", it would be best to store "InnerQuery" as a view and give it a more meaningful name. Well, actually, it should have a meaningful name now, but I'm using InnerQuery to emphasize the point. */ --DROP VIEW vwProductCountByCategory CREATE VIEW vwProductCountByCategory AS SELECT Products.CategoryID, COUNT(*) As NumberOfProducts FROM Products GROUP BY Products.CategoryID SELECT * FROM vwProductCountByCategory SELECT Categories.CategoryId, CategoryName, vwProductCountByCategory.NumberOfProducts FROM Categories, vwProductCountByCategory WHERE Categories.CategoryID=vwProductCountByCategory.CategoryID /* How do the four solutions compare on performance? */ --Correlated in SELECT SELECT Categories.CategoryId, CategoryName, (SELECT COUNT(*) FROM Products WHERE Categories.CategoryID=Products.CategoryID) AS NumberOfProducts FROM Categories --Uncorrelated in FROM SELECT Categories.CategoryID, Categories.CategoryName, InnerQuery.NumberOfProducts FROM (SELECT Products.CategoryID, COUNT(*) As NumberOfProducts FROM Products GROUP BY Products.CategoryID) AS InnerQuery, Categories WHERE Categories.CategoryID=InnerQuery.CategoryID --Uncorrelated in FROM with view SELECT Categories.CategoryId, CategoryName, vwProductCountByCategory.NumberOfProducts FROM Categories, vwProductCountByCategory WHERE Categories.CategoryID=vwProductCountByCategory.CategoryID --GROUP BY putting all CategoryName in the GROUP BY clause --CategoryName is the only non-aggregate attribute in this output SELECT Categories.CategoryID, CategoryName, COUNT(*) AS NumberOfProducts FROM Categories, Products WHERE Categories.CategoryID=Products.CategoryId GROUP BY Categories.CategoryID, CategoryName /* Display the lastname and number of orders filled by each employee for year 1998. Uncorrelated GROUP BY */ --Create and test the inner subquery --Have an exploratory look at the table first... SELECT TOP 500 EmployeeID, OrderID, CustomerID, OrderDate --... other attributes Orders.* FROM Orders ORDER BY EmployeeID --Create counts by employees SELECT EmployeeID, COUNT(*) as NumberOfOrders FROM Orders GROUP BY EmployeeID --Only count rows where order was placed in 1998 SELECT EmployeeID, COUNT(*) as NumberOfOrders FROM Orders WHERE DATEPART(YEAR, OrderDate) = 1998-- YEAR(OrderDate) = 1998 GROUP BY EmployeeID --Create the outer query SELECT InnerQuery.EmployeeID, InnerQuery.NumberOfOrders FROM (SELECT EmployeeID, COUNT(*) as NumberOfOrders FROM Orders WHERE YEAR(OrderDate) = 1998 GROUP BY EmployeeID) AS InnerQuery --Now InnerQuery can be used like a relation in the outer query --Start adding non-aggregate attributes to the SELECT --EmployeeName requires a JOIN to the Employees table SELECT InnerQuery.EmployeeID, InnerQuery.NumberOfOrders, Employees.* FROM (SELECT EmployeeID, COUNT(*) as NumberOfOrders FROM Orders WHERE YEAR(OrderDate) = 1998 GROUP BY EmployeeID) AS InnerQuery, Employees WHERE Employees.EmployeeID=InnerQuery.EmployeeID --PK=FK SELECT Employees.LastName, InnerQuery.EmployeeID, InnerQuery.NumberOfOrders --, Employees.* FROM (SELECT EmployeeID, COUNT(*) as NumberOfOrders FROM Orders WHERE YEAR(OrderDate) = 1998 GROUP BY EmployeeID) AS InnerQuery, Employees WHERE Employees.EmployeeID=InnerQuery.EmployeeID --PK=FK SELECT LastName, employees.EmployeeID, COUNT(*) as NumberOfOrders FROM Orders, Employees WHERE Employees.EmployeeID=Orders.EmployeeID AND YEAR(OrderDate) = 1998 GROUP BY Employees.EmployeeID, Lastname /* Again, if we find that we are using the InnerQuery often, we can store its definition as a VIEW. */ /* Northwind is conducting an analysis of freight charges for specific product lines. In particular, they are interested in the beverages category. Display Shippers.CompanyName and average frieght by shipper for products in the beverages category. */ /* Step 1: construct inner query Step 2: insert inner query in an outer query Step 3: incrementally add attributes to outer query The inner query will compute AVG for orders by shippers for orders containing a product in the beverages category. */ --Have an exploratory look at the table first... SELECT TOP 500 Orders.* FROM Orders ORDER BY ShipVia --We'll want to compute an average freight by the FK ShipVia SELECT Shipvia, AVG(Freight) AS AverageFreight FROM Orders GROUP BY ShipVia /* Easy... Now we filter for Orders containing a beverage product. To do that, we'll either have to write some subqueries or JOIN Orders, [Order Details], Product, and Categories The rule of thumb: if a JOIN will serve, avoid subqueries Note, your results may differ because over time I have accidently modified my NorthWind database */ --Join to Order Details SELECT Shipvia, AVG(Freight) AS AverageFreight FROM Orders, [Order Details] AS OrderDetails WHERE Orders.OrderID=OrderDetails.OrderID GROUP BY ShipVia --Join to Products SELECT Shipvia, AVG(Freight) AS AverageFreight FROM Orders, [Order Details] AS OrderDetails, Products WHERE Orders.OrderID=OrderDetails.OrderID AND OrderDetails.ProductID=Products.ProductID GROUP BY ShipVia --Join to Categories SELECT Shipvia, AVG(Freight) AS AverageFreight FROM Orders, [Order Details] AS OrderDetails, Products, Categories WHERE Orders.OrderID=OrderDetails.OrderID AND OrderDetails.ProductID=Products.ProductID AND Categories.CategoryID=Products.CategoryID GROUP BY ShipVia --Restrict the rows to products in the beverages category SELECT Shipvia, AVG(Freight) AS AverageFreight FROM Orders, [Order Details] AS OrderDetails, Products, Categories WHERE Orders.OrderID=OrderDetails.OrderID AND OrderDetails.ProductID=Products.ProductID AND Categories.CategoryID=Products.CategoryID AND Categories.CategoryName = 'Beverages' GROUP BY ShipVia /* This correctness of this result is more difficult to verify! One way is to replace GROUP BY with ORDER BY and inspect values. */ SELECT Shipvia, Freight, Products.ProductName FROM Orders, [Order Details] AS OrderDetails, Products, Categories WHERE Orders.OrderID=OrderDetails.OrderID AND OrderDetails.ProductID=Products.ProductID AND Categories.CategoryID=Products.CategoryID AND Categories.CategoryName = 'Beverages' ORDER BY ShipVia /* I dropped the values for ShipVia 1 in Excel and got 81.58893443. Now we can begin to work on the outerquery. */ SELECT InnerQuery.Shipvia, InnerQuery.AverageFreight FROM (SELECT Shipvia, AVG(Freight) AS AverageFreight FROM Orders, [Order Details] AS OrderDetails, Products, Categories WHERE Orders.OrderID=OrderDetails.OrderID AND OrderDetails.ProductID=Products.ProductID AND Categories.CategoryID=Products.CategoryID AND Categories.CategoryName = 'Beverages' GROUP BY ShipVia) AS InnerQuery /* In this case, we need only JOIN InnerQuery to Shippers */ SELECT Shippers.CompanyName, InnerQuery.Shipvia, InnerQuery.AverageFreight FROM (SELECT Shipvia, AVG(Freight) AS AverageFreight FROM Orders, [Order Details] AS OrderDetails, Products, Categories WHERE Orders.OrderID=OrderDetails.OrderID AND OrderDetails.ProductID=Products.ProductID AND Categories.CategoryID=Products.CategoryID AND Categories.CategoryName = 'Beverages' GROUP BY ShipVia) AS InnerQuery, Shippers WHERE Shippers.ShipperID=InnerQuery.ShipVia /* Compare this to simply adding Shippers to the Join and CompanyName to the GROUP BY clause. */ SELECT Shippers.CompanyName, Shippers.Phone, Orders.Shipvia, AVG(Freight) AS AverageFreight FROM Orders, [Order Details] AS OrderDetails, Products, Categories, Shippers WHERE Orders.OrderID=OrderDetails.OrderID AND OrderDetails.ProductID=Products.ProductID AND Categories.CategoryID=Products.CategoryID AND Shippers.ShipperID=Orders.ShipVia AND Categories.CategoryName = 'Beverages' GROUP BY ShipVia, Shippers.CompanyName, Shippers.Phone SELECT Shippers.CompanyName, Shippers.Phone, Shipvia, AverageFreight FROM vwAverageFreightByShipperBeverages, Shippers WHERE Shippers.ShipperID=vwAverageFreightByShipperBeverages.ShipVia /* In this case, the subquery performed better and for me, from a reuse standpoint, the FROM clause subquery lends itself better to reuse, namely by saving InnerQuery as a VIEW. Then our query would become: */ --DROP VIEW vwAverageFreightByShipperBeverages CREATE VIEW vwAverageFreightByShipperBeverages AS SELECT Shipvia, AVG(Freight) AS AverageFreight FROM Orders, [Order Details] AS OrderDetails, Products, Categories WHERE Orders.OrderID=OrderDetails.OrderID AND OrderDetails.ProductID=Products.ProductID AND Categories.CategoryID=Products.CategoryID AND Categories.CategoryName = 'Beverages' GROUP BY ShipVia SELECT * FROM vwAverageFreightByShipperBeverages --Our query would then read: SELECT Shippers.CompanyName, Shippers.Phone, Shipvia, AverageFreight FROM vwAverageFreightByShipperBeverages, Shippers WHERE Shippers.ShipperID=vwAverageFreightByShipperBeverages.ShipVia /* The FROM clause can, of course, contain multiple uncorrelated subqueries. Just like you can JOIN multiple views. SELECT outer query attribute list, subquery attribute list FROM table1, table2..., (SELECT attribute1, aggregate FROM table3 WHERE condition list GROUP BY attribute1) AS InnerQuery1, (SELECT attribute2, aggregate FROM table4 WHERE condition list GROUP BY attribute2) AS InnerQuery2 WHERE table1.Pk=table2.Fk AND table1.Pk=InnerQuery1.Fk AND table2.Pk=InnerQuery2.Fk */ /* Several Northwind customers have complained that they suspect shipdate is correlated with the number of orders placed. That is, they suspect that customers who place more orders tend to get their orders shipped sooner. The shipping department just called and asked if you could provide a list of Customers.CompanyName, NumberOfOrders, AvgDaysToShip for all customers in the Nortwhind database */ --Have an exploratory look at the table first... --In this case, our point of reference is CustomerID SELECT TOP 500 Orders.* FROM Orders ORDER BY CustomerID /* We will need to construct two seperate inner queries. 1. Compute number of orders by customer 2. Computer number of average days to ship */ --Subquery to compute number of orders SELECT CustomerID, COUNT(*) AS NumberOfOrders FROM Orders GROUP BY CustomerID --Subquery to compute average days to ship SELECT CustomerID, AVG(DATEDIFF(day, OrderDate, ShippedDate)) AS AvgDAysToShip FROM Orders GROUP BY CustomerID --In both cases, we have included the PK for JOINing. /* Now we can begin to work on the outerquery. Add the OrderCount */ SELECT OrderCount.CustomerID, OrderCount.NumberOfOrders FROM (SELECT CustomerID, COUNT(*) AS NumberOfOrders FROM Orders GROUP BY CustomerID) AS OrderCount --Then the AverageDays to ship with a JOIN SELECT OrderCount.CustomerID, OrderCount.NumberOfOrders, DaysToShip.CustomerID, DaysToShip.AvgDAysToShip FROM (SELECT CustomerID, COUNT(*) AS NumberOfOrders FROM Orders GROUP BY CustomerID) AS OrderCount, (SELECT CustomerID, AVG(DATEDIFF(day, OrderDate, ShippedDate)) AS AvgDAysToShip FROM Orders GROUP BY CustomerID) AS DaysToShip WHERE OrderCount.CustomerID=DaysToShip.CustomerID --Now we have to JOIN to the CompanyName and remove unnecessary attributes SELECT Customers.CompanyName, OrderCount.NumberOfOrders, DaysToShip.AvgDAysToShip FROM (SELECT CustomerID, COUNT(*) AS NumberOfOrders FROM Orders GROUP BY CustomerID) AS OrderCount, (SELECT CustomerID, AVG(DATEDIFF(day, OrderDate, ShippedDate)) AS AvgDAysToShip FROM Orders GROUP BY CustomerID) DaysToShip, Customers WHERE Customers.CustomerID=OrderCount.CustomerID AND Customers.CustomerID=DaysToShip.CustomerID /* And the answer is? r = -0.039479791 */ /* Self-join Often I have found that I need to join a table to itself. A common application is looking for duplicate values. The same result can be achieved with GROUP BY (we'll see this in the normalization section). I use this technique to look for duplicate submissions during an exam. We'll use a similar technique in the DIVISION operation. SELECT attribute list FROM table1 AS T1, table1 AS T2 WHERE T1.Pk=T2.Pk */ /* Marketing is wondering Which products have the same UnitPrice? List UnitPrice and ProductName for all Products with the same UnitPrice. Remove duplicates. */ SELECT UnitPrice, ProductName FROM Products ORDER BY UnitPrice /* Scanning down the list, it looks like the first repeated unitprice is 9.50. We can get those rows where UnitPrice is equal by treating UnitPrice as a Pk=Fk comparison. We create two "virtual" copies of Products by giving it two distinct aliases in the FROM clause. */ SELECT P1.UnitPrice AS UnitPrice1, P1.ProductName AS ProductName1, P2.UnitPrice AS UnitPrice2, P2.ProductName AS ProductName2 FROM (SELECT * FROM Products) AS P1, (SELECT * FROM Products) AS P2 WHERE P1.UnitPrice=P2.UnitPrice ORDER BY P1.UnitPrice SELECT P1.UnitPrice AS UnitPrice1, P1.ProductName AS ProductName1, P2.UnitPrice AS UnitPrice2, P2.ProductName AS ProductName2 FROM Products AS P1, Products AS P2 WHERE P1.UnitPrice=P2.UnitPrice ORDER BY P1.UnitPrice /* That's not quite it. A product will always have the same price as itself! We need to specify that the Product.ProductID is different... */ SELECT P1.UnitPrice AS UnitPrice1, P1.ProductName AS ProductName1, P2.UnitPrice AS UnitPrice2, P2.ProductName AS ProductName2 FROM Products AS P1, Products AS P2 WHERE P1.ProductID <> P2.ProductID AND P1.UnitPrice=P2.UnitPrice ORDER BY P1.UnitPrice /* But that still isn't quite right. A JOIN so constructed will really create a Cartesian product. All combinations of each match. So notice that P1=P2 and P2=P1. We can solve that by only listing one column and removing the dulplicates. */ --Remove P2 attributes SELECT P1.UnitPrice, P1.ProductName FROM Products AS P1, Products AS P2 WHERE P1.ProductID <> P2.ProductID AND P1.UnitPrice=P2.UnitPrice ORDER BY P1.UnitPrice --Remove duplicates SELECT DISTINCT P1.UnitPrice, P1.ProductName FROM Products AS P1, Products AS P2 WHERE P1.ProductID <> P2.ProductID AND P1.UnitPrice=P2.UnitPrice ORDER BY P1.UnitPrice, P1.ProductName --Or formulated using a correlated subquery --These formulations were suggested by a query written by Curt Parker (Fall 2009) SELECT UnitPrice, ProductName FROM Products AS P1 WHERE UnitPrice IN (SELECT UnitPrice FROM Products AS P2 WHERE P1.ProductID <> P2.ProductID) ORDER BY UnitPrice, ProductName SELECT UnitPrice, ProductName FROM Products AS P1 WHERE EXISTS (SELECT * FROM Products AS P2 WHERE P1.ProductID <> P2.ProductID AND P1.UnitPrice=P2.UnitPrice) ORDER BY UnitPrice, ProductName