SQL Exists Syntax

About this post: This post is designed to help you understand how to use the EXISTS keyword in your SQL query environment. All of the examples will be based on a fictional three table database for posting messages to an Internet message board. The examples will be generic so that they can apply to any database (Oracle, SQL Server, etc) that supports the EXISTS syntax.

As part of my job, I regularly deal with SQL coding. My current focus is on Oracle databases, but I occasionally get to play in DB2 and other platforms. I recently encountered some SQL code that I was not familiar with. In the WHERE clause of the code was an EXISTS statement with a sub query listed inside. It looked something like the following

SELECT 	*
FROM	Users U
WHERE 	ID EXISTS (SELECT NULL
		                  FROM PostRating PR
		                  WHERE PR.UserID = U.ID
		                  AND MarkedAsSpam >= 5)

Hard as I tried, I could not make sense of the query listed above. Why would you select NULL from the the table and what does it mean. In order to explore this further, we need to have a simple problem to solve.

First, there will be two database tables for this example. I’m going to write generic syntax to get the point across. You’ll have to modify this to your platform. The tables only have the columns that are relevant to the example. Were this a production table, they would have many more columns.

 

CREATE TABLE Users AS
ID          Number
Name     String

INSERT INTO Users (ID, Name) VALUES (1, 'Nice Guy')
INSERT INTO Users (ID, Name) VALUES (2, 'Bad Guy')

CREATE TABLE PostRating AS
UserID           Number
PostID           Number
SpamVotes    Number

INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (1, 1, 1)
INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (2, 2, 20)

 

For the tables above, the Users table lists the valid users in the system. PostRatings table is used for people on the message board to mark a message as SPAM. Using the community to mark messages as spam can help clean up an open message board from unwanted content.

Before digging into the original query above, lets explore with some basic testing how EXISTS works.

 

--Returns 2 rows.
SELECT *
FROM Users

--Still Returns 2 rows
SELECT *
FROM   Users
WHERE  EXISTS(SELECT null
                          FROM PostRating)
             
--Returns 0 rows. 
--Note that the sub query returns zero rows.      
SELECT *
FROM   Users
WHERE  EXISTS(SELECT null
                          FROM   PostRating
	                  UserID = 999)  

--Returns 2 rows. 
--Note that the sub query returns 1 rows.      
SELECT *
FROM   Users
WHERE  EXISTS(SELECT null
                          FROM   PostRating
	                  WHERE  UserID = 1)


--Still return  2 rows  
--The sub query if for an Oracle database, but it works
--the same for any table that has rows in it.             
SELECT *
FROM   Users
WHERE  EXISTS (SELECT null
                           FROM dual)
                    
--Blows up, must be a query inside of the parentheses.                    
SELECT *
FROM   Users
WHERE  EXISTS (NULL)

 

Hopefully at this point, you have come to the same conclusion that I did after all the testing. If the query listed inside the EXISTS returns anything [including NULL] then the statement is true. If no records come back, then the record is false and don’t return anything in the whole query.

Now you might be asking why anybody would use this syntax. The power of this statement comes when you join the sub query of the EXISTS to the main query. You can quickly exclude records that don’t meet the criteria. In the original example above, I wanted to list all users that have had one or more posts listed as spam 5 or more times.

SELECT 	*
FROM	Users U
WHERE 	ID EXISTS (SELECT NULL
		                  FROM PostRating PR
		                  WHERE PR.UserID = U.ID
		                  AND MarkedAsSpam >= 5)

By joining the Users Table to the PostRating table in the WHERE statement of the sub query inside the Exists, I’m essentially asking, “Does this user have their MarkedAsSpam for any posted listed 5 or more times?” The result is that only 1 record will return for the ‘Bad Guy’ user, but it will ignore ‘Good Guy’ as their count is only at 1.

This would be great for an admin report to help know which users might be removed from the system for abuse, and it protects the good users are not abusing the system.

One final thought. Upon my initial research for EXISTS I could only thing that there are other ways to write the same query and get the result you’re looking for. However, I inherited a large code base where the developers preferred the EXISTS syntax. I don’t have the time to re-write and test working code, so I have learned to work with it. Now that I understand it, I have added it to my skills and even find it quite useful!

Let me know if you have questions or comments.

Hogan Haake

Oracle Sequence Promotes Poorly Maintainable Code

I started my professional career working with Microsoft’s SQL Server. I spent twelve years off and on learning how to design a database and write stored procedures in T-SQL. Then this last October, I switched jobs and was exposed to a new database platform, Oracle. Since this switch, I have used every curse word I know and invented new ones to express my frustration at interacting with an Oracle 10x something database. I’ll leave the rest of my rantings for another post and just focus on one aspect of Oracle that has frustrated me recently.

I started creating my first new table in Oracle and started defining the columns. I always start with an ID column that is typically used as primary key of the table. As I went to select the column type, I didn’t see anything labeled “autonumber”. Trying again, I looked for integer, but that isn’t there either. Oracle only supports the “Number” column. There, you can provide the precision before and after the decimal point. After selecting the number column, I looked all over for something that would mark the column as unique and set for an autonumber sequence. Striking out  quickly, it was time to ask Google and start learning about Sequence objects.

Oracle tables have no built in mechanism for auto numbering. Instead, you must create a separate unique Sequence object and use it each time a record is inserted into the database.

CREATE SEQUENCE customers_seq START WITH 1000 INCREMENT BY 1 NOCACHE NOCYCLE;

Then each time the sequence is used, it looks something like

INSERT INTO customers (ID, Name…) VALUES (customers_seq.nextval, ‘Hogan Haake’…);

Comparing this to the SQL Server I’m used to, if a column is autonumbered, you just exclude it in the insert and it automatically gets the next ID on insert.

INSERT INTO customers(Name, …) VALUES (‘Hogan Haake’…)

At this point, Oracle people could argue that I’m just lazy, or I just need to learn a new way. They are right on both accounts, but there is more to the story! I recently came across some bad code in part of my application where the developer didn’t use the sequence.nextvalue for an insert, instead converting the current date into a number [YYMMDD Format] and inserted that into the table as a unique value. While that method worked, the unique number they were generating was quite far away from the current sequence. The system has been in production for two years now and the sequence number is about 6 months away from a “collision” with incorrectly inserted manual numbers in the ID column.

Current Sequence Value           Manual Sequence Value
107,000                                      120,210     (first inserted 2012-Feb-10th)

The current sequence value is fast approaching the first manual sequence value. It was fortunate that the bug was found before it caused corrupt data and long nights for me. Due to the complexity of the system and time constraints, the simple fix of  incrementing the next value of the sequence to 500,000 to avoid any future collisions with “unique” numbers was chosen. It would be nice to fix the offending code with the correct sequence number, but management decided the code worked enough that we could move on to other problems.

In a SQL server environment, if you try to insert a value into an autonumber field, an error is produced preventing this type of error from happening.

I’m not sure what other issues I’m going to encounter with this new environment, but I sure miss SQL Server. If you still don’t think SQL server is better, consider community support. Who would you rather trust for help?

Pinal Dave (Sql Server) or Don Burleson (Oracle)?

Hogan