SQL Exists Syntax

About this post: This post is designed to help you understand how to use the EXISTS keyword in your SQL query environment. All of the examples will be based on a fictional two-table database for posting messages to an Internet message board. The examples will be generic so that they can apply to any database (Oracle, SQL Server, etc.) that supports the EXISTS syntax.

As part of my job, I regularly deal with SQL coding. My current focus is on Oracle databases, but I occasionally get to play in DB2 and other platforms. I recently encountered some SQL code that I was not familiar with. In the WHERE clause of the code was an EXISTS condition with a sub query listed inside. It looked something like the following:

SELECT *
FROM   Users U
WHERE  EXISTS (SELECT NULL
               FROM   PostRating PR
               WHERE  PR.UserID = U.ID
               AND    PR.SpamVotes >= 5)

Hard as I tried, I could not make sense of the query listed above. Why would you select NULL from the table, and what does it mean? In order to explore this further, we need to have a simple problem to solve.

First, there will be two database tables for this example. I’m going to write generic syntax to get the point across. You’ll have to modify this to your platform. The tables only have the columns that are relevant to the example. Were these production tables, they would have many more columns.

 

CREATE TABLE Users (
    ID    INTEGER,
    Name  VARCHAR(50)
);

INSERT INTO Users (ID, Name) VALUES (1, 'Nice Guy');
INSERT INTO Users (ID, Name) VALUES (2, 'Bad Guy');

CREATE TABLE PostRating (
    UserID     INTEGER,
    PostID     INTEGER,
    SpamVotes  INTEGER
);

INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (1, 1, 1);
INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (2, 2, 20);

 

For the tables above, the Users table lists the valid users in the system. The PostRating table is used by people on the message board to mark a message as SPAM. Using the community to mark messages as spam can help clean unwanted content off an open message board.

Before digging into the original query above, let’s explore with some basic testing how EXISTS works.

 

--Returns 2 rows.
SELECT *
FROM Users

--Still Returns 2 rows
SELECT *
FROM   Users
WHERE  EXISTS(SELECT null
                          FROM PostRating)
             
--Returns 0 rows.
--Note that the sub query returns zero rows.
SELECT *
FROM   Users
WHERE  EXISTS(SELECT null
              FROM   PostRating
              WHERE  UserID = 999)

--Returns 2 rows. 
--Note that the sub query returns 1 row.
SELECT *
FROM   Users
WHERE  EXISTS(SELECT null
                          FROM   PostRating
	                  WHERE  UserID = 1)


--Still returns 2 rows
--The sub query is for an Oracle database, but it works
--the same for any table that has rows in it.
SELECT *
FROM   Users
WHERE  EXISTS (SELECT null
                           FROM dual)
                    
--Blows up, must be a query inside of the parentheses.                    
SELECT *
FROM   Users
WHERE  EXISTS (NULL)

 

Hopefully at this point, you have come to the same conclusion that I did after all the testing. If the query listed inside the EXISTS returns any row at all [even a row containing only NULL], then the condition is true. If no rows come back, then the condition is false, and in these tests the whole query returns nothing.
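If you want to repeat these tests without a database server handy, here is a minimal sketch using Python's built-in sqlite3 module. SQLite and the helper name count_users are my own assumptions for illustration; the tables and rows are the ones defined above.

```python
import sqlite3

# In-memory SQLite database standing in for the generic tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER, Name TEXT);
    INSERT INTO Users (ID, Name) VALUES (1, 'Nice Guy');
    INSERT INTO Users (ID, Name) VALUES (2, 'Bad Guy');
    CREATE TABLE PostRating (UserID INTEGER, PostID INTEGER, SpamVotes INTEGER);
    INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (1, 1, 1);
    INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (2, 2, 20);
""")


def count_users(where_clause):
    """Count the Users rows that survive the given WHERE clause (hypothetical helper)."""
    sql = "SELECT COUNT(*) FROM Users WHERE " + where_clause
    return conn.execute(sql).fetchone()[0]


# The sub query finds a row, so EXISTS is true for every user.
rows_when_subquery_has_rows = count_users(
    "EXISTS (SELECT NULL FROM PostRating WHERE UserID = 1)")

# The sub query finds nothing, so EXISTS is false for every user.
rows_when_subquery_is_empty = count_users(
    "EXISTS (SELECT NULL FROM PostRating WHERE UserID = 999)")

print(rows_when_subquery_has_rows, rows_when_subquery_is_empty)
```

Because neither sub query refers to the outer Users row, the answer is the same for every user: all rows or no rows.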

Now you might be asking why anybody would use this syntax. The power of this statement comes when you join the sub query of the EXISTS to the main query. You can quickly exclude records that don’t meet the criteria. In the original example above, I wanted to list all users that have had one or more posts listed as spam 5 or more times.

SELECT *
FROM   Users U
WHERE  EXISTS (SELECT NULL
               FROM   PostRating PR
               WHERE  PR.UserID = U.ID
               AND    PR.SpamVotes >= 5)

By joining the Users table to the PostRating table in the WHERE clause of the sub query inside the EXISTS, I’m essentially asking, “Does this user have any post that was marked as spam 5 or more times?” The result is that only 1 record comes back, for the ‘Bad Guy’ user; ‘Nice Guy’ is ignored because their count is only 1.
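To see the correlated version run end to end, here is a rough sketch, again assuming SQLite through Python's sqlite3 module rather than Oracle. It filters on the SpamVotes column from the table definition earlier in the post.

```python
import sqlite3

# In-memory SQLite database with the two example tables from this post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER, Name TEXT);
    INSERT INTO Users (ID, Name) VALUES (1, 'Nice Guy');
    INSERT INTO Users (ID, Name) VALUES (2, 'Bad Guy');
    CREATE TABLE PostRating (UserID INTEGER, PostID INTEGER, SpamVotes INTEGER);
    INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (1, 1, 1);
    INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (2, 2, 20);
""")

# The sub query is evaluated once per Users row because it refers to U.ID.
spam_users = conn.execute("""
    SELECT U.Name
    FROM   Users U
    WHERE  EXISTS (SELECT NULL
                   FROM   PostRating PR
                   WHERE  PR.UserID = U.ID
                   AND    PR.SpamVotes >= 5)
""").fetchall()

print(spam_users)
```

The value selected inside the sub query never reaches the result, which is why SELECT NULL is enough: EXISTS only cares whether a row came back.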

This would be great for an admin report to help know which users might be removed from the system for abuse, and it leaves alone the good users who are not abusing the system.

One final thought. During my initial research on EXISTS, I could only think that there are other ways to write the same query and get the result you’re looking for. However, I inherited a large code base where the developers preferred the EXISTS syntax. I don’t have the time to re-write and test working code, so I have learned to work with it. Now that I understand it, I have added it to my skills and even find it quite useful!
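As an illustration of those other ways, the sketch below compares EXISTS with an IN sub query and a plain JOIN. It assumes SQLite via Python's sqlite3 module, and it adds one hypothetical extra row (a second spammy post for ‘Bad Guy’) that is not part of the original example, purely to show where the forms differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER, Name TEXT);
    INSERT INTO Users (ID, Name) VALUES (1, 'Nice Guy');
    INSERT INTO Users (ID, Name) VALUES (2, 'Bad Guy');
    CREATE TABLE PostRating (UserID INTEGER, PostID INTEGER, SpamVotes INTEGER);
    INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (1, 1, 1);
    INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (2, 2, 20);
    -- Hypothetical extra row, not in the original example data.
    INSERT INTO PostRating (UserID, PostID, SpamVotes) VALUES (2, 3, 7);
""")

queries = {
    "exists": """
        SELECT U.Name FROM Users U
        WHERE EXISTS (SELECT NULL FROM PostRating PR
                      WHERE PR.UserID = U.ID AND PR.SpamVotes >= 5)""",
    "in": """
        SELECT U.Name FROM Users U
        WHERE U.ID IN (SELECT PR.UserID FROM PostRating PR
                       WHERE PR.SpamVotes >= 5)""",
    "join": """
        SELECT U.Name FROM Users U
        JOIN PostRating PR ON PR.UserID = U.ID
        WHERE PR.SpamVotes >= 5""",
}

# Run each form and collect the names it returns.
results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}

for label, rows in results.items():
    print(label, rows)
```

With this data, EXISTS and IN each list the user once, while the plain JOIN lists them once per matching post, so it would need a DISTINCT to give the same report.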

Let me know if you have questions or comments.

Hogan Haake

Oracle Sequence Promotes Poorly Maintainable Code

I started my professional career working with Microsoft’s SQL Server. I spent twelve years off and on learning how to design a database and write stored procedures in T-SQL. Then this last October, I switched jobs and was exposed to a new database platform, Oracle. Since this switch, I have used every curse word I know and invented new ones to express my frustration at interacting with an Oracle 10x something database. I’ll leave the rest of my rantings for another post and just focus on one aspect of Oracle that has frustrated me recently.

I started creating my first new table in Oracle and started defining the columns. I always start with an ID column that is typically used as the primary key of the table. As I went to select the column type, I didn’t see anything labeled “autonumber”. Trying again, I looked for integer, but that wasn’t there either. Oracle only supports the “Number” column. There, you can provide the precision before and after the decimal point. After selecting the number column, I looked all over for something that would mark the column as unique and set for an autonumber sequence. Striking out quickly, it was time to ask Google and start learning about Sequence objects.

In the Oracle version I was working with, tables have no built-in mechanism for auto numbering (identity columns only arrived later, in Oracle 12c). Instead, you must create a separate Sequence object and use it each time a record is inserted into the table.

CREATE SEQUENCE customers_seq START WITH 1000 INCREMENT BY 1 NOCACHE NOCYCLE;

Then each time the sequence is used, it looks something like

INSERT INTO customers (ID, Name…) VALUES (customers_seq.nextval, 'Hogan Haake'…);

Comparing this to the SQL Server I’m used to, if a column is autonumbered, you just exclude it in the insert and it automatically gets the next ID on insert.

INSERT INTO customers(Name, …) VALUES ('Hogan Haake'…)

At this point, Oracle people could argue that I’m just lazy, or I just need to learn a new way. They are right on both counts, but there is more to the story! I recently came across some bad code in part of my application where the developer didn’t use the sequence’s nextval for an insert, instead converting the current date into a number [YYMMDD format] and inserting that into the table as a unique value. While that method worked, the unique number they were generating was quite far away from the current sequence. The system has been in production for two years now and the sequence number is about 6 months away from a “collision” with the manual numbers incorrectly inserted in the ID column.

Current Sequence Value    Manual Sequence Value
107,000                   120,210 (first inserted 2012-Feb-10th)

The current sequence value is fast approaching the first manual sequence value. It was fortunate that the bug was found before it caused corrupt data and long nights for me. Due to the complexity of the system and time constraints, the simple fix of incrementing the next value of the sequence to 500,000 to avoid any future collisions with “unique” numbers was chosen. It would be nice to fix the offending code with the correct sequence number, but management decided the code worked enough that we could move on to other problems.
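The arithmetic behind the near collision can be sketched in a few lines of Python. The function name manual_id is hypothetical, and the assumption that the manual value is exactly the two-digit year, month and day run together comes from the YYMMDD description above; the real code may differ in its details.

```python
from datetime import date


def manual_id(day):
    """Build a YYMMDD-style number from a date (hypothetical helper)."""
    return int(day.strftime("%y%m%d"))


# Figures taken from the table above.
current_sequence_value = 107_000
first_manual_id = manual_id(date(2012, 2, 10))

# How many more sequence values can be handed out before the first clash.
ids_until_collision = first_manual_id - current_sequence_value

# After bumping the sequence to 500,000, date-based values stay below it
# for any date through the end of 2049 (2050-01-01 would give 500101).
bumped_sequence_start = 500_000
last_safe_manual_id = manual_id(date(2049, 12, 31))

print(first_manual_id, ids_until_collision, last_safe_manual_id < bumped_sequence_start)
```

Under those assumptions, the bump keeps every sequence value above the date-based numbers for decades, which is why it works as a stopgap even though the offending insert was left in place.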

In a SQL Server environment, if you try to insert a value into an autonumber (identity) field, an error is produced unless IDENTITY_INSERT has been explicitly turned on for the table, which prevents this type of mistake from happening by accident.

I’m not sure what other issues I’m going to encounter with this new environment, but I sure miss SQL Server. If you still don’t think SQL Server is better, consider community support. Who would you rather trust for help?

Pinal Dave (SQL Server) or Don Burleson (Oracle)?

Hogan

Blog Engine .NET Tag Cloud Optimization

For all of the amazing blogs and websites that are out there, hundreds exist that are just average. Most of us are not experts, like Eric Lippert, who can write with authority on a single topic for years and have interesting and new things to say. Instead, there are blogs like mine. I write about whatever is interesting to me at the time. As the years progress, what I write about changes, and I want my website [Blog Engine .NET] to reflect that.

As I create new posts, they are shown on the front page till they get old. However, I have found that the default Tag Cloud widget is only interested in what I have written about most. For example, I took a motorcycle trip with my brother, so I wrote extensively about it. Since that trip almost two years ago, Arkansas was the top item on my Tag Cloud. I have nothing against that wonderful state, but I felt that it made my website out of date.

I have made a simple one line code change to the Tag Cloud that makes it more relevant. Instead of using tags from all 190 posts on my site from the last 2 years, I changed it to only consider tags from posts that are less than 1 year old. After this change was made, the Tag Cloud immediately updated and more accurately reflected what I have been writing about recently.

Now for the details, I’ll assume you’re familiar with Blog Engine .NET:

1. This example is using version 2.0; I have not tested it with other versions.
2. Go to the file (from the root of the source code download) BlogEngine.NET\widgets\Tag cloud\widget.ascx.cs
3. Using your favorite text editor (because we don’t need to compile anything) go to line 197.
  a. Find the method private static SortedDictionary<string, int> CreateRawList()
      The old code:

foreach (var tag in Post.Posts.Where(post => post.IsVisibleToPublic

With a minor modification highlighted, we can adjust the time to just the last year’s worth of posts.

4. With the minor change made in the method, save the file and upload it to the same relative path on your Blog Engine .NET installation. The next time you call a page with the Tag Cloud on it, the widget will automatically re-compile and the cloud will be up to date.
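The idea behind the change, counting tags only from visible posts that are less than a year old, can be sketched outside of Blog Engine .NET. The Python below is only an illustration of that filter, not the widget’s actual C# code; the post dictionaries, their field names and the sample dates are all made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def recent_tag_counts(posts, now, max_age_days=365):
    """Count tags from visible posts newer than the cutoff (illustrative only)."""
    cutoff = now - timedelta(days=max_age_days)
    counts = Counter()
    for post in posts:
        if post["visible"] and post["created"] > cutoff:
            counts.update(post["tags"])
    return counts


# Made-up sample data: one old trip post and two recent technical posts.
now = datetime(2012, 6, 1)
posts = [
    {"created": datetime(2010, 8, 1), "visible": True, "tags": ["Arkansas", "Motorcycle"]},
    {"created": datetime(2012, 3, 1), "visible": True, "tags": ["SQL", "Oracle"]},
    {"created": datetime(2012, 5, 1), "visible": True, "tags": ["SQL"]},
]

tag_counts = recent_tag_counts(posts, now)
print(tag_counts.most_common())
```

The old trip tags drop out entirely because their post falls before the cutoff, which is the effect I saw on my own Tag Cloud.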

Sorry that I used images instead of text, but I wanted the color to come out nice. If you’re nervous to make the code change yourself, you can download the one file to install yourself.

widget.ascx.zip (1.84 kb)

Happy Coding!

Hogan Haake

Where Did Part Of My File Go?

I’ve primarily been a Windows software developer for the last 12 years. During that time, I’ve written lots of web sites, desktop applications, and server applications. I recently changed jobs and am now working mainly on an IBM mainframe and Unix. I occasionally get to do some Windows applications, but they are few and far between. Lucky for me, today was one of the days when I got to work on a Windows application. It was a simple job moving a file from a server, doing some minor processing and then FTPing it to a Unix server for final processing.

In my attempt to be a good developer, I spent a significant portion of my time on the application testing and placing try catches to make the application as safe as possible from any issues. As part of the safety I decided to validate that each file I FTPed to the Unix server did indeed get there. After I finished the FTP, I did a directory listing “ls -l” equivalent to “dir” in DOS. The listing came back with all the files in the directory and their size.

I wrote a loop to compare the file sizes between the Windows files on the local machine and the Unix files transferred. To my surprise, none of the files I transferred were the same size! I was perplexed by this. After downloading the plain text .csv file from the Unix server to a different folder, I checked the properties, and it was indeed smaller than the original file uploaded. I tried to open it figuring that it would fail, but to my surprise, it opened correctly. I looked at a sampling of the lines in the file and they had the same data.

I was quite frustrated at this and Googled around looking for answers. There were many posts about encoding of files being different. Research in this realm brought me no closer to the answer. The files both appeared to be encoded ANSI. I decided to resort to the lowest level of debugging I could think of. I downloaded a hex editor to look inside the files and see what they had for data. There in the hex, I compared the files side by side and found the issue that had been eluding me.

Windows terminates its lines with CHR(13) CHR(10) [Carriage Return, Line Feed]
Unix terminates its lines with CHR(10) only [Line Feed].

I was losing one byte of size per new line in the file I uploaded. It seems that the file was converted during the FTP upload, which is the usual behavior of a text (ASCII) mode FTP transfer when it handles line endings. When downloaded and tested on my Windows machine, the smaller file could still be opened by Microsoft Excel as a .csv file. This explains why the file size was smaller, but all of the data was still accessible.
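The size difference is easy to check on a tiny example. This Python sketch uses made-up CSV content and a hypothetical helper name, and it assumes every line ends with the CHR(13) CHR(10) pair:

```python
def expected_unix_size(windows_bytes):
    """Size once each CR LF pair has been reduced to a single LF."""
    return len(windows_bytes) - windows_bytes.count(b"\r\n")


# Made-up three-line CSV file with Windows line endings.
windows_csv = b"id,name\r\n1,Nice Guy\r\n2,Bad Guy\r\n"

windows_size = len(windows_csv)
unix_size = expected_unix_size(windows_csv)

print(windows_size, unix_size, windows_size - unix_size)
```

Three lines, three bytes lost: one carriage return per line.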

Windows Hex Output. Note the highlighted square “0D”. That is hex 0x0D chr(13) for carriage return. It is followed by 0x0A chr(10) for new line.

Note the Unix file below in the same position only has the 0x0A chr(10) for newline, but the carriage return has been stripped out. This accounts for the file size difference.

In order to continue my quest for fewer errors and validation of my data, I wrote a method I’ll share with you to get the Unix file size.

// Requires: using System.IO;

/// <summary>
/// Get the size of a file on a Unix system. This entails counting all of the
/// CHR(13) characters and subtracting that from the overall
/// size as Unix doesn’t use them in a newline. Stupid Unix!
/// </summary>
/// <param name="fileName">Path to the local (Windows) copy of the file.</param>
/// <returns>The local file size minus one byte per carriage return.</returns>
private static long GetUnixFileSize(string fileName)
{
    long windowsFileSize = new FileInfo(fileName).Length;
    long unixFileSize = windowsFileSize;

    using (StreamReader sr = new StreamReader(fileName, true))
    {
        char[] c = new char[1];
        const char char13 = (char)13;

        //Read one character at a time until the end of the file.
        while (sr.Read(c, 0, 1) == 1)
        {
            //If we find a Carriage Return, decrement the count.
            if (c[0] == char13)
                unixFileSize--;
        }//while (sr.Read(c, 0, 1) == 1)
    }//using (StreamReader sr = new StreamReader(fileName, true))

    return unixFileSize;
}

 

Hope this saves some of you time!

Hogan

Getting rid of SPAM comments in BlogEngine.NET

I have been running this website on BlogEngine.NET version 2.0.0.36 for over a year now. I like the interface and the familiar .NET coding environment. One of the biggest drawbacks to this blog software is the proliferation of SPAM comments that are entered on my site. I began to take for granted that I would never get rid of them. Every few days, I got notified that there was a new comment on a post I made. Nearly every one was SPAM. I decided it was time to fight back. I didn’t want to lose control of the comments to a third party, or invent/borrow my own captcha system. Some day it may be necessary, but I don’t particularly like them. When I looked at my web logs for the site, I noticed that there were several searches for “powered by blogengine.net” and then some random term.

And they found my page because the footer of the page looked like the following

This got me thinking. I’m using the standard theme (Indigo) provided during installation of BlogEngine.NET. I decided to try modifying the master page for this theme. Since I modified the theme, I have stopped getting SPAM comments on my site. If you’re interested, you can do the same.

1. Download the file /themes/Indigo/site.master
2. Change the text on line 64 FROM

<div class="footer">
        Powered by <a href="http://www.dotnetblogengine.net/" target="_blank">BlogEngine.NET</a> <%=BlogSettings.Instance.Version() %> |
        Original Design by <a href="http://arcsin.se">Arcsin</a>, Adapted by <a href="http://www.nyveldt.com/blog/">RazorAnt</a>
</div>

To something like

<div class="footer">
    Thanks for visiting Snorkie.com
</div>

Then re-upload the site.master page and your site will have something like

Give it a try, but you might not be happy with the results. Now that I changed the text, I don’t have any comments on my entries. Now to get more traffic to my site!

Hogan