Introduction to Filtered Indexes

Filtered indexes were introduced in SQL Server 2008, providing us with the ability to save space and potentially improve performance. These indexes are nonclustered, meaning they essentially make a “copy” of our underlying data in a B-tree structure. The primary difference from creating a traditional nonclustered index, is the inclusion of a WHERE clause.

So, how can we create a filtered index?

As previously mentioned, we’ll use a WHERE clause in our CREATE NONCLUSTERED INDEX statement. Let’s pretend in this scenario that we have a web application with a “Users” table. Our users have the ability to delete their accounts, but we’ve implemented “Soft Deletes” for retention purposes. We determine 95% of the queries referencing that table are filtering on “IsDeleted = 0”. We also find that the “UserName” column is almost always being returned.

Let’s create dummy data.

CREATE TABLE dbo.Users
(
	UserID int IDENTITY NOT NULL
		CONSTRAINT PK_Users PRIMARY KEY CLUSTERED,
	UserName nvarchar (32) NOT NULL
		CONSTRAINT UQ_UserName UNIQUE,
	PasswordHash varbinary (MAX) NOT NULL,
	CreationDate datetime NOT NULL
		CONSTRAINT DF_Users_CreationDate DEFAULT CURRENT_TIMESTAMP,
	IsDeleted bit NOT NULL
		CONSTRAINT DF_Users_IsDeleted DEFAULT 0
);

INSERT INTO dbo.Users ([UserName], [PasswordHash])
SELECT TOP (50000) CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS nvarchar (32)),
                   CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS varbinary (MAX))
FROM sys.objects AS O1
CROSS JOIN sys.objects AS O2
CROSS JOIN sys.objects AS O3;

-- 30% (15,000) of these users shall be "Soft Deleted" randomly.
WITH CTE
AS
(
	SELECT TOP (30) PERCENT *
	FROM dbo.Users
	ORDER BY NEWID()
)
UPDATE CTE
SET IsDeleted = 1;

We’ve got a total of 50,000 rows now in our Users table. To ensure our “deletes” worked, we can execute the following query:

SELECT IsDeleted, 
       COUNT(1) AS TotalRows
FROM dbo.Users
GROUP BY IsDeleted;
IsDeletedTotalRows
035000
115000

Time for some comparisons.

Without adding any indexes at all, let’s look at how our query would perform. I’ll be turning STATISTICS IO and STATISTICS TIME ON in this session, so we’ll get some additional information.

SET STATISTICS IO, TIME ON;

The query we will be reusing in our scenario will be the following:

SELECT UserName
FROM dbo.Users
WHERE IsDeleted = 0;

As previously mentioned, no additional indexes have been created for the first execution.

(35000 rows affected)
Table 'Users'. Scan count 1, logical reads 284, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.


SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 163 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

Clippy gives us a Missing Index warning here. Implementing it should improve our query cost by 57.606%. So let’s try it.

CREATE NONCLUSTERED INDEX IX_IsDeleted_Includes
ON [dbo].[Users] ([IsDeleted])
INCLUDE ([UserName]);

After creating that index, we rerun our original query to discover things improved.

(35000 rows affected)
Table 'Users'. Scan count 1, logical reads 110, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 160 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

Clippy is somewhere smiling down at this execution plan.

The previous execution gave us a seek and reduced our logical reads considerably. There’s one last thing we can try.

Finally, the filtered index.

As mentioned above, to implement a filtered index we simply need to add a WHERE clause.

CREATE NONCLUSTERED INDEX IX_UserName ON dbo.Users (UserName) WHERE IsDeleted = 0;

If you didn’t drop that previous index, you’ll notice it’ll be a bit quicker to create this one.

The optimizer is going to stick with that first index we created unless we drop it or hint our filtered one. I’ll hint for it here. I normally wouldn’t condone hinting an index, because if that index ever gets dropped, you’re outta luck. You’ll see an angry 308 error that looks like this:
Msg 308, Level 16, State 1, Line 2
Index 'IndexName' on table 'dbo.TableName' (specified in the FROM clause) does not exist.

Let’s create that index.

SELECT UserName
FROM dbo.Users WITH (INDEX = IX_UserName_Filtered)
WHERE IsDeleted = 0;

Our results look a little bit better this time around.

(35000 rows affected)
Table 'Users'. Scan count 1, logical reads 104, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.


SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 159 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

We have an Index Scan this time around, which is okay, since we aren’t filtering to any specific UserName(s) which is the key in our filtered index. You’ll also see that our logical reads decreased a bit.

What did we learn?

We can get a general idea of the differences in storage required for these indexes using sys.dm_db_index_physical_stats and looking at the index leaf level. Paul Randal has an excellent article detailing the use of that DMV.

SELECT T.name AS TableName, 
       I.name AS IndexName, 
	   IPS.record_count,
	   IPS.min_record_size_in_bytes,
	   IPS.max_record_size_in_bytes,
	   IPS.avg_record_size_in_bytes
FROM sys.tables AS T
INNER JOIN sys.indexes AS I
	ON T.object_id = I.object_id
CROSS APPLY sys.dm_db_index_physical_stats (DB_ID(), T.object_id, I.index_id, NULL , N'DETAILED') AS IPS
WHERE t.object_id = OBJECT_ID(N'dbo.Users', N'U')
	AND IPS.index_level = 0;

The previous query yields the following results:

TableNameIndexNamerecord_countmin_record_size_in_bytesmax_record_size_in_bytesavg_record_size_in_bytes
UsersPK_Users50000364443.555
UsersUQ_UserName50000142221.555
UsersIX_IsDeleted_Includes50000152322.555
UsersIX_UserName_Filtered35000142221.556

We can see the size in bytes decreases slightly with the filtered index, since we’re no longer storing the extra bit column in that index. We’re not saving too much with that bit column.

As with anything associated with index design, we should always consider its cost versus benefit. Filtered indexes provide us a great way to reduce the storage required, reduce reads, and can even reduce the time of our index maintenance plans. There are certain instances where SQL Server won’t use our filtered indexes. One of the most common examples occurs when the filter predicate is parameterized.

I hope you learned something, or you at least had a nice refresher on how filtered indexes can be implemented! Thanks for reading!

Leave a Reply

Your email address will not be published. Required fields are marked *