‘Parameter sniffing’ problem and how to fix it

In my last post, ‘Query plan and statistics’, I briefly touched on a problem called ‘parameter sniffing’. In this post, we’ll discuss this problem in more depth. If you haven’t read the last post, please do it now, because this one builds on it.

I am using SQL Server 2019 with database compatibility level 150 (2019). As a sample database, I use Microsoft’s AdventureWorks2019, but it is not strictly necessary, because we create all the tables from scratch.

First, let’s create a table and populate it with data, then create a clustered index and a non-clustered index to support the query we’re going to use later.

USE [AdventureWorks2019]
GO
/*create a table and fill it up*/
DROP TABLE IF EXISTS dbo.statistics_simple_test; 
CREATE TABLE dbo.statistics_simple_test (id INT IDENTITY(1,1),a_text NVARCHAR(1000), a_column INT);
GO
INSERT INTO statistics_simple_test(a_text,a_column)
SELECT TOP 1000000 m1.text,1 
FROM sys.messages AS m1 CROSS JOIN sys.messages AS m2;
GO
CREATE CLUSTERED INDEX clx_statistics_simple_test ON statistics_simple_test ([id] ASC);
GO
CREATE NONCLUSTERED INDEX ix_statistics_simple_test_bad_parameter ON statistics_simple_test (a_column);
GO

Now we need to update one row and then update statistics.

/*update one row and update the statistics */
UPDATE dbo.statistics_simple_test SET a_column = Id WHERE Id = 2;
UPDATE STATISTICS dbo.statistics_simple_test WITH FULLSCAN,ALL;
GO

All we did was change the value of ‘a_column’ in the single row where Id = 2. Now the data distribution in the column ‘a_column’ looks like this:

/*now the data looks like*/
SELECT a_column, COUNT(*) AS rows_count 
FROM dbo.statistics_simple_test
GROUP BY a_column;
/* a_column = 1: 999,999 rows; a_column = 2: 1 row */

Let’s recall what the problem with this data distribution is: the data is distributed extremely unevenly, which causes trouble when we use a stored procedure to query the table with ‘a_column’ as a filter.

/*recall what is the problem*/
/*create a SP*/
DROP PROCEDURE IF EXISTS dbo.usp_select_rows;
GO
CREATE PROCEDURE dbo.usp_select_rows @a_column INT
AS
SELECT * FROM statistics_simple_test
WHERE a_column = @a_column
GO
/*execute SP with two different parameters*/
EXEC usp_select_rows 1;
GO
EXEC usp_select_rows 2;
GO

The query plan is good for the first query: to get 999,999 rows out of 1,000,000, a ‘Clustered Index Scan’ is the best option. The execution plan for the second query is the same: SQL Server created the plan based on the first parameter and reused it, not only the ‘Index Scan’ operator but also the ‘Estimated Number of Rows’ of 999,999. Of course, scanning 1,000,000 rows to return a single row is far from optimal. The non-clustered index ‘ix_statistics_simple_test_bad_parameter’ (see above) was created to support exactly this kind of query (selecting a small number of rows), but it wasn’t used.

Such a problem is called ‘parameter sniffing’. It happens when the first parameter passed to a stored procedure (SP) determines the query execution plan that SQL Server will use for all subsequent parameters of the same SP, even if that plan is not good for them. In other words, SQL Server compiles and caches a query plan for an SP when the SP is called for the first time or when recompilation is forced.
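You can see the cached plan for the SP and how many times it has been reused by querying the plan cache DMVs. Below is a sketch using the standard sys.dm_exec_cached_plans and sys.dm_exec_sql_text objects (it needs VIEW SERVER STATE permission):

```sql
/* show the cached plan entry for the SP and its reuse count */
SELECT p.usecounts, p.cacheobjtype, p.objtype, t.text
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
WHERE t.text LIKE '%usp_select_rows%'
  AND t.text NOT LIKE '%dm_exec_cached_plans%'; /* exclude this query itself */
```

After the two EXEC statements above, ‘usecounts’ for the compiled plan should show 2: one plan, reused for both parameters.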

Now let’s look at some ways to fix the ‘parameter sniffing’ problem. The general idea is simple: if the compiled query plan doesn’t fit the current parameter and causes performance overhead, it is better to use another query plan.

Solution #1. Clear the procedure cache, where SQL Server stores compiled query plans.

DBCC FREEPROCCACHE clears the cache and forces SQL Server to compile a new query plan the next time the SP is executed.

Of course, you shouldn’t run DBCC FREEPROCCACHE without a parameter (check the link for more information) in a production environment! I do it here only because this is my test environment and I don’t care about any other query plans.

/*problem solutions*/
/*#1*/
DBCC FREEPROCCACHE;
EXEC usp_select_rows 1;
GO
DBCC FREEPROCCACHE;
EXEC usp_select_rows 2;
GO

It helps: when a compiled query plan is not found, SQL Server generates a new one, a ‘Scan’ for 999,999 rows and a ‘Seek’ plus ‘Key Lookup’ for 1 row. This approach can help if you need to fix the problem immediately, but it doesn’t look like a permanent solution for most cases.
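If you do need something like this in production, a safer variation is to evict only the plan of this particular SP by passing its plan_handle to DBCC FREEPROCCACHE. A sketch, looking the handle up through the same plan cache DMVs:

```sql
/* evict only the cached plan of dbo.usp_select_rows */
DECLARE @plan_handle VARBINARY(64);
SELECT TOP (1) @plan_handle = p.plan_handle
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
WHERE t.text LIKE '%usp_select_rows%'
  AND t.text NOT LIKE '%dm_exec_cached_plans%';
IF @plan_handle IS NOT NULL
    DBCC FREEPROCCACHE(@plan_handle); /* all other cached plans stay intact */
```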

As a variation of the above solution, the ‘sp_recompile’ system SP can be used.

/*#1.1*/
EXEC sp_recompile N'dbo.usp_select_rows'; 
GO

Despite its name, executing it doesn’t permanently mark the SP for recompilation; it simply removes the plan from the plan cache, so a new plan is compiled on the next execution.

Solution #2. Recompile when the SP is called.

Using the code below we can force the SP recompilation when executing it.

/*#2*/
EXEC usp_select_rows 1 WITH RECOMPILE;
GO
EXEC usp_select_rows 2 WITH RECOMPILE;
GO

It gives correct execution plans in both cases. Behind the scenes, the ‘WITH RECOMPILE’ option doesn’t touch the plan cache: the newly compiled plan is not cached after execution. This is very useful if, for example, only a few parameter values need a different plan. Let’s say parameter ‘1’ is used in most cases; its plan can stay cached and doesn’t need recompilation, while the rarely used parameter ‘2’ is executed with the ‘WITH RECOMPILE’ option.

After clearing the cache, we may change our query to:

EXEC usp_select_rows 1 --WITH RECOMPILE;
GO
EXEC usp_select_rows 2 WITH RECOMPILE;
GO

Now it doesn’t matter how many times you execute the queries, whether together or separately, or in a different order: you’ll always get the correct query plan. The plan for parameter ‘1’ comes from the cache; the plan for parameter ‘2’ is compiled anew each time.

Solution #3. Recreate the SP with RECOMPILE.

DROP PROCEDURE IF EXISTS dbo.usp_select_rows;
GO
CREATE PROCEDURE dbo.usp_select_rows @a_column INT
AS
SELECT * FROM statistics_simple_test
WHERE a_column = @a_column
OPTION (RECOMPILE) /*recompile each time*/
GO
/**/
EXEC usp_select_rows 1;
GO
EXEC usp_select_rows 2;
GO

After changing the SP as shown above, it recompiles on every execution. This is a good solution when you cannot change the code that calls the SP. The drawback is that CPU consumption may be high due to the large number of recompilations. The method works well as a temporary solution or when it is not overused.

Solution #4. Optimize the SP only for one parameter’s value.

We can optimize the SP for only one parameter value, for example, ‘2’.

/*#4*/
/*optimize the SP for one parameter's value */
DROP PROCEDURE IF EXISTS dbo.usp_select_rows;
GO
CREATE PROCEDURE dbo.usp_select_rows @a_column INT
AS
SELECT * FROM statistics_simple_test
WHERE a_column = @a_column OPTION(OPTIMIZE FOR (@a_column = 2))
GO
/**/
DBCC FREEPROCCACHE;
EXEC usp_select_rows 1;
GO
DBCC FREEPROCCACHE;
EXEC usp_select_rows 2;
GO

Despite the fact that we executed the SP with parameter ‘1’ first, the query plan is optimized for parameter ‘2’. You can see it in the screenshot below: 999,999 seeks and key lookups, which took 13 seconds to complete!

Use this feature very carefully, as you can make performance even worse. You need to be sure which parameter value is used in most cases, and revisit your code from time to time.

Solution #5. Optimize for unknown.

This is similar to Solution #4, but this time we don’t know which value will be used and ask SQL Server to build a ‘generic’ plan. Since SQL Server cannot predict the actual value, don’t expect an ideal result from this option.

/*#5*/
/*re-create the SP*/
DROP PROCEDURE IF EXISTS dbo.usp_select_rows;
GO
CREATE PROCEDURE dbo.usp_select_rows @a_column INT
AS
SELECT * FROM statistics_simple_test
WHERE a_column = @a_column OPTION(OPTIMIZE FOR UNKNOWN)
GO
/**/
DBCC FREEPROCCACHE;
EXEC usp_select_rows 1;
GO
DBCC FREEPROCCACHE;
EXEC usp_select_rows 2;
GO

SQL Server knows from the statistics that there are only two distinct values in ‘a_column’, so it estimates 500,000 rows (1,000,000 × 1/2) for each query.

It works well for parameter ‘1’ and poorly for parameter ‘2’. It may be acceptable in some cases, but I’d recommend spending a little more time on investigation to find a better solution.
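The 500,000-row estimate is not pulled out of thin air: with OPTIMIZE FOR UNKNOWN, the optimizer uses the density vector of the statistics instead of the histogram. You can check the density yourself; the sketch below assumes the statistics object created for the non-clustered index carries the index name:

```sql
/* show the density vector of the index statistics */
DBCC SHOW_STATISTICS ('dbo.statistics_simple_test', ix_statistics_simple_test_bad_parameter)
WITH DENSITY_VECTOR;
/* "All density" for a_column is 0.5 (two distinct values),
   and 0.5 * 1,000,000 rows = 500,000 estimated rows */
```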

Solution #6. Dynamic query.

/*#6 dynamic query*/
/*re-create the SP*/
DROP PROCEDURE IF EXISTS dbo.usp_select_rows;
GO
CREATE PROCEDURE dbo.usp_select_rows @a_column INT
AS
DECLARE @sql_text NVARCHAR(4000) = 
	'SELECT * FROM statistics_simple_test WHERE a_column = @a_column';
/* embed the parameter as a literal so each value gets its own plan */
SET @sql_text = REPLACE(@sql_text, '@a_column', CONVERT(NVARCHAR(11), @a_column));
EXEC (@sql_text);
GO
/**/
DBCC FREEPROCCACHE;
EXEC usp_select_rows 1;
EXEC usp_select_rows 2;
GO

It works fine, just like a regular ad hoc query: SQL Server creates a separate plan for each distinct parameter value. Of course, the more possible values there are, the more memory is needed to store the plans, so this works well only for a small number of values. Also note that building SQL text by string replacement is safe here only because @a_column is strictly typed as INT; with string parameters, this pattern would open the door to SQL injection.
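You can verify that each literal really gets its own cached ad hoc plan; a sketch using the same plan cache DMVs as before:

```sql
/* list the ad hoc plans produced by the dynamic query */
SELECT p.usecounts, p.objtype, t.text
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
WHERE t.text LIKE 'SELECT * FROM statistics_simple_test WHERE a_column =%';
```

After the two EXEC calls, this should return two ‘Adhoc’ rows, one per literal value.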

Conclusion.

The ‘parameter sniffing’ problem is very complex, and it is impossible to review all options and scenarios in one post. All I wanted was to show you some possible ways to try fixing it. Each of the solutions mentioned above can be useful in particular cases. Always test your solutions thoroughly before deploying to production and, once deployed, monitor them from time to time: changes in data distribution, data size, and available indexes may make your solution suboptimal.

Hope it was helpful, see you in the next one!