SQL Snapshot – IO slowness for the source DB and how to fix it

SQL Snapshot – IO slowness for the source DB and how to fix it

SQL Server Snapshots can be very useful for reports or other cases where read-only consistent data is needed. At the same time, there is an IO overhead for the source database. More information can be found here. It is not a goal for this post to explain how Snapshot works in SQL Server, so if you are not familiar with the snapshots, please take some time to check the online documentation.
I want to tell you about how an additional IO for the source database can be mitigated. Probably, the solution is not perfect but it helped me, and I hope someone can find it useful as well.
I used a ‘StackOverflow2010’ database for my scripts. I downloaded it from here, but the original copy is here.
The first thing I want to do is to create a snapshot of the StackOverflow2010 database. Change the file path to the snapshot accordingly. A file name for the snapshot file can be changed as well.

/*create a snapshot*/
CREATE DATABASE StackOverflow2010_Snapshot 
ON (NAME = stackoverflow2010, FILENAME = 'd:\SQLServer\SO2010_snapshot.ss') 
AS SNAPSHOT OF StackOverflow2010;

As a result of creating a snapshot, we can see the file ‘d:\SQLServer\SO2010_snapshot.ss’ created in the file system. Although you can see the file size in the file system, actually, the file is empty, and disk space is not allocated yet. Also, you can see the snapshot in the SSMS package in the list of snapshots.
Now I want to update one of the columns to the same value in the source database.

USE StackOverflow2010
UPDATE dbo.Posts SET AnswerCount = AnswerCount;

It was completed in 10m31s in my VM. The meaning of this meaningless operation I will explain a little bit later. But remember the run time.
Now I want to execute the same query one more time:

USE StackOverflow2010
UPDATE dbo.Posts SET AnswerCount = AnswerCount;

Now it takes 2m29s to complete! You can think about that table now in the memory buffer, and that’s why it is faster, but it is not completely true.
Let’s recreate a snapshot one time.

/* drop snapshot */
DROP DATABASE StackOverflow2010_Snapshot;

/*create a snapshot*/
CREATE DATABASE StackOverflow2010_Snapshot 
ON (NAME = stackoverflow2010, FILENAME = 'd:\SQLServer\SO2010_snapshot.ss') 
AS SNAPSHOT OF StackOverflow2010;

And execute UPDATE statement again.

USE StackOverflow2010
UPDATE dbo.Posts SET AnswerCount = AnswerCount;

And again, it is slow – 10m58s.
So, what is happening? If you still didn’t check the link I placed above; it is time. A Snapshot file is empty at the moment of creation, and it stays empty until data is starting to change in the source DB. But the snapshot data should be in the same state as it was at the time of creation. To provide it SQL Server copy the data pages changing in the source DB to the snapshot file before changing. At this moment, we can see the IO overhead, and the updates are slow.
If you execute the above query again, you will see that it is fast again because you are updating the same data pages that already were copied to the snapshot file. No additional IO!
It is not only one way to achieve the same result. If you don’t want to have IO overhead in your database after you created a snapshot, you can rebuild all (or some) tables for which you expect updates or deletes and don’t want this operation to be slow for the first time. It will make updates faster from the first time!

See you in the next one!