The ins and outs of SQL Server Stretch Database

The ins and outs of SQL Server Stretch Database

Part 1 – Stretch Database – Tell me more

“Cold Data” management is definitely not a new topic, but Stretch DB presents a solution to this problem with a higher level of simplicity and efficiency.

Normally you would have to do at least one (if not all) of the following to manage your organisation’s cold data growth:

  • Re-design your database and filegroup structure
  • Create partitions
  • Develop ETL packages to move data around
  • Reserve a maintenance window so your ETL can take place
  • Change the application’s code, so it caters for your cold data’s new repository
  • Spend a lot $$$ on a big brand out-of-the box solution

With SQL 2016, Microsoft has introduced Stretch Database which gives us full flexibility to host data that meets a specific criteria in Azure, and keep it online and accessible without the need for code change.

Stretch Database Benefits

  • Flexibility: Through the use of predicates it is easy to define what data is labelled ‘Cold’ and dynamically host it in Azure
  • Transparency: Since you don’t have to change a single line of code. Both Hot and Cold data will appear as ‘local’ on the application’s eye (Latency does play a part when accessing the remote data).
  • Maintenance Streamline: Since you have now moved 80% of your data out of your data centre, you only have to worry about the remaining 20% for tasks such as backups, restores, index maintenance, integrity checks and all of a sudden you are now compliant with your tight and expensive SLAs. Microsoft will backup and maintain your data for you, aren’t they nice!
  • Cost Effective Scalability: Since your company’s growth = growth in number of transactions = growth in database size = IT having to worry about ramping up storage capacity, now IT managers and DBAs can also celebrate the company’s success because the data growth can be scaled up effortlessly in Azure. Compared to upgrading your SAN or buying new all-flash disks, this makes Azure data-at-rest look as cheap as chips.

Part 2 – Stretch Database – How does it work?

Ingredients (aka requirements):

–         1 x On-premises database using SQL Server 2016, or later

–         1 x Valid Azure subscription

At the table level, there are a number of limitations, therefore finding a stretchable table is not as easy as it seems. You would need tables without constraints nor rare data types, that are not involved in relations with other tables and that don’t use some special SQL Server features.

Click here for the full list of limitations.

Below is a high level overview of the steps involved. For a more detailed walkthrough, see the demo in Part 5.  If you don’t already have a subscription, sign up for a free 30 day trial, it’s a great way to familiarise yourself with the tool.

Step 1: Identify a table that can be stretched

Data Migration Assistant can be used to scan your database and advise which tables are technically suitable to be stretched (but at the end of the day, it is still up to you to decide if it suits the business needs).

Step 2: Stretch the database

Enabling Stretch to a database consists of:

–         Creating a DB Master Key (which will be held on prem the whole time for obvious security reasons)

–         Creating a DB scoped credential

–         Provisioning a new Azure SQL DB or connect to an existing one

–         Altering the Database to REMOTE_DATA_ARCHIVE = ON

Step 3: Stretch the table

Alter the table to REMOTE_DATA_ARCHIVE = ON and we have the ability to set a predicate that will serve as the criteria to elect which rows will be moved to Azure.

If a predicate is not defined, then the whole table is stretched.

Step 4: Monitor the data migration

Now is time to monitor the process as the data trickles through from your on-premises database to the cloud. We can use a GUI or TSQL to achieve this.

GUI: Right-Click the DB / Tasks / Stretch / Monitor

TSQL: SELECT * FROM sys.dm_db_rda_migration_status WHERE migrated_rows >0;

Part 3 – Stretch Database – Is this for me?

As potential candidates for stretching, you should consider large data sets of data that are never modified and rarely accessed.

Historical, auditing, logging and large transactional tables are the usual candidates to be stretched.

Stretch DB is relatively simple to configure and implement on your environment. It may be a viable and cost-effective option to off-load cold data remotely to minimize costs with maintenance, storage, etc.

Your decision to implement Stretch DB will likely be based on Azure pricing compared to the cost of upgrading your storage capacity and how often the cold data needs to be accessed since performance will be slower due to latency accessing the remote data.

Stretch database is a great introduction to Azure if your organisation has been rehearsing that first step into the cloud for ages. It has very little impact, it is easy to implement, it can be easily and quickly rolled back and will allow you to get familiarised with the Azure tenancy options and features available.

Part 4 – The Gotcha!

There is one big, undocumented and surprisingly uncommented Gotcha.

“Now that I’ve read this post I know that only the hot data will be hosted locally. Right?” Right!

But what happens if I lose internet connectivity?

A) Nothing happens, I can still query both Hot and Cold data

B) I can only query the Hot local data

C) My whole table becomes unavailable and inaccessible

Answer = C. I’ve learnt the hard way that if you lose access to Azure your whole table becomes unavailable, including the hot and locally hosted portion of your data.

But there is a way to get round this, you will need to change the context of the queries for that table by running the following system stored procedure.

Exec sp_rda_set_query_mode @mode = LOCAL_ONLY

Exec sp_rda_set_query_mode @mode = LOCAL_AND_REMOTE – Revert it once internet is back

So are you saying that my on-prem application will stop working if my on-prem database has a stretched table and I lose internet connectivity?

Yes!

So make sure that internet redundancy is in place and take this constraint in consideration, document it well and make all relevant people aware before moving any further.

Run the demo and check it for yourself.

-------------------------------------------------------------------------------

-- 1. Enable Remote Data Archive at Instance level

-------------------------------------------------------------------------------

exec sp_configure 'remote data archive', 1

------------------------------------------------------------------------------

-- 2. Create the 'StretchDatabase', the 'Employee' table and insert some values

-------------------------------------------------------------------------------

CREATE DATABASE StretchDB

GO

USE StretchDB

GO

CREATE TABLE Employee (

Empid INT NOT NULL,

EmpName VARCHAR(255) NULL,

IsEmployed BIT NOT NULL)

GO

INSERT INTO Employee VALUES (1,'Paul',1)

INSERT INTO Employee VALUES (2,'Carl',1)

INSERT INTO Employee VALUES (3,'Gareth',0)

INSERT INTO employee VALUES(4,'Finn',0)

GO

--Have a look at your table

SELECT Empid, EmpName, IsEmployed

FROM Employee

GO

-------------------------------------------------------------------------------

-- 3. Next: Stretch the Database

-- This can be done using Microsoft SQL Server Management Studio:

-- - Connect to your [StretchDatabase] database

-- - Right click on the database and select the Stretch/Enable

-- menu option to launch the 'Enable Database for Stretch' wizard

-- - Follow the guided instructions in the wizard to enable and configure

-- Employee for stretch to Azure.

--

-- - OR....lets see how it works under the hood and use T-SQL

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

-- 3a. Create a database master key

-------------------------------------------------------------------------------

CREATE MASTER KEY ENCRYPTION BY PASSWORD='StrongPassword1';

GO

-------------------------------------------------------------------------------

-- 3b. Create a database scoped credential

-------------------------------------------------------------------------------

CREATE DATABASE SCOPED CREDENTIAL Stretch_DB_Credential

WITH IDENTITY = 'Anderson' , SECRET = 'StrongPassword1' ;

GO

-------------------------------------------------------------------------------

-- 3c. Enable Stretch to the database

-------------------------------------------------------------------------------

ALTER DATABASE StretchDB

SET REMOTE_DATA_ARCHIVE = ON

(

SERVER = 'yourserver.database.windows.net', -- Change to your server details

CREDENTIAL = Stretch_DB_Credential

) ;

GO

-------------------------------------------------------------------------------

-- 4. Stretch the Employee table using T-SQL or via GUI

-------------------------------------------------------------------------------

USE StretchDB

GO

CREATE FUNCTION dbo.fn_stretchpredicate(@column1 int)

RETURNS TABLE

WITH SCHEMABINDING

AS

RETURN SELECT 1 AS is_eligible

WHERE @column1 = 0

GO

ALTER TABLE [dbo].[Employee] SET ( REMOTE_DATA_ARCHIVE = ON (

FILTER_PREDICATE = dbo.fn_stretchpredicate([IsEmployed]),

MIGRATION_STATE = OUTBOUND

) )

GO

-------------------------------------------------------------------------------

-- 4. Once stretch configuration is complete and data migration has begun

-- you can monitor the progress of data migration via the wizard or by

-- executing a query from sys.dm_db_rda_migration.

-------------------------------------------------------------------------------

SELECT * FROM sys.dm_db_rda_migration_status WHERE migrated_rows >0;

GO

-------------------------------------------------------------------------------

-- 5. You can view local and remote data storage by running sp_spaceused

-- with the 'LOCAL_ONLY', 'REMOTE_ONLY' & 'ALL' parameters as follows

-------------------------------------------------------------------------------

EXEC sp_spaceused 'Employee', 'true', 'ALL';

EXEC sp_spaceused 'Employee', 'true', 'LOCAL_ONLY';

EXEC sp_spaceused 'Employee', 'true', 'REMOTE_ONLY';

GO

-------------------------------------------------------------------------------

-- 6. While migration is in progress or complete, you can continue to insert

-- new records into Employee as before

-------------------------------------------------------------------------------

Insert into Employee values (5,'Anderson',1)

Insert into Employee values (6,'Felix',0)

Insert into Employee Values (7,'Sam',0)

insert into employee values(8,'John',0)

GO

-------------------------------------------------------------------------------

-- 7. Check the Execution Plan for differences in the plan cost

-------------------------------------------------------------------------------

--Using Remote Query

SELECT Empid, EmpName, IsEmployed

FROM Employee

GO

--Querying only local data

SELECT Empid, EmpName, IsEmployed

FROM Employee

Where IsEmployed =1

GO

-------------------------------------------------------------------------------

-- 8. Update local and remote data, note how DML operations are not allowed for the remote portion of your data

-------------------------------------------------------------------------------

--Update local data, EmpId 1 has IsEmployed = 1

UPDATE employee

SET EmpName = 'Fred'

WHERE empid=1

--Update remote data, EmpId 6 has IsEmployed = 0 | What happens????

UPDATE employee

SET EmpName = 'Michael'

WHERE empid=6

-------------------------------------------------------------------------------

-- 9. Delete local and remote data, note how DML operations are not allowed for the remote portion of your data

-------------------------------------------------------------------------------

--Update local data, EmpId 1 has IsEmployed = 1

DELETE employee

WHERE empid=1

--delete remote data, EmpId 6 has IsEmployed = 0 | What happens????

DELETE employee

WHERE empid=6

-------------------------------------------------------------------------------

-- 10. Identify Stretched DBs and Tables

-------------------------------------------------------------------------------

--Local DB name(s)

SELECT name AS 'Local DB'

FROM sys.databases

WHERE is_remote_data_archive_enabled = 1

GO

--Remote DB name(s)

SELECT remote_database_name AS 'Remote DB'

FROM sys.remote_data_archive_databases

GO

--Local Table name(s)

SELECT name AS 'Local Table'

FROM sys.tables

WHERE is_remote_data_archive_enabled = 1

GO

--Remote Table name(s)

select remote_table_name as 'Remote Table', filter_predicate, migration_direction_desc from sys.remote_data_archive_tables

-------------------------------------------------------------------------------

-- 11. Identify Credential used for Stretching DBs and Tables

-------------------------------------------------------------------------------

select * from sys.database_scoped_credentials

------------------------------------------------------------------------------

-- 12. Create Index to demonstrate data maintenance streamlining

-------------------------------------------------------------------------------

CREATE NONCLUSTERED INDEX [IX_ALL_Employee] ON [dbo].[Employee]

( [Empid], [EmpName], [IsEmployed] )

GO

CREATE NONCLUSTERED INDEX [IX_Active_Employee] ON [dbo].[Employee]

( [Empid], [EmpName], [IsEmployed] )

WHERE [IsEmployed] = 1

GO

CREATE NONCLUSTERED INDEX [IX_Inactive_Employee] ON [dbo].[Employee]

( [Empid], [EmpName], [IsEmployed] )

WHERE [IsEmployed] = 0

GO

--show the number of rows in th table

select 'All' as 'Status', count (*) from Employee

select 'Active - Local' as 'Status', count (*) from Employee where IsEmployed=1

select 'Inactive - Remote' as 'Status', count (*) from Employee where IsEmployed=0

GO

--Show the number of rows in each index | Note how the All_Employee only contain the number of local rows..cool!!

SELECT

i.[name] AS IndexName,

t.[name] AS TableName,

SUM(s.[used_page_count]) * 8 AS IndexSizeKB,

s.row_count

FROM sys.dm_db_partition_stats AS s

INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id]

AND s.[index_id] = i.[index_id]

INNER JOIN sys.tables t ON t.OBJECT_ID = i.object_id

where i.name is not null

GROUP BY i.[name], t.[name], s.row_count

ORDER BY i.[name], t.[name]

-------------------------------------------------------------------------------

-- 13. Lets see what happens when you lose Internet Connectivity, so pull the cable

-------------------------------------------------------------------------------

use StretchDB

select * from Employee

/*

-- Should get the following message:

OLE DB provider "SQLNCLI11" for linked server "(null)" returned message "Login timeout expired".

OLE DB provider "SQLNCLI11" for linked server "(null)" returned message "A network-related or instance-specific error has occurred while establishing a connection to SQL Server.

Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".

Msg 53, Level 16, State 1, Line 239

Named Pipes Provider: Could not open a connection to SQL Server [53].

*/

-- Now try querying just the local data

select * from Employee where IsEmployed=1

-- 13.a Change the scope of the queries

Exec sp_rda_set_query_mode @mode = LOCAL_ONLY

select * from Employee where IsEmployed=1

select * from Employee

-- 13.b Restore internet connectivity and test again

select * from Employee where IsEmployed=1

select * from Employee

-- 13.c Restore the query scope

Exec sp_rda_set_query_mode @mode = LOCAL_AND_REMOTE

select * from Employee

Skill up with our training courses

Are you looking to skill up on the latest version of Microsoft SQL Server or get started with Power BI? We’ve got you covered. We offer the most comprehensive range of Microsoft Data Platform and Data Analytics training courses available.

Browse courses