By Sam Saffron

2011-06-02 07:34:23 8 Comments

Every day we ship our SQL Server backups across the WAN. We need to minimize the size of these backups so the transfer does not take forever.

We don't mind if our backup process takes a bit longer; as it stands, we need to move 30 GB of compressed backup across the WAN, which takes over 10 hours.

We have two options for getting smaller daily backups:

  1. Log shipping, which would mean we would have to restructure our DR process.
  2. Stripping information out of the database and rebuilding it on the other side (drop non-clustered indexes, pack clustered indexes at 100% fill factor, rebuild on the other side).

Both would involve a fair amount of work on our part. We are using SQL Server 2008 Pro, and all backups are compressed.

Are there any commercial products that can give us similar backup size to option (2)?

Is there a comprehensive script out there that will allow us to accomplish (2)? (handling indexed views, filtered indexes, foreign keys and so on)
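For reference, option (2) amounts to roughly the following before each backup. This is a minimal sketch (the table name is a placeholder), and it deliberately skips the hard cases the question mentions — indexed views, filtered indexes, and indexes backing unique or foreign-key constraints — which is exactly why a comprehensive script is being asked for:

```sql
-- Generate DROP statements for plain non-clustered indexes
-- (sketch only: ignores indexes enforcing PK/unique constraints,
-- indexed views, filtered indexes, etc.)
SELECT 'DROP INDEX ' + QUOTENAME(i.name) + ' ON '
     + QUOTENAME(s.name) + '.' + QUOTENAME(o.name) + ';'
FROM sys.indexes i
JOIN sys.objects o ON o.object_id = i.object_id
JOIN sys.schemas s ON s.schema_id = o.schema_id
WHERE i.type = 2                    -- non-clustered
  AND o.is_ms_shipped = 0
  AND i.is_primary_key = 0
  AND i.is_unique_constraint = 0;

-- Pack the remaining clustered indexes fully before the backup
ALTER INDEX ALL ON dbo.SomeBigTable REBUILD WITH (FILLFACTOR = 100);
```

The generated DROP statements would be executed, the database backed up and shipped, and the index-creation scripts replayed on the other side.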


@Brent Ozar 2011-06-08 12:36:06

Question 1: Is there a commercial backup product that will give a similar backup size to stripping non-essential data like indexes out of the database?

No. There are a lot of backup compression products out there (Quest LiteSpeed, Red Gate SQL Backup, Idera SQLSafe, Hyperbac, etc.), but all of them function by just compressing the output of SQL Server's regular backup process. Some of them do it in tricky ways - HyperBac and LiteSpeed's Engine option are file system filter drivers, meaning they intercept the output on its way to disk - but the end result of all of these products is just compressed backup output.

Question 2. Is there a comprehensive script out there to dump all this extra data?

Over time, as you keep more history in the database (4, 5, 8, 10 years) you won't want to rip out all the index data and rebuild it on the other side of the WAN. Instead, you want to just transfer the modified data, and that's where log shipping comes in.

You shouldn't do this.

But if you really, really wanna do this (and no, I won't help you), you can do it with filegroup backups. Set up your database filegroups like this:

  • Primary filegroup (required, but leave it empty)
  • ClusteredIndex filegroup (put your clustered indexes here)
  • ExtraneousCrap filegroup (put everything else here)

Start doing compressed filegroup backups of just the first two, and copy those smaller ones to your DR server. You can use SQL Server 2008's filegroup backup and restore capability to just restore the Primary and ClusteredIndex filegroups, and then they'll immediately be available for querying. They're not really going to be workable until you get that ExtraneousCrap filegroup online, but there's a nasty trick for that too - in the MVP Deep Dives book, there's a chapter on editing the system tables in order to make the ExtraneousCrap filegroup and all of the associated indexes disappear. This trick is dangerous, totally unsupported, and a hell of a bad idea - but hey, you asked for it.
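As a sketch of the layout above (database, file paths, and table/index names are all made up), the filegroup setup and the partial backup might look like this:

```sql
-- Add the two extra filegroups alongside PRIMARY
ALTER DATABASE MyDB ADD FILEGROUP ClusteredIndex;
ALTER DATABASE MyDB ADD FILEGROUP ExtraneousCrap;

ALTER DATABASE MyDB ADD FILE
    (NAME = MyDB_CI, FILENAME = 'D:\Data\MyDB_CI.ndf')
    TO FILEGROUP ClusteredIndex;
ALTER DATABASE MyDB ADD FILE
    (NAME = MyDB_Crap, FILENAME = 'D:\Data\MyDB_Crap.ndf')
    TO FILEGROUP ExtraneousCrap;

-- Rebuilding a clustered index on a filegroup moves the table there
CREATE UNIQUE CLUSTERED INDEX PK_BigTable ON dbo.BigTable (Id)
    WITH (DROP_EXISTING = ON) ON ClusteredIndex;

-- Non-clustered indexes go to the throwaway filegroup
CREATE INDEX IX_BigTable_Foo ON dbo.BigTable (Foo) ON ExtraneousCrap;

-- Back up only the filegroups you plan to ship
BACKUP DATABASE MyDB
    FILEGROUP = 'PRIMARY', FILEGROUP = 'ClusteredIndex'
    TO DISK = 'E:\Backup\MyDB_FG.bak' WITH COMPRESSION;
```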

@gbn 2011-06-02 08:47:53

First thought based on comments...

Use differential backups every, say, 6 hours to reduce the size/time of backup + FTP, then reduce your full backup + FTP to weekends only. This avoids the complexity of log shipping, is simple to do, and adds only slight complexity to DR.

I feel that differential backups are overlooked; I've suggested using them before.

Edit: after jcolebrand's comment I'll attempt to explain more

A differential backup only contains pages that have changed. Outside of any index maintenance (which can affect a lot of the database), only a few percent of pages will change during a day, so a differential backup is a lot smaller than a full backup even before compression.

If you take a full backup, say weekly, you can then do daily differentials and ship those off site. Note that a restore will still require both files off site: the weekly full and the latest differential.
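The schedule above could look like this (database name and paths are placeholders):

```sql
-- Weekend: full compressed backup, shipped once
BACKUP DATABASE MyDB
    TO DISK = 'E:\Backup\MyDB_full.bak'
    WITH COMPRESSION, INIT;

-- Daily (or every 6 hours): differential only - much smaller,
-- and the only file that needs to cross the WAN each day
BACKUP DATABASE MyDB
    TO DISK = 'E:\Backup\MyDB_diff.bak'
    WITH DIFFERENTIAL, COMPRESSION, INIT;
```

Each differential contains all changes since the last full backup, so only the most recent one needs to be kept off site.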

This should solve the problem of getting data from A to B, C and D quickly.

You probably need to restore both the full and the latest differential to get the latest data, but you may be able to work around this with NORECOVERY and a STANDBY file (I haven't tried it with a differential restore in years, since I was last in a pure DBA job).
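The restore sequence sketched above would be something like the following (paths are placeholders; STANDBY with a differential restore should be verified before relying on it):

```sql
-- Restore the weekly full, leaving the database ready to accept the diff
RESTORE DATABASE MyDB
    FROM DISK = 'E:\Backup\MyDB_full.bak'
    WITH NORECOVERY, REPLACE;

-- Apply the latest differential; STANDBY leaves the database
-- read-only queryable while still able to accept further restores
RESTORE DATABASE MyDB
    FROM DISK = 'E:\Backup\MyDB_diff.bak'
    WITH STANDBY = 'E:\Backup\MyDB_undo.dat';
```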

An added bonus is that diff backups are unrelated to ongoing log backups, so you can separate any High Availability/DR requirement from the "get data to the code monkeys" requirement.

I see some issues if you have daily full backups by policy or audit, but the diff restore can be applied before any log restores to shorten recovery time; unlike backups, diff and log restores do interact.

Hope I've covered most bases...

@Marian 2011-06-07 20:52:14

Hyperbac is a very smart compression tool that lets you compress backups while leaving all maintenance plans and jobs unchanged, because it handles files at the OS level. If they don't want to change anything, just add a new tool to the box, they should definitely give it a shot. I know I used it and loved it for SQL 2005. But for more compression they would still have to do some manual labor...

@jcolebrand 2011-06-07 21:15:41

@Marian I'm ... pretty sure Brent O is just a consultant at need.

@gbn 2011-06-08 04:46:44

@Marian: there is a limit to compression, and more compression means more CPU/time. The smallest backup will be the one with the least input (a differential), regardless of compression tool/format. One more thing: you can get extreme compression, but it takes longer, and for a 30 GB file the compression itself could take longer than the FTP...

@Marian 2011-06-08 06:56:26

I agree with you on that. The thing is that commercial tools have better compression rates than the MS one, they are configurable (by the number of CPUs allocated to the operation), and they offer encryption and other features. I don't necessarily praise them (they're not very cheap); I just said that some of them can be used in conjunction with current SQL Server backups (full, diff, log) without changing the environment, which these guys seem to need/want. @jcolebrand: got it, thank you!

@SqlSandwiches 2011-06-07 04:24:03

I would use SQL transactional replication. Your initial load would take some time, but once you're up and running you can send over only the information you want. For example, if you only have 3 or 4 tables that get updated, you can send just those 3 or 4 tables.

You can also choose what you want to ship over: FKs, clustered/non-clustered indexes, table partition schemes, stored procs, and TONS more.

If this isn't an option, you could use Red Gate SQL Backup. I've used it before and got compression levels of up to 90%, a lot smaller than SQL Server's native backups.

@SQLChicken 2011-06-06 02:50:01

If you have the money for it, and your architecture allows for it, check into something like Riverbed's WAN optimization appliances. An appliance like this, in conjunction with a replication or log shipping scenario, might be your best bet.

If not, then a few questions. If you only have to do a refresh every few months, why the concern over bandwidth? The only time you'd have to worry about the transfer is the one-time job of getting the full backup over there to do a restore locally, or am I mistaken in that being your setup?

Another possibility: instead of worrying about getting all that data to them, set up a Citrix environment and have them remote into you. With Citrix you have minimal bandwidth requirements between client and host, and you can do what you need locally without having to replicate those changes elsewhere. Just my $0.02.

@jcolebrand 2011-06-07 14:22:44

Can you expound on this any more? I know this is for the StackExchange team proper, so I'm sure they would love a more in-depth walkthrough ;)

@SQLChicken 2011-06-28 12:54:35

Haha there's lots to consider here. Which point exactly would you like me to expound on?

@jcolebrand 2011-06-28 16:05:37

The replication/log shipping was what I had in mind, but that was about two weeks ago, so I doubt it's as important now. Also, I just re-read and saw the part about Citrix, and I could've told you then (as now) that they don't do that. They just do local development using a DVCS infrastructure and want the data for testing/playing with/confirmation, and perhaps for the data dumps.

@SQLChicken 2011-06-28 20:31:46

Gotcha. Then, as others have said already, third-party vendors like Red Gate and Quest have very good backup compression tools to help meet those needs. Another potential solution is SQL Azure. Right now the database size limit is 50 GB, but they've lifted the charges for data being loaded, so it might be a cost-effective solution.

@johndacostaa 2011-06-03 14:32:29

I recommend switching to something like log shipping. Essentially, if you have the choice of trickling 30 GB across the wire over the whole day versus pushing it all at the end of the day in a shorter window, network speed will be much less of an issue for you.

Your devs on the slow network will also be able to download more conveniently sized files via FTP or whatever process you have in place. They could also set up jobs that download throughout the day.

In addition to SQL Server's native compression, you could implement a third-party tool with higher compression, such as LiteSpeed or Red Gate SQL Backup.

Furthermore, on the network side you could install devices that optimize your throughput to the DR site. In the past I used a Riverbed appliance to get a 90 GB backup from FL to VA in less than 3 hours.

Another option would be to back up specific filegroups, excluding the indexes, etc., but you are still stuck with the clustered indexes, and depending on your db structure you may get more cost/hassle than benefit from that approach.


@Marian 2011-06-02 11:45:36

There are commercial products that can compress your backups better than the native 2008 compression. Examples are Red Gate SQL Backup, Hyperbac, Idera SQLSafe, and LiteSpeed.

They come at the cost of higher CPU usage and proprietary file formats that have to be handled with tools other than the ones Microsoft ships. The exception is Hyperbac (now acquired by Red Gate) compression, which handles files transparently, can create zip-compatible files, and doesn't need any third-party tools.

But no tool will give you a file as small as the one you'd get by doing the manual cleanup. Please look over Brent Ozar's article "How to really compress your SQL Server backups"; he advises doing the same steps you have at point no. 2.

@Hogan 2011-06-07 15:42:22

RedGate FTW!!!!

@Marian 2011-06-07 18:13:17

@Hogan: if you can't beat them, buy them. It's a very good example :-). Anyway, both products that are now part of Redgate and handle database compression can coexist successfully.
