01 October, 2012

Shredded Storage in SharePoint 2013 Preview

As you probably know, documents stored in a library or as attachments are stored as binary large objects (BLOBs) in the content database, by default. Remote BLOB Storage (RBS) is a set of APIs that let you move BLOBs out of the SQL Server content database to another storage mechanism. 

SharePoint 2010 left room for improvement in both storage utilization and I/O performance for documents. First, storage: in SharePoint 2010, if version history is enabled on a document library, each new version results in a new BLOB for that document. Conceptually, a 1MB file with 10 versions is consuming 10MB of storage. 

What a lot of people don’t consider is that a “new version” doesn’t mean just a change to the document; it can also mean a change to metadata. So if a user changes a metadata field, that is a new version, and a copy of the BLOB is created, even if no change was made to the document itself! 

So BLOBs can proliferate quickly and, to put it bluntly, “pointlessly.” By the way, it’s a best practice to set version retention limits on any library where version history is enabled. 
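To make the arithmetic concrete, here is a minimal sketch of the full-copy model described above. It is only an illustration, not anything SharePoint exposes: the function name and the `version_limit` parameter are hypothetical, and it assumes every retained version keeps a complete copy of the BLOB.

```python
def full_copy_storage_mb(file_size_mb, edits, version_limit=None):
    """Storage used when every retained version keeps its own full BLOB.

    `edits` counts every saved change, including metadata-only changes.
    `version_limit` is a hypothetical cap on how many versions are retained.
    """
    retained = edits if version_limit is None else min(edits, version_limit)
    return file_size_mb * retained


if __name__ == "__main__":
    # A 1MB file saved 10 times, without and with a retention limit of 5.
    print(full_copy_storage_mb(1, 10))
    print(full_copy_storage_mb(1, 10, version_limit=5))
```

Under these assumptions a retention limit is the only thing that caps the footprint, which is why it matters so much in this model.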

Second, I/O performance is problematic in SharePoint 2010. There’s an unnecessary file read that occurs when changes—at least to Office documents—are uploaded to the SharePoint web server. 

At the highest level, what SharePoint 2013 shredded storage does is “chunk” or “page” the BLOB into numerous smaller shreds. So a single BLOB is now a construct made up of numerous shreds. 

One result of this architecture is an effect similar to deduplication or single instancing: only differences are saved, not entire BLOBs. So, for example, if you have versioning enabled and a user makes a change to a document, only changed shreds are added to the storage footprint of that document. Shreds that have not changed from the previous version are simply “associated” with both versions. 

You can see significant improvements in storage utilization. That same 1MB file with 10 versions may be consuming 2.2MB of storage, for example. 
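The idea can be illustrated with a toy model. This is a sketch under loud assumptions, not SharePoint's actual algorithm: the fixed 64 KB shred size, the SHA-256 content addressing, and the names `ShredStore` and `compare_storage` are all invented for illustration. It only shows why sharing unchanged shreds between versions shrinks the footprint.

```python
import hashlib

SHRED_SIZE = 64 * 1024  # assumed shred size for this sketch only


class ShredStore:
    """Toy content-addressed store: each distinct shred is kept once."""

    def __init__(self):
        self.shreds = {}    # digest -> shred bytes
        self.versions = []  # one list of digests per saved version

    def save_version(self, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), SHRED_SIZE):
            shred = data[offset:offset + SHRED_SIZE]
            digest = hashlib.sha256(shred).hexdigest()
            # Only shreds not seen before add to the storage footprint.
            self.shreds.setdefault(digest, shred)
            digests.append(digest)
        self.versions.append(digests)

    def read_version(self, index: int) -> bytes:
        # Reassemble a version from the shreds associated with it.
        return b"".join(self.shreds[d] for d in self.versions[index])

    def stored_bytes(self) -> int:
        return sum(len(s) for s in self.shreds.values())


def compare_storage(versions: int = 10):
    """Return (shredded_bytes, full_copy_bytes) for a 1MB file with small edits."""
    store = ShredStore()
    doc = bytearray()
    for i in range(16):  # 16 distinct shreds of 64 KB = 1MB
        doc += bytes([i]) * SHRED_SIZE
    store.save_version(bytes(doc))
    full_copy_bytes = len(doc)
    for v in range(1, versions):
        # Each new version overwrites four bytes inside a single shred.
        start = (v % 16) * SHRED_SIZE
        doc[start:start + 4] = v.to_bytes(4, "big")
        store.save_version(bytes(doc))
        full_copy_bytes += len(doc)
    return store.stored_bytes(), full_copy_bytes


if __name__ == "__main__":
    shredded, full_copy = compare_storage()
    print(f"shredded: {shredded} bytes, full copies: {full_copy} bytes")
```

In this toy run, each edit touches one shred, so ten versions cost the original 16 shreds plus 9 changed ones (about 1.6MB) instead of ten full copies (10MB). Saving identical content again, as a metadata-only change would, adds no new shreds at all. Real-world numbers depend on how edits are spread across the file.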

Shredded storage also reduces the amount of information about a file that has to be retrieved by the web server from the content database, so I/O improves.

With that conceptual introduction in place, let me punch out a couple of things you need to know, which I’ve found misrepresented in the community:
  • Shredded storage is, on the whole, a good thing, and is on by default
  • You can disable (or re-enable) shredded storage on a per-web application basis. 
  • BLOBs are not shredded on an upgrade, but are shredded when uploaded or modified. 
  • Shredded storage is a SharePoint 2013 feature, and it works whether SharePoint is running on SQL Server 2008 R2 or SQL Server 2012.
  • Shredded storage is different from Cobalt. Cobalt is a framework that allows Office client applications to efficiently synchronize changes to SharePoint using the File Synchronization via SOAP over HTTP (FSSHTTP) API. Shredded storage is about how a document is shredded, stored, and reassembled by SQL Server. I’m hearing lots of people suggest that shredded storage works only on Office documents. Not true. Such statements conflate shredded storage with Cobalt. When we look inside a content database on SharePoint 2013, we see PDFs and other file formats being shredded as well. 
  • Shredded storage is independent of RBS. You can use RBS with or without shredded storage, and vice versa. Now whether you would use RBS with shredded storage is another question. Folks in the community are currently running tests to determine the performance implications of doing so. My guess is that many of the performance advantages of RBS that I saw with customers, and that both Microsoft and I have documented in white papers, will be reduced or eliminated due to shredded storage. There was also a benefit in SharePoint 2010 to running RBS to store BLOBs on SAN and NAS devices that support deduplication. Shredded storage might very well reduce or eliminate that benefit. However, RBS will continue to be critically important in hierarchical storage management, where you are managing tiers of storage (and therefore cost and other characteristics) based on business rules.

So that’s the “net net” of shredded storage. 

What is still not well documented is the inner workings, so I’ll strive in a future article to detail how it works. Even some of the “official” training and documentation I’ve seen has holes, particularly in relation to the interaction of Cobalt and shredded storage, and how shredded storage does (or does not) help I/O for non-Office document formats. 

The good news is you don’t really have to understand how it works, just that it does work. The bottom line is that the SharePoint web server is doing more work, and SQL Server is doing more work, to reduce I/O bottlenecks. Because I/O is likely to be the number-one bottleneck in SharePoint performance, this is all quite desirable. And, along the way, the storage footprint of a document can be reduced—perhaps significantly. 

Each release of SharePoint offers a “feature” that is terribly named (does “shredded storage” sound like a good thing?), poorly documented, and misrepresented in the community. This is one of them for SharePoint 2013. 

Courtesy: www.sharepointpromag.com


Your feedback is always appreciated. I will try to reply to your queries as soon as possible. - Amol Ghuge
