Deduplication just went ballistic with Aggregate Inline Deduplication

Wednesday, September 27th, 2017

With the release of ONTAP 9.2 came a long-desired feature: the ability to deduplicate data at the aggregate level and not just the volume level.

So why is this so key? Well, let’s look at the historical view of how dedupe worked and its pitfalls. Dedupe has always been a great feature, but its biggest downfall was this: if I send an attachment to many colleagues, there is a strong possibility that they will all want to save it for future use or reference. Now if you’re lucky, they may all save the same file into the same volume, in which case you win, as dedupe will effectively reduce all instances to just one. Unfortunately we know that with IT we are never that lucky, and as a result I may have some instances in one volume but more instances in other volumes, in which case I get the benefit on a per-volume basis but I still lose out.

With ONTAP 9.2 this has been addressed so that you can now dedupe at the aggregate level. You may have multiple copies of the file located in many volumes, but if those volumes are located in the same aggregate, then aggregate dedupe can take all instances and reduce them to a single instance. Now you really do win with dedupe.

From a simplified technical standpoint, if you’re not familiar with deduplication, it’s a feature that allows identical data blocks to rely on pointers to a single block instead of having multiple copies of the same blocks.

This is all currently done inline (as data is ingested) only, and is currently enabled by default on All Flash FAS systems. The space savings really show with workloads such as ESXi datastores, where you may be applying OS patches across multiple VMs in multiple datastores hosted in multiple FlexVol volumes, but all within one aggregate. Aggregate inline deduplication brings an average additional ~1.32:1 ratio of space savings for VMware workloads. Who doesn’t want to save some space?
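
To make the pointer idea concrete, here’s a toy sketch in Python (purely illustrative – this is not ONTAP’s on-disk format or fingerprinting scheme). It hashes 4KB blocks so identical blocks are stored once per dedupe scope; widening the scope from a single volume to the whole aggregate is what lets copies sitting in different volumes collapse down to one physical block:

    # Toy content-addressed block store - illustration only, not ONTAP's WAFL format.
    # Identical 4KB blocks are stored once per dedupe scope; each file just keeps
    # a list of pointers (fingerprints) into that shared block store.
    import hashlib

    BLOCK = 4096

    class DedupeScope:
        """One dedupe domain: a single FlexVol, or (from ONTAP 9.2) a whole aggregate."""
        def __init__(self):
            self.blocks = {}   # fingerprint -> block data (stored once)
            self.files = {}    # path -> list of fingerprints (the "pointers")

        def write(self, path, data):
            pointers = []
            for i in range(0, len(data), BLOCK):
                chunk = data[i:i + BLOCK].ljust(BLOCK, b"\0")
                fp = hashlib.sha256(chunk).hexdigest()
                self.blocks.setdefault(fp, chunk)   # only the first copy is kept
                pointers.append(fp)
            self.files[path] = pointers

        def physical_bytes(self):
            return len(self.blocks) * BLOCK

    attachment = b"quarterly results deck" * 2000   # the attachment everyone saves

    # Volume-level dedupe: two volumes are two separate scopes, so two physical copies.
    vol1, vol2 = DedupeScope(), DedupeScope()
    vol1.write("/vol1/alice/report.pptx", attachment)
    vol2.write("/vol2/bob/report.pptx", attachment)
    print("per-volume scopes:", vol1.physical_bytes() + vol2.physical_bytes(), "bytes")

    # Aggregate-level dedupe: both volumes share one scope, so one physical copy.
    aggr = DedupeScope()
    aggr.write("/vol1/alice/report.pptx", attachment)
    aggr.write("/vol2/bob/report.pptx", attachment)
    print("aggregate scope :", aggr.physical_bytes(), "bytes")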

Should you require any information on NetApp training from Fast Lane, please contact us:

Phone: 0845 470 1000

enquiries@flane.co.uk

 


NetApp FlexGroup: Thirst Quenching

Thursday, October 13th, 2016

The thirst for data, and how do we quench it?

I heard that NetApp was creating a new distributed file system that could evolve how NAS works, so out of curiosity I started to look into it. I was curious before about Infinite Volumes and whether we really needed them, but this evolutionary step is exciting.

Now that ONTAP 9.1 is available, I thought it was about time I looked into FlexGroups in more detail.

There’s an excellent official Technical Report, TR-4557 – NetApp FlexGroup Technical Overview.

Data is growing.

When I started in the storage industry, a 4GB disk was absolutely massive, and the storage systems I worked on at the time could utilise 45 x 4GB drives. That was just ridiculous; who on earth would need that much storage? The concept of 1TB of storage was a wish-list dream. So it’s no great secret that over the years our thirst for storage has grown: 1TB was a pipe dream, then 100TB became the pipe dream, then 1PB, and now we are pipe dreaming of exabytes.

So file systems are getting bigger along with the size of the datasets. To give an example: on my first digital camera a single photo would take up less than 1MB of space, whereas on a reasonable camera today a single photo can be 10MB or more. Our evolving data is getting thirsty for space, so NetApp have looked at this closely, taken a visionary approach, and designed FlexGroup to address it now and for the future.

NetApp FlexGroup.

FlexGroup has been designed to solve multiple issues for large-scale NAS workloads.

  1. Capacity – Scales up to 20 petabytes
  2. High file counts – Up to 400 billion files
  3. Performance – parallelized operations in NAS workloads, across CPUs, nodes, aggregates and constituent FlexVols
  4. Simplicity of deployment – Simple to use GUI in System Manager; avoid having to use junction paths to get larger than 100TB capacity
  5. Load balancing – Use all of your storage resources for a dataset
  6. Resiliency – Fix metadata errors in real time without taking downtime

So if you have a heavy NAS workload and you want to be able to throw all your available resources at it, then quench your thirst with a FlexGroup.

How does a FlexGroup work?

FlexGroup essentially takes the concept of a FlexVol and simply enhances it by joining multiple FlexVol member constituents into a single namespace that acts like a single FlexVol to clients and storage administrators.

[Image: a FlexGroup made up of multiple member FlexVol constituents]

To a NAS client, it would look like this:

[Image: the FlexGroup presented to a NAS client as a single namespace]

Files are written to individual FlexVol constituents across the FlexGroup. Files are not striped. The amount of concurrency you would see in a FlexGroup would depend on the number of constituents you used. Right now, the maximum number of constituents for a FlexGroup is 200. Since the max volume size is 100TB and the max file count for each volume is 2 billion, that’s where we get our “20PB, 400 billion files” number. Keep in mind that those limits are simply the tested limits – theoretically, the limits are able to go much higher.

So unlike a standalone volume, with a FlexGroup ONTAP will decide which of the constituent members is the best location to store each new write; this works to keep the FlexGroup balanced without a performance penalty.
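
ONTAP’s real ingest heuristics weigh capacity and load across members; purely as an illustration of the idea (this is not NetApp’s algorithm), here’s a minimal sketch where each whole file – remember, files are never striped – lands on whichever member constituent currently has the most free space:

    # Illustrative sketch only - not ONTAP's real placement logic.
    # A FlexGroup-like container steers each whole file (no striping) to the
    # member constituent with the most free space, keeping members balanced.
    from dataclasses import dataclass, field

    @dataclass
    class Constituent:
        name: str
        capacity: int                              # bytes
        used: int = 0
        files: dict = field(default_factory=dict)

        @property
        def free(self):
            return self.capacity - self.used

    @dataclass
    class FlexGroupLike:
        members: list

        def write(self, path, size):
            # Simple stand-in for ONTAP's capacity/load-aware placement decision.
            target = max(self.members, key=lambda m: m.free)
            if size > target.free:
                raise OSError("no member constituent has room for this file")
            target.files[path] = size
            target.used += size
            return target.name

    # Four 100TB members presented to NAS clients as one namespace.
    fg = FlexGroupLike([Constituent(f"member{i}", 100 * 10**12) for i in range(4)])
    for n in range(8):
        print(f"/data/file{n} ->", fg.write(f"/data/file{n}", 5 * 10**12))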

So how do I win?

When NAS operations can be allocated across multiple FlexVols, we don’t run into the issue of serialization in the system. Instead, we start spreading the workload across multiple file systems (FlexVols) joined together (the FlexGroup). And unlike Infinite Volumes, there is no concept of a single FlexVol to handle metadata operations – every member volume in a FlexGroup is eligible to process metadata operations.

That way, a client can access a persistent mount point that shows gobs of available space without having to traverse different file systems like you’d have to do with FlexVols.

It’s been tribal knowledge for a while now to create multiple FlexVols in large NAS environments to parallelize operations, but we still had the issue of 100TB limits and the notion of file systems changing when you traversed volumes that were junctioned to other volumes. Plus, storage administrators would be looking at a ton of work trying to figure out how best to lay out the data to get the best performance results.

Now, with NetApp FlexGroup, all of that architecture is done for you without needing to spend weeks architecting the layout.

So how is it faster?

In preliminary testing of a FlexGroup against a single FlexVol, we’ve seen up to 6x the performance. And that was with simple spinning SAS disk.

Adding more nodes and members can improve performance. Adding AFF into the mix can help latency.

Snapshots

In the first release of NetApp FlexGroup, we’ll have access to snapshot functionality. Essentially, this works the same as regular snapshots in ONTAP – it’s done at the FlexVol level and will capture a point in time of the filesystem and lock blocks into place with pointers. Because a FlexGroup is a collection of member FlexVols, we want to be sure snapshots are captured at the exact same time for filesystem consistency. As such, FlexGroup snapshots are coordinated by ONTAP to be taken at the same time.
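
Conceptually (and only as a sketch – this is not ONTAP internals), that coordination looks like a small two-phase operation: briefly quiesce new writes on every member, snapshot each member FlexVol at the same point, then resume client I/O:

    # Conceptual sketch of a coordinated, filesystem-consistent snapshot across
    # member volumes. Not ONTAP code; it just illustrates "all members captured
    # at the same point in time" as described above.
    import time

    class Member:
        def __init__(self, name):
            self.name = name
            self.quiesced = False
            self.snapshots = {}

        def quiesce(self):              # pause acknowledging new writes
            self.quiesced = True

        def resume(self):
            self.quiesced = False

        def snapshot(self, label):
            assert self.quiesced, "members must be quiesced before snapshotting"
            self.snapshots[label] = time.time()

    def flexgroup_snapshot(members, label):
        """Capture every member volume at the same point in time."""
        try:
            for m in members:
                m.quiesce()
            for m in members:
                m.snapshot(label)        # consistent across the whole FlexGroup
        finally:
            for m in members:
                m.resume()               # always let client I/O continue

    members = [Member(f"member{i}") for i in range(4)]
    flexgroup_snapshot(members, "daily.0")
    print({m.name: list(m.snapshots) for m in members})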

How do you get NetApp FlexGroup?

NetApp FlexGroup is currently available in ONTAP 9.1 (first appearing in 9.1RC1) and includes:

  • NFSv3 and SMB 2.0/2.1 (RC2 for SMB support)
  • Snapshots
  • SnapMirror
  • Thin Provisioning
  • User and group quota reporting
  • Storage efficiencies (inline deduplication, compression, compaction; post-process deduplication)
  • OnCommand Performance Manager and System Manager support
  • All-flash FAS (incidentally, the *only* all-flash array that currently supports this scale)
  • Sharing SVMs with FlexVols
  • Constituent volume moves
  • 20 PB, 400 billion files

How can a FlexGroup be enhanced?

While FlexGroup as a feature is awesome on its own, there are also a number of ONTAP 9 features added that make a FlexGroup even more attractive.

The benefits that a FlexGroup can take advantage of right out of the box include:

  • 15 TB SSDs
  • Per-aggregate CPs (consistency points)
  • RAID-TEC – triple parity to add extra protection to your large data sets

So is that it? Not by a long shot. There are lots of rumours about other enhancements coming, so keep your eyes open and get ready to quench your data thirst.


Pete Green
Fast Lane Lead NetApp Expert


NetApp ADP – Overview

Tuesday, October 4th, 2016

Since the GA release of Clustered Data ONTAP (CDoT) 8.3, and in subsequent releases, you have been able to utilise NetApp’s Advanced Drive Partitioning (ADP).

Today ADP has two main implementations: Root-data partitioning and Flash Pool SSD Partitioning.

Root-Data Partitioning

Before ADP was released in 8.3, the smaller systems had an issue with excessive drive usage purely to get the system running. Best practice says that each node requires a 3-drive root aggregate (that’s 6 out of the 12 drives gone), plus 2 hot spares (now you’ve lost 8), leaving 4 drives, so your options were extremely limited. If you divide the number of drives actually available to hold data by the total number of drives, this gives a storage efficiency of ~17% (this does not sit well with customers).

NetApp recognised that this was a real sticking point for customers, and as such they introduced ADP for the lower-end systems and All Flash FAS arrays to regain the competitive edge for these units.

Without ADP you have an active-passive configuration:

Being active-passive means that only one node of the pair is actively accessing the data aggregate; should that controller fail, its partner will take over and continue operations. For most customers this was not ideal.

In the above example we lose a total of (8) drives to root aggregates and parity, and at least (1) drive per node is used as a hot spare. This leaves only (2) drives to store actual data, which nets 1.47TB of usable capacity. Now with ADP, instead of having to dedicate disks to each node’s root aggregate, we can logically partition each drive into two separate segments: a smaller segment to be used for the root aggregate and a larger segment for data.

[Image: high-level view of ADP root-data partitioning]

With ADP you now have either active-passive or active-active:

By using ADP you now have significantly more usable capacity for data. Using an active-passive configuration, where all the remaining partitioned space is used for data, dividing the data capacity by the total capacity gives a storage efficiency of ~77% (this is more palatable for customers).

[Image: ADP active-passive drive layout]

You can also create two data partitions and run active-active; however, you lose more capacity to parity with two data partitions than with one, so the storage efficiency goes from ~77% to about ~62%.
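
As a rough back-of-the-envelope check on those percentages, here’s a small Python sketch. The RAID-DP and root-partition overheads are simplified assumptions for illustration, so the results only approximate the ~17%, ~77% and ~62% figures above:

    # Rough, illustrative efficiency maths for a 12-drive entry system.
    # Simplified assumptions (not exact ONTAP sizing):
    #   - RAID-DP costs 2 parity drives/partitions per RAID group
    #   - with ADP, the root partitions consume roughly 6% of each drive
    TOTAL_DRIVES = 12
    ROOT_OVERHEAD = 0.06     # assumed share of each drive given to root partitions

    def efficiency(data_drive_equivalents):
        """Storage efficiency = usable data capacity / total raw capacity."""
        return data_drive_equivalents / TOTAL_DRIVES

    # Without ADP (active-passive): 2 x 3-drive root aggregates + 2 hot spares
    # leave a 4-drive RAID-DP data aggregate, i.e. only 2 drives hold data.
    no_adp = efficiency(2)

    # With ADP (active-passive): one shared RAID-DP data tier across all 12
    # data partitions (2 go to parity); root aggregates live on root partitions.
    adp_active_passive = efficiency((12 - 2) * (1 - ROOT_OVERHEAD))

    # With ADP (active-active): two data aggregates, so parity is paid twice.
    adp_active_active = efficiency((12 - 4) * (1 - ROOT_OVERHEAD))

    print(f"no ADP, active-passive : {no_adp:.0%}")              # ~17%
    print(f"ADP, active-passive    : {adp_active_passive:.0%}")  # ~78%
    print(f"ADP, active-active     : {adp_active_active:.0%}")   # ~63%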

[Image: adding drives to an ADP-partitioned system]

Constraints to consider before leveraging Root-Partitions 

  • HDD types that are not available as internal drives: ATA, FCAL and MSATA
  • 100GB SSDs cannot be leveraged to create root partitions
  • MetroCluster and ONTAP-v do not support root partitions
  • Supported only on entry systems (2240, 2520, 2552, 2554) and All Flash FAS systems (systems with only SSDs attached).
  • Removing or failing one drive will cause a RAID rebuild and slight performance degradation for both the node’s root aggregate and the underlying data aggregate.
  • Aggregates composed of partitioned drives must have a RAID type of RAID-DP.

Flash Pool SSD Partitioning

Previously, Flash Pools were created by dedicating a subset of SSDs, in either a RAID-4 or RAID-DP raid group, to an existing spinning-disk aggregate. In clusters that have multiple aggregates this traditional approach is very wasteful. For example, if a 2-node cluster had (4) data aggregates, (3) on one node and (1) on the other, the system would require a minimum of (10) drives in total to allow for only (1) caching drive per data aggregate. If these are 400GB SSDs, then each aggregate would net only 330GB (right-sized actual) of cache out of the 4TB total raw.

With CDoT 8.3 a new concept, “Storage Pools”, was introduced, which increases cache-allocation agility by granting the ability to provision cache based on capacity rather than number of drives. Storage Pools allow the administrator to create one or more logical “pools” of SSD, each of which is then divided into (4) equal slices (allocation units) that can then be applied to existing data aggregates on either node in the HA pair.

[Image: an SSD storage pool divided into four allocation units shared across the HA pair]

Using the same example as previously described, creating one large Storage Pool with (10) 400GB SSDs would net a total of (4) allocation units, each with 799.27GB of usable cache capacity, that can then be applied to our four separate aggregates.

By default, when a storage pool is created, (2) allocation units are assigned to each node in the HA pair. These allocation units can be re-assigned to whichever node or aggregate needs them, as long as they stay within the same HA pair. This makes far better use of your SSDs compared to the previous method, where the SSDs would be allocated to one aggregate and there they would remain.
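
As a sketch of that allocation-unit model (illustration only – the capacity figure here is raw, before RAID parity and exact right-sizing, so it won’t match the 799.27GB usable number above):

    # Illustrative model of Flash Pool storage-pool slicing - not ONTAP code.
    # Each SSD in the pool is cut into 4 partitions; allocation unit N is made
    # up of partition N from every drive, and by default 2 allocation units go
    # to each node of the HA pair, re-homeable within that pair.
    from dataclasses import dataclass, field

    SLICES_PER_POOL = 4      # a storage pool always yields 4 allocation units

    @dataclass
    class StoragePool:
        ssd_count: int
        ssd_usable_gb: float                        # per-SSD usable capacity
        owners: dict = field(default_factory=dict)  # allocation unit -> aggregate

        def allocation_unit_gb(self):
            """Raw size of one allocation unit (before RAID parity overhead)."""
            return self.ssd_count * self.ssd_usable_gb / SLICES_PER_POOL

        def assign(self, unit, aggregate):
            if not 0 <= unit < SLICES_PER_POOL:
                raise ValueError("a storage pool only has 4 allocation units")
            self.owners[unit] = aggregate

    # (10) 400GB SSDs; ~372GB usable per SSD is an assumed right-sized figure.
    pool = StoragePool(ssd_count=10, ssd_usable_gb=372.0)

    # Default split is 2 allocation units per node; any unit can later be
    # re-assigned to whichever aggregate in the HA pair needs the cache.
    pool.assign(0, "node1:aggr_data1")
    pool.assign(1, "node1:aggr_data2")
    pool.assign(2, "node2:aggr_data3")
    pool.assign(3, "node2:aggr_data4")

    print(f"each allocation unit ~ {pool.allocation_unit_gb():.0f} GB (raw)")
    print(pool.owners)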

 


 

Pete Green
Fast Lane Lead NetApp Expert


Securing your data beyond the physical realms using Storage Encryption

Tuesday, September 20th, 2016

For most organisations, having RAID-protected storage is a given, but what if you need complete peace of mind that your data is protected and “SECURE”?

For some organisations storage encryption is not even considered, and this can be for several reasons. It could simply be a perception that the encryption process will add unwanted overhead, or it could be a lack of understanding as to what storage encryption is and what it can give you.

So let’s take a look at what NetApp® offers.

NetApp® Storage Encryption (NSE) provides full-disk encryption and, what’s more, does so without compromising storage efficiency or performance, using self-encrypting drives supplied by some of the leading drive producers. NSE has the beauty of being a non-disruptive process that gives a comprehensive, cost-effective, hardware-based level of security with a very simple approach to operation and usage. Although it is a simple solution to use, that does not detract from its compliance with government and industry regulations.

NetApp® uses full disk encryption (FDE) capable disks. Data is not encrypted external to the disk drive itself – this is truly data-at-rest encryption only; once in the controller or on the network, data is not encrypted. What makes this so good is that the encryption engine is built into the disk, so all encryption takes place at close to line speed and there is no performance penalty: whether your system uses encryption or not, the performance will be the same. It is fair to say that encrypting disks cost more, so it will be a price-point decision as to whether the cost is justified.

FDE disks require a key to be generated and pushed down to the disk to enable encryption of data. FDE-capable disks are available in varying sizes, from 600GB to 1.2TB performance drives and 800GB and larger SSDs, and there is the added advantage that if someone steals one drive, or a complete set of drives, it is impossible to read the data without the key.

So what exactly does NSE offer?

NSE supports the entire suite of storage efficiency technologies from NetApp, including deduplication and inline and post-process compression, as well as array-based antivirus scanning. It also supports the SafeNet KeySecure encryption-key appliance, which strengthens and simplifies long-term key management. NSE complies with the OASIS KMIP standard and, using FIPS 140-2 validated hardware, helps you comply with FISMA, HIPAA, PCI, Basel II, SB 1386 and the E.U. Data Protection Directive 95/46/EC.

 

Pete Green
Fast Lane Lead NetApp Expert
