Posts Tagged ‘RAID’

NetApp ADP – Overview

Tuesday, October 4th, 2016

Since the GA release of Clustered Data ONTAP (CDoT) 8.3, and in subsequent releases, you have been able to utilise NetApp’s Advanced Drive Partitioning (ADP).

Today ADP has two main implementations: Root-data partitioning and Flash Pool SSD Partitioning.

Root-Data Partitioning

Before ADP was released in 8.3, the smaller systems had an issue with excessive drive usage purely to get the system running. Best practice says that each node requires a three-drive root aggregate (that’s 6 of the 12 internal drives gone), plus two hot spares (now you’ve lost 8), leaving just 4 drives for data. So your options were extremely limited. Once RAID-DP parity is taken from that four-drive data aggregate, only 2 drives actually hold data; dividing the number of data drives by the total number of drives gives a storage efficiency of ~17% (this does not sit well with customers).
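To make that ~17% concrete, here is the arithmetic as a small Python sketch. The drive counts come from the 12-drive example above; the helper function itself is purely illustrative.

    def storage_efficiency(total_drives, root_drives, hot_spares, parity_drives):
        """Fraction of the shelf left holding actual data."""
        data_drives = total_drives - root_drives - hot_spares - parity_drives
        return data_drives / total_drives

    # 12-drive entry system without ADP:
    # 2 nodes x 3-drive root aggregate = 6 drives, plus 2 hot spares,
    # plus RAID-DP parity (2 drives) in the remaining 4-drive data aggregate.
    print(storage_efficiency(12, root_drives=6, hot_spares=2, parity_drives=2))  # 0.166... = ~17%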

NetApp recognised that this was a real sticking point for customers, and so they introduced ADP for the lower-end systems and All Flash FAS arrays to restore the competitive edge of these units.

Without ADP you have an active-passive configuration:

Being active-passive means that only one node of the pair actively serves the data aggregate; should that controller fail, its partner takes over and continues operations. For most customers this was not ideal.

In the above example we lose a total of eight drives to the root aggregates and RAID-DP parity, and at least one drive per node is used as a hot spare. This leaves only two drives to store actual data, which nets 1.47TB of usable capacity. With ADP, instead of having to dedicate whole disks to each node’s root aggregate, we can logically partition each drive into two segments: a smaller segment used for the root aggregate and a larger segment for data.

[Figure: ADP root-data partitioning, high-level view]

With ADP you now have either active-passive or active-active:

By using ADP you now have significantly more usable capacity for data. In an active-passive configuration, where all of the remaining data partitions go to a single aggregate, dividing the data capacity by the total drive capacity gives a storage efficiency of ~77% (this is far more palatable for customers).

[Figure: ADP in an active-passive configuration]

You can also split the data partitions into two data aggregates and run active-active; however, you lose more capacity to the parity of two data aggregates compared with just one, so you drop from ~77% to about ~62%.

[Figure: Adding drives to ADP-partitioned aggregates]

Constraints to consider before leveraging root-data partitioning

  • HDD types that are not available as internal drives: ATA, FCAL and MSATA.
  • 100GB SSDs cannot be used to create root partitions.
  • MetroCluster and ONTAP-v do not support root partitions.
  • Aggregates composed of partitioned drives must have a RAID type of RAID-DP.
  • Supported only on entry-level systems (FAS2240, FAS2520, FAS2552, FAS2554) and All Flash FAS systems (systems with only SSDs attached).
  • Removing or failing one drive will cause a RAID rebuild and slight performance degradation for both the node’s root aggregate and the underlying data aggregate.

Flash Pool SSD Partitioning

Previously, Flash Pools were created by dedicating a subset of SSDs, in either a RAID-4 or RAID-DP raid group, to an existing spinning-disk aggregate. In clusters that have multiple aggregates this traditional approach is very wasteful. For example, if a two-node cluster had four data aggregates, three on one node and one on the other, the system would require a minimum of ten SSDs to provide just one caching drive per data aggregate. If these are 400GB SSDs, then each aggregate would net only 330GB (right-sized actual) of cache out of the 4TB total raw.

With CDoT 8.3 a new concept, Storage Pools, was introduced, which increases cache-allocation agility by granting the ability to provision cache based on capacity rather than on a number of drives. A Storage Pool allows the administrator to create one or more logical pools of SSDs, each of which is divided into four equal slices (allocation units) that can then be applied to existing data aggregates on either node in the HA pair.

[Figure: Storage Pool divided into allocation units]

Using the same example as above, creating one large Storage Pool from the ten 400GB SSDs nets a total of four allocation units, each with 799.27GB of usable cache capacity, which can then be applied to our four separate aggregates.

By default, when a storage pool is created, two allocation units are assigned to each node in the HA pair. These allocation units can be reassigned to whichever node or aggregate needs them, as long as they stay within the same HA pair. This makes far better use of your SSDs than the previous method, where the SSDs were allocated to one aggregate and there they would remain.
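To illustrate the allocation-unit bookkeeping described above, here is a minimal Python sketch. It is a toy model rather than any ONTAP interface: the pool, node and aggregate names are made up, and the 799.27GB figure is simply carried over from the earlier example.

    class StoragePool:
        """Toy model of an SSD storage pool: four allocation units (AUs),
        split two per node in the HA pair by default."""

        def __init__(self, name, nodes, au_size_gb):
            self.name = name
            self.nodes = nodes                          # the two nodes of the HA pair
            self.au_size_gb = au_size_gb
            self.owner = [nodes[0], nodes[0], nodes[1], nodes[1]]
            self.aggregate = [None, None, None, None]   # aggregate using each AU

        def assign(self, au_index, node, aggregate):
            """Hand one allocation unit to an aggregate on a node in the same HA pair."""
            if node not in self.nodes:
                raise ValueError("allocation units cannot leave the HA pair")
            self.owner[au_index] = node
            self.aggregate[au_index] = aggregate

        def cache_for(self, aggregate):
            """Cache capacity (GB) this pool currently contributes to an aggregate."""
            return sum(self.au_size_gb for a in self.aggregate if a == aggregate)

    # Ten 400GB SSDs -> one pool of four AUs of ~799.27GB each (figures from the example).
    sp1 = StoragePool("sp1", nodes=["node-a", "node-b"], au_size_gb=799.27)
    sp1.assign(0, "node-a", "aggr_data1")   # one AU per data aggregate
    sp1.assign(1, "node-a", "aggr_data2")
    sp1.assign(2, "node-a", "aggr_data3")   # reassigned within the HA pair
    sp1.assign(3, "node-b", "aggr_data4")
    print(sp1.cache_for("aggr_data1"))      # 799.27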

 


 

Pete Green
Fast Lane Lead NetApp Expert


Triple parity Raid – WHY?

Wednesday, September 21st, 2016

Ever since the dawn of disk drives we have acknowledged and accepted that magnetic media is not infinitely stable and reliable and as such we have come up with many ways to protect the data we store on disk.

We have the basics of RAID 1, where we simply mirror the data to another drive. This allows a drive to fail while we still have all the data; however, this type of solution is expensive as it doubles the amount of storage required.

Moving forward, we came to RAID 4 and RAID 5. Both add parity that enables any single failed drive to be rebuilt onto a hot spare without the cost of mirroring everything. This was later enhanced with RAID 6 and RAID-DP, both of which allow two concurrent drive failures without loss of access, with the failed drives successfully rebuilt.
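For anyone who has not looked at parity for a while, here is a minimal Python sketch of the XOR idea behind single-parity RAID 4/5: the parity block is the XOR of the data blocks, so any one missing block can be recomputed from the survivors. The byte values are made up purely for illustration.

    from functools import reduce

    def xor_blocks(blocks):
        """XOR a list of equal-length byte strings together, column by column."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    # Three data blocks in a stripe plus one parity block (RAID 4/5 style).
    data = [b"\x11\x22\x33", b"\x44\x55\x66", b"\x77\x88\x99"]
    parity = xor_blocks(data)

    # "Drive" 1 fails: rebuild its contents from the survivors plus parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    assert rebuilt == data[1]

Dual-parity schemes such as RAID 6 and RAID-DP add a second, independently calculated parity so that two missing blocks still leave a solvable set of equations; triple parity extends the same idea to three.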

So why did we go from one parity to two? Mainly because we are using less physical space to store more data: drive mechanics are far more precise, with very little tolerance, compared with older-style drives where the data was written to larger areas of the platter and generally had higher tolerances.

We are now breaching new realms with higher-capacity drives, and they have some underlying quirks. Firstly, larger capacity typically means they take longer to rebuild after a failure, and these extended rebuild times extend your window of vulnerability to a further failure.

So we need better protection, especially when you consider that we are currently seeing higher read error rates with the SSD drives deployed throughout data centres.

So although today the prevailing thinking may hold you back from this additional level of protection, it will become common practice to add another level, especially with the likes of 15TB SSD drives. With rebuild times of around 12 hours for a 15TB drive, and with drive capacities continuing to grow, it won’t be long before larger drives are available, and their rebuild times will probably extend in proportion to the additional capacity.
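As a rough back-of-the-envelope check on where that trend leads, here is the arithmetic implied by the 12-hour figure. Real rebuild times depend heavily on system load, RAID group size and drive type, and the 30TB drive is a hypothetical used only for illustration.

    drive_tb = 15
    rebuild_hours = 12
    rate_tb_per_hour = drive_tb / rebuild_hours      # ~1.25TB/h effective rebuild rate

    # At the same effective rate, a hypothetical 30TB drive would take about a day.
    print(30 / rate_tb_per_hour)                     # 24.0 hours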

Should you encounter a read error on a disk that is part of a RAID configuration, you will most probably never notice it, as the error is hidden or masked by the parity in a dual-parity configuration. But sustaining operations when there are two read errors during a drive rebuild can only be done with a third parity. With that in mind, and especially with higher capacities and extended rebuild times, it makes perfect sense to evolve to triple parity.

NetApp® have, with the ONTAP 9 release, introduced RAID-TEC™, which serves two purposes. Firstly, it adds parity protection with a third parity; secondly, the reconstruction method has been redeveloped so that there is no performance degradation during a drive rebuild. This means you keep your performance while gaining a higher level of protection, giving a higher level of confidence.

The new RAID-TEC™ (Triple Erasure Coding) gives more usable space and better protection for large drive sizes. Another plus is that existing RAID-DP aggregates can be converted to RAID-TEC non-disruptively, which is a major advantage; and, in comparison with the older RAID 4 and RAID-DP, the drive counts allowed are more conducive to storage efficiency.

 


Pete Green
Fast Lane Lead NetApp Expert


Just how Paranoid are you about your data?

Thursday, September 15th, 2016

When we deal with storage, we do it on levels of paranoia.

So how paranoid are you?

Let’s start with the basics…

We know that hard drives fail, and therefore we use RAID to protect against that. But what else can I do to satisfy my varying levels of paranoia?

So protecting the drives is sorted, but what if the shelf my disks reside in fails? What then? Well, with Data ONTAP you have the option of using SyncMirror, which gives shelf-level protection by mirroring the drives on one shelf to another shelf. Yes, it requires double the storage, but it allows continuous access even if you have a shelf failure.

OK, so we have the drives covered, but what about the controller? (I want to eliminate any single point of failure.) So our paranoia has risen another level. Don’t panic: NetApp’s primary offering is the ClusterMode solution. This allows multiple controllers to work together as a very robust, resilient beast that is designed to tolerate a failure and continue to run. Each node (controller) has a partner, and they work together as a High Availability pair; those pairs combine with other pairs to give a solution that just keeps going.

That’s all well and good, but what if my data centre has a significant failure that takes out my entire cluster? Well, I see your level of paranoia has risen again, but have no fear: you can replicate from one site to another asynchronously using SnapMirror, giving you the option of a complete disaster recovery solution where, in the event of a failure, you can bring your remote site up relatively quickly with minimal data loss.

Well, that sounds great, but… (here we go, another level of paranoia) I need near-zero downtime, and although disaster recovery is great, my business needs full business continuity. NetApp have a solution for that too: MetroCluster. MetroCluster is designed for just that, to keep you up and running no matter what, and with full synchronous replication at distances of up to 200km I think we have you covered. Should you need more, you can combine MetroCluster with SnapMirror to extend your solution.

MetroCluster can:

  • Protect against hardware, network or site failure via transparent switchover.
  • Eliminate planned and unplanned downtime associated with maintenance and change management.
  • Upgrade hardware and software without disrupting operations.
  • Deploy without complex scripting, application or OS dependencies.
  • Achieve continuous availability for VMware, Microsoft, Oracle, SAP and more.

MetroCluster software:

  • Protects against the major causes of downtime: power, cooling and network faults, as well as disasters affecting your site.
  • Combines array-based clustering with synchronous mirroring to deliver continuous availability and zero data loss.
  • Provides transparent recovery from failures.
  • Reduces administrative overhead and the risk of human error.
  • Needs only a single command for switchover.

So, how paranoid are you about your data? It doesn’t matter: NetApp has a solution to satisfy every level of paranoia.

For more courses please visit: www.flane.co.uk/netapp

Pete Green
Fast Lane Lead NetApp Expert
