Western Digital Reimagines HDD – Flash Integration with OptiNAND

The last few years have seen plenty of innovation in the hard-disk drive market. For quite some time, the HDD technology roadmap was shared industry-wide: vendors introduced new technologies at different points in time, but they were all similar in nature. As a recent example, HGST (now part of Western Digital) was the first to market with helium-filled HDDs, but both Seagate and Toshiba followed up with similar drives within a few years.

Prior to 2017, there was consensus that heat-assisted magnetic recording (HAMR) would drive the increase in HDD storage density after traditional perpendicular magnetic recording (PMR) ran out of steam. Western Digital sprang a surprise in Q4 2017 by announcing that it would go with microwave-assisted magnetic recording (MAMR) for future HDDs. Seagate, meanwhile, has been all-in on HAMR and has launched 20TB HDDs based on the technology for enterprise customers (those HAMR drives are yet to hit retail, though). Western Digital had promised MAMR for its 16TB+ HDDs, but eventually backtracked in favor of energy-enhanced PMR (ePMR). Toshiba, on the other hand, introduced flux control-MAMR (FC-MAMR) in its MG09 series of enterprise 16TB and 18TB HDDs.

At the HDD Reimagine event today, Western Digital is introducing OptiNAND, a novel architecture that integrates an iNAND UFS embedded flash drive (EFD) on the drive’s mainboard.

In conjunction, the company is also announcing that it has been sampling its first 20TB non-SMR drives based on OptiNAND-enabled ePMR to select customers, and that it will adopt the OptiNAND platform for all 20TB+ HDDs moving forward. The company also sees a path to 50TB OptiNAND-enabled ePMR drives in the second half of the decade.

While the company did not quantify the amount of NAND in its OptiNAND drives, it is stressing that these are not hybrid drives (SSHDs). Unlike SSHDs, the OptiNAND drives do not store any user data in the NAND during normal operation. Instead, the NAND is used to store metadata from HDD operation in order to improve capacity, performance, and reliability.

Capacity

Western Digital’s OptiNAND announcement also confirms that its 20TB 9-platter HDDs will continue to use energy-enhanced PMR (ePMR). In addition to a triple-stage actuator that positions the heads more accurately over the tracks, OptiNAND is being touted as the key to reaching 2.2TB of capacity per platter.
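As a quick sanity check on those figures, the arithmetic can be sketched in a few lines of Python. The platter count and per-platter capacity are the ones quoted above; the variable names are purely illustrative, and the capacities are decimal (marketing) terabytes.

```python
# Figures quoted in the announcement (decimal terabytes).
PLATTERS = 9
TB_PER_PLATTER = 2.2   # "around 2.2TB" per platter
TARGET_TB = 20

raw_capacity_tb = PLATTERS * TB_PER_PLATTER
needed_per_platter_tb = TARGET_TB / PLATTERS

print(f"{PLATTERS} platters x {TB_PER_PLATTER} TB = {raw_capacity_tb:.1f} TB")
print(f"{TARGET_TB} TB over {PLATTERS} platters needs "
      f"{needed_per_platter_tb:.2f} TB per platter")
```

Nine platters at 2.2TB each come to roughly 19.8TB, so the 20TB class implies slightly more than 2.2TB per platter (about 2.22TB), which is consistent with the "around 2.2TB" wording.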

The increase in areal density is being achieved by cramming the tracks on the platter closer together (increased TPI), while also moving some of the metadata (both factory-generated and generated during user operation) from the platter to the NAND. In particular, Western Digital mentioned the repeatable runout (RRO) data, which records the head position error (jitter) as the spindle revolves. This data (running into multiple gigabytes) is generated in the factory during manufacturing. It is typically stored on the disk, taking up space that could otherwise have been used for user data. The OptiNAND architecture moves it to the NAND in the EFD.

One of the key challenges in packing tracks closer together is ‘adjacent track interference’ (ATI): data in a track can get corrupted by writes to adjacent tracks, so it has to be refreshed periodically. Present-generation HDDs record write operations at the track level and trigger these refreshes on a track-by-track basis. One of the downsides of increasing areal density by increasing the TPI is the need for more frequent refreshes. From one refresh every 10,000 write operations in early HDDs, the narrow tracks now need to be refreshed as frequently as once every 6 writes. Beyond a certain point, it doesn’t make sense to increase TPI any further because the rising frequency of ATI refreshes has an extreme impact on performance. The OptiNAND architecture allows the write operations to be recorded at the sector level instead. This means that the refresh operations are more spread out both temporally and spatially, allowing the tracks to be packed closer together without sacrificing performance. In turn, this increases the areal density.
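The difference between track-level and sector-level bookkeeping can be illustrated with a toy model. The Python sketch below is not Western Digital's firmware logic: the function name, the workload, and the 500-sectors-per-track figure are all assumptions made for illustration, and only the refresh threshold of 6 writes comes from the text above. It counts how many sectors end up being rewritten by ATI refreshes under each scheme.

```python
from collections import Counter

def refresh_cost(writes, sectors_per_track, threshold, per_sector):
    """Return the number of sectors rewritten by ATI refreshes.

    writes:     iterable of (track, sector) write targets
    per_sector: True  -> exposure is counted per victim sector
                False -> exposure is counted per victim track
    """
    exposure = Counter()
    rewritten = 0
    for track, sector in writes:
        # Each write exposes the two neighbouring tracks to interference.
        for victim in (track - 1, track + 1):
            if victim < 0:
                continue
            key = (victim, sector) if per_sector else victim
            exposure[key] += 1
            if exposure[key] >= threshold:
                # Track-level bookkeeping must rewrite the whole track;
                # sector-level bookkeeping rewrites only the exposed sector.
                rewritten += 1 if per_sector else sectors_per_track
                exposure[key] = 0
    return rewritten

# Hypothetical workload: 60 writes hammering one sector of track 10.
writes = [(10, 3)] * 60
track_cost = refresh_cost(writes, sectors_per_track=500, threshold=6,
                          per_sector=False)
sector_cost = refresh_cost(writes, sectors_per_track=500, threshold=6,
                           per_sector=True)
print(track_cost, sector_cost)
```

In this simplified model, the same workload forces 10,000 sector rewrites with track-level counters but only 20 with sector-level counters. Real drives are far more complex, but the sketch shows why finer-grained write tracking leaves more time for servicing host requests.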

Performance

Consumers can operate HDDs with the device's write cache enabled or disabled. Irrespective of the cache setting, the HDD has to buffer the incoming data. In the disabled case, the amount of data that can be buffered depends on how much can be safely flushed to non-volatile storage in an emergency power-off (EPO) situation. The presence of significant NAND capacity in the HDD means that the drive can use the rotational energy present in the platters to flush more of the data in the DRAM into the NAND (present-day HDDs dump the DRAM data, around a couple of MBs' worth, into serial flash in an EPO situation). Because more data can be buffered safely, the performance of the write-cache-disabled case approaches that of the write-cache-enabled case in OptiNAND-enabled HDDs.

Western Digital also claims that the ‘write cache enabled’ case can see a performance benefit. This is an indirect result of the reduced refresh rates (see the observations in the previous sub-section on how OptiNAND handles adjacent-track interference), which allow the HDD to spend more time servicing user data requests. Again, Western Digital’s event did not quantify the improvement in IOPS for different access patterns over non-OptiNAND HDDs.

Reliability

The aspects of OptiNAND that enhance the performance of the drives in the write-caching-disabled state also contribute to their reliability under EPO conditions. By including non-volatile storage that is faster than serial flash, Western Digital claims that up to 50x more data can be flushed compared to previous-generation HDDs.
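To put the 50x claim in perspective, the rough Python sketch below combines it with the "around a couple of MBs" figure mentioned earlier for serial flash. Both inputs are approximate, the 2MB baseline and the 4 KiB write size are assumptions, and the result is an inferred upper bound rather than a number Western Digital has published.

```python
MB = 1_000_000

serial_flash_flush_bytes = 2 * MB   # assumption: "around a couple of MBs"
optinand_multiplier = 50            # "up to 50x" per Western Digital

optinand_flush_bytes = serial_flash_flush_bytes * optinand_multiplier

WRITE_SIZE = 4096  # hypothetical 4 KiB host writes

def writes_that_fit(budget_bytes, write_size=WRITE_SIZE):
    """How many fixed-size writes fit in the EPO flush budget."""
    return budget_bytes // write_size

print(f"Serial flash budget: {serial_flash_flush_bytes // MB} MB, "
      f"{writes_that_fit(serial_flash_flush_bytes)} x 4 KiB writes")
print(f"OptiNAND budget (upper bound): {optinand_flush_bytes // MB} MB, "
      f"{writes_that_fit(optinand_flush_bytes)} x 4 KiB writes")
```

Under these assumptions, the safely buffered data grows from about 2MB (488 writes of 4 KiB) to as much as 100MB (24,414 such writes), which is why the write-cache-disabled case can start to behave like the write-cache-enabled one.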

Concluding Remarks

Western Digital claims that the vertical integration possible with the HDD technology from the WD / HGST side along with the flash technology from the SanDisk side is essential for the creation of a platform like OptiNAND.

There is bound to be a cost premium associated with the drives due to the NAND integration. New recording technologies (like HAMR and MAMR) require significant investment in the design of the recording heads as well as the platters, and need to be revamped every few generations. On the other hand, technologies like OptiNAND are independent of the underlying recording technology.

Without exact quantification of the increase in areal density enabled by OptiNAND, it is not possible to provide comparative comments on the Capacity aspect of Western Digital’s OptiNAND trifecta. What can be said is that the company is now able to introduce 20TB hard drives (around 2.2TB/platter) with the same ePMR technology used in its 18TB drives.

The Performance aspect should be easier to evaluate when OptiNAND drives hit retail. While the benefits for the ‘write caching disabled’ case (where the NAND can act as a safe cache in an EPO situation) are easy to verify (essentially acting the same as the ‘write caching enabled’ case), the pure ‘write caching enabled’ case should be much more interesting to analyze against competing drives of the same capacity.

Western Digital indicated that all of its 20TB+ HDDs moving forward will be OptiNAND-enabled. This will be across all market verticals: cloud deployment, enterprise drives (Gold), storage for surveillance recording (Purple line), and NAS (Red line). It must be noted that the company already has a 20TB SMR drive in the market that is not OptiNAND-enabled. The new HDD architecture, with its flexible SoC and high-performance NAND integration, can also be used to enable customer-specific enhancements in the future. The ability to use the NAND to dynamically remap sectors can increase areal density and improve performance much more in SMR drives. Based on this, we can expect OptiNAND-enabled SMR drives to gain a significant capacity advantage over CMR drives compared to what is being seen in the market currently.

The HDD industry is not yet in dire need of CPR, but Western Digital’s usage of OptiNAND to address the Capacity, Performance, and Reliability trifecta is yet another unique aspect in the innovation-rich hard-disk drive market. Western Digital has both HDD and complete flash technology (from NAND fabrication to controller) in-house, while the other HDD vendors do not have that advantage. As such, it might take the other vendors some time to catch up on the advantages of using NAND for HDD metadata.