Friday, April 18, 2008

ZFS Difficulties

We've been evaluating ZFS as a replacement for VxVM and VxFS in some of our production clusters. We encountered some difficulties.

ZFS has been supporting our development environment for about a year now, and we have enjoyed its flexibility and feature set--especially the snapshot management and the ease of volume management. Its performance, however, has left something to be desired. We had hoped to demonstrate that we could get adequate performance out of ZFS by being more aggressive with tuning.

During the initial testing, we did not use ZFS mirroring or RAIDZ, though we did split the Oracle log and temp files into their own pools, as suggested in the ZFS Best Practices Guide, and we did adjust several tuning parameters as suggested in the ZFS Evil Tuning Guide.
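For the curious, the layout looked roughly like this. The pool and device names are made up and the values are illustrative rather than our exact settings, but the recordsize advice comes from the Best Practices Guide and the /etc/system tunables from the Evil Tuning Guide:

    # One pool each for data files, redo logs, and temp
    zpool create oradata c1t2d0
    zpool create oralog c1t3d0
    zpool create oratemp c1t4d0

    # Match the data pool's recordsize to the Oracle db_block_size
    # (must be set before the data files are created)
    zfs set recordsize=8k oradata

And in /etc/system (a reboot is required for these to take effect):

    * Cap the ARC so it doesn't compete with the Oracle SGA for memory
    set zfs:zfs_arc_max = 0x100000000
    * Skip cache flushes; only safe with a battery-backed array cache
    set zfs:zfs_nocacheflush = 1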

We were able to get performance that met our requirements, but we ran into a much more serious problem when our test system blew a CPU. One of our pools became corrupted, and the corruption panicked the server. This was unexpected; we had isolated our test environment in a non-global zone and used zonecfg's dataset resource to delegate administration of the pool's datasets to the zone. We had expected a corrupted zpool to cause problems isolated to the zone, not to take down the entire server.
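The delegation itself is done with zonecfg's dataset resource. The zone and dataset names below are hypothetical, but the syntax is the standard one:

    # Delegate a ZFS dataset to the non-global zone 'testzone'
    zonecfg -z testzone
    zonecfg:testzone> add dataset
    zonecfg:testzone:dataset> set name=oradata/testzone
    zonecfg:testzone:dataset> end
    zonecfg:testzone> commit
    zonecfg:testzone> exit

Once the zone is rebooted, the zone administrator can manage properties and create child filesystems under oradata/testzone without touching the global zone.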

This server was part of a nascent Sun Cluster configuration that we were also testing; the failover node attempted to import the zpool and promptly panicked as well. Investigation with zdb -lv revealed that the pool metadata had been corrupted during the processor failure and the resulting panic. We opened a call with Sun, and I posted a message with the details to the ZFS Discuss list.
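For anyone who wants to inspect their own pools: zdb -lv dumps the vdev labels from a device, and a healthy device shows four consistent copies of the label (two at the front of the device, two at the back). The device name here is hypothetical:

    # Dump the four vdev labels from one of the pool's disks
    zdb -lv /dev/dsk/c1t2d0s0

In our case the output showed the damage plainly.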

The upshot is that Solaris 10u5 and earlier do not have a way to prevent a panic on zpool corruption. Nevada allows you to specify how the OS will react to zpool corruption, but this will not be brought forward to Solaris 10 until update 6.
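For reference, the Nevada knob is the zpool failmode property, which accepts wait, continue, or panic. Assuming it arrives in update 6 as described, setting it would look like this (pool name hypothetical):

    # Block I/O and wait for the administrator rather than panicking the box
    zpool set failmode=wait oradata
    zpool get failmode oradata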

We could have reduced the likelihood of zpool metadata corruption by building each pool from more than one vdev (allowing ZFS to place the metadata replicas on different vdevs). Sun tells us that this would only have narrowed the window in which the metadata could have been corrupted; the problem itself will not be fixed until update 6 later this year.
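Concretely, that means building each pool from at least two top-level vdevs so that ZFS can spread its ditto-block copies of the metadata across devices. A hypothetical layout:

    # Two mirrored top-level vdevs; metadata replicas get spread across them
    zpool create oradata mirror c1t2d0 c2t2d0 mirror c1t3d0 c2t3d0

    # Or grow an existing single-vdev pool by adding a second vdev
    zpool add oradata c1t4d0

Note that zpool add is one-way: a vdev cannot be removed from a pool once it has been added.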

Until this problem is resolved, I don't see us being able to use ZFS in a critical production environment. If you can shed any light on these issues, please post a comment here or on the ZFS Discuss list.

--Scott

2 comments:

Diego said...

"We opened a call with Sun, and I posted message with the details to the ZFS Discuss list." => The link under 'message' returns 404.

There are also some links on the right side of the blog that are no longer valid. Since Oracle took over Sun... well, you know what they have done already >:¬|

Thanks for the good stuff. Appreciate your work very much.

ScottCromar said...

Thanks for pointing that out. Since the post is a few years old, I'm going to leave the links in the post as they are. I believe that the issue in question is resolved in current versions of ZFS.

I did, however, take the chance to clean up the reference links in the sidebar and re-point them to the current Oracle locations.