Thursday, May 02, 2013

Basic NFS Troubleshooting

NFS Troubleshooting

Sun's web pages contain substantial information about NFS services; search for an NFS Administration Guide or NFS Server Performance and Tuning Guide for the version of Solaris you are running. The share_nfs man page contains specific information about export options.

If NFS is not working at all, try the following:

  • Make sure that the NFS server daemons are running. In particular, check for statd, lockd, mountd and nfsd (and rarpd, if the server supports diskless clients). If the daemons are not running, start them by running /etc/init.d/nfs.server start. See Daemons below for information on NFS-related daemons.
  • Check the contents of /etc/dfs/dfstab, then run shareall to share the filesystems listed there.
  • Use share or showmount -e to see which filesystems are currently exported, and to whom. showmount -a shows who the server believes is actually mounting which filesystems.
  • Make sure that your name service is translating the server and client hostnames correctly on both ends. Check the server logs to see if there are messages regarding failed or rejected mount attempts; check to make sure that the hostnames are correct in these messages.
  • Make sure that the /etc/net/*/hosts files on both ends report the correct hostnames. Reboot if these have to be edited.
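The daemon and export checks above can be partly scripted. The Python sketch below is a minimal example under stated assumptions, not a standard tool: it works on text you have already captured from `ps -e -o comm` and `showmount -e`, assumes `ps` prints a COMMAND header followed by one command path per line, and assumes `showmount -e` prints an "export list for host:" header followed by one "path clients" line per export. The function names are hypothetical. It looks for statd, lockd, mountd and nfsd (mountd because it handles mount requests); rarpd is left out because it only matters when the server boots diskless clients.

```python
import os

# Daemons to look for on the server (assumption: rarpd is omitted because
# it is only needed when the server boots diskless clients).
REQUIRED_DAEMONS = ("statd", "lockd", "mountd", "nfsd")


def missing_daemons(ps_output, required=REQUIRED_DAEMONS):
    """Return the required daemons absent from `ps -e -o comm` output."""
    running = set()
    for line in ps_output.splitlines()[1:]:  # first line is the COMMAND header
        name = line.strip()
        if name:
            # ps may print a full path such as /usr/lib/nfs/nfsd
            running.add(os.path.basename(name))
    return [d for d in required if d not in running]


def parse_showmount_e(output):
    """Map each exported path to its client list from `showmount -e` output."""
    exports = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("export list for"):
            continue  # skip blank lines and the header line
        parts = line.split(None, 1)  # split on the first run of whitespace
        exports[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return exports


if __name__ == "__main__":
    ps_text = "COMMAND\n/usr/lib/nfs/statd\n/usr/lib/nfs/lockd\n/usr/lib/nfs/nfsd\n"
    print(missing_daemons(ps_text))  # ['mountd']
```

Comparing the parsed export list against what you expect from /etc/dfs/dfstab is a quick way to spot a filesystem that shareall did not export.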

If you are dealing with a performance issue, check the following areas and consider these adjustments:

  • Network Issues
  • CPU Usage
  • Memory Levels
  • Disk I/O
  • If requests are waiting for a free nfsd thread, increase the number of nfsd threads in /etc/init.d/nfs.server. Note that this increases the kernel's memory usage, so make sure that the server has enough RAM to handle the additional load.
  • Where possible, mount filesystems with the ro option to prevent additional, unnecessary attribute traffic.
  • If attribute caching does not make sense (for example, with a mail spool), mount the filesystem with the noac option. If nfsstat reports a high proportion of getattr calls, actimeo may need to be increased (provided the attributes do not change often).
  • nfsstat reports on most NFS-related statistics. The nfsstat page includes information on tuning suggestions for different types of problems that can be revealed with nfsstat.
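The getattr check above amounts to a simple ratio. The sketch below assumes you have copied per-operation call counts from `nfsstat -c` into a dictionary by hand (it does not parse real nfsstat output), and the 40% warning level is a hypothetical figure chosen for illustration, not a documented threshold. Both function names are made up for this example.

```python
# Hypothetical warning level; the right value depends on the workload.
GETATTR_WARN_FRACTION = 0.40


def getattr_fraction(op_counts):
    """Fraction of all NFS calls that were getattr (0.0 if there were none)."""
    total = sum(op_counts.values())
    if total == 0:
        return 0.0
    return op_counts.get("getattr", 0) / total


def should_review_actimeo(op_counts, threshold=GETATTR_WARN_FRACTION):
    """True when getattr traffic exceeds the (hypothetical) threshold."""
    return getattr_fraction(op_counts) > threshold


if __name__ == "__main__":
    # Per-operation counts copied by hand from `nfsstat -c` (illustrative numbers).
    counts = {"getattr": 600, "lookup": 150, "read": 200, "write": 50}
    print(getattr_fraction(counts))       # 0.6
    print(should_review_actimeo(counts))  # True
```

A high ratio only suggests that a longer actimeo is worth trying; if the files' attributes change often, a longer cache timeout trades fewer getattr calls for staler attributes on the clients.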

If these steps do not resolve the issue, structural changes may be required:

  • cachefs can be used to push some of the load from the NFS server onto the NFS clients. To make it effective, use cfsadmin to raise the cache's maxfilesize to a value large enough to allow commonly used files to be cached. (The default value is 3 MB.)
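Choosing a maxfilesize is a matter of comparing it with the sizes of the files you want cached. The sketch below is illustrative only: the file list is hypothetical, it assumes maxfilesize is expressed in whole megabytes of 1024 × 1024 bytes, and it assumes a file larger than the limit is not cached. The result is the value you would consider passing as maxfilesize to cfsadmin.

```python
import math

MB = 1024 * 1024  # assumption: cachefs sizes are in binary megabytes
DEFAULT_MAXFILESIZE_MB = 3  # cachefs default stated above


def files_over_limit(sizes, maxfilesize_mb=DEFAULT_MAXFILESIZE_MB):
    """Return the files (name -> bytes) larger than maxfilesize."""
    limit = maxfilesize_mb * MB
    return {name: size for name, size in sizes.items() if size > limit}


def suggested_maxfilesize_mb(sizes):
    """Smallest whole-MB maxfilesize (never below the default) that fits every file."""
    largest = max(sizes.values(), default=0)
    return max(DEFAULT_MAXFILESIZE_MB, math.ceil(largest / MB))


if __name__ == "__main__":
    # Hypothetical commonly used files and their sizes in bytes.
    hot_files = {"/opt/app/lib/libapp.so": 5_500_000, "/opt/app/bin/app": 2_000_000}
    print(files_over_limit(hot_files))          # {'/opt/app/lib/libapp.so': 5500000}
    print(suggested_maxfilesize_mb(hot_files))  # 6
```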

NFS Client

When a client makes a request to the NFS server, a file handle is returned. The file handle is a 32-byte structure that is opaque to the client and is interpreted only by the NFS server. Commonly, the file handle includes a file system ID, the inode number and the generation number of the inode. (The generation number lets the server return a "stale file handle" error if the inode has been freed and reused between client file accesses.)
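To show how the generation number detects a reused inode, here is a minimal Python sketch. The byte layout (8-byte file system ID, 8-byte inode number, 4-byte generation, padded to 32 bytes) is a hypothetical choice for illustration; the real layout is private to the server's file system, and the function and exception names are made up.

```python
import struct

# Hypothetical 32-byte layout for illustration only: the real handle is
# opaque to the client and its contents are private to the server.
# Big-endian: 8-byte fsid, 8-byte inode number, 4-byte generation, 12 pad bytes.
FH_FORMAT = ">QQI12x"
assert struct.calcsize(FH_FORMAT) == 32


class StaleFileHandle(Exception):
    """Raised when a handle refers to an inode that was freed or reused."""


def make_handle(fsid, inode, generation):
    """Build the opaque handle the server returns to a client."""
    return struct.pack(FH_FORMAT, fsid, inode, generation)


def resolve_handle(handle, current_generations):
    """Map a handle back to (fsid, inode), checking the generation number.

    current_generations maps (fsid, inode) -> the inode's generation now.
    A missing inode or a changed generation means the handle is stale.
    """
    fsid, inode, generation = struct.unpack(FH_FORMAT, handle)
    if current_generations.get((fsid, inode)) != generation:
        raise StaleFileHandle(f"stale file handle for inode {inode}")
    return fsid, inode


if __name__ == "__main__":
    fh = make_handle(fsid=1, inode=4242, generation=7)
    print(len(fh))                             # 32
    print(resolve_handle(fh, {(1, 4242): 7}))  # (1, 4242)
    try:
        resolve_handle(fh, {(1, 4242): 8})     # inode was freed and reused
    except StaleFileHandle as err:
        print(err)                             # stale file handle for inode 4242
```

Because the inode number alone would match the reused inode, it is the generation comparison that keeps an old handle from silently reaching a different file.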

If no response is received for a request, the client retransmits it with the same xid (transaction ID), which lets the server recognize the retransmission as a duplicate. Retransmissions are usually caused by congestion on the network or on the server, and can be observed by running snoop between the server and the client.

The server handles retransmissions differently depending on whether a request is idempotent (can be executed several times without ill effect) or nonidempotent (cannot be executed several times). For example, reads and getattrs are idempotent, while writes, creates and removes are not. The server keeps a cache of recent nonidempotent requests so that a retransmitted request receives the original reply instead of being executed a second time.
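A duplicate request cache of this kind can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Solaris kernel implementation: it keys entries on the client address and the RPC xid (which a retransmission carries unchanged), uses a hypothetical capacity of 128 entries with oldest-first eviction, and treats only write, create and remove as nonidempotent, following the examples above. Without such a cache, a retransmitted remove could run twice and return an error even though the first attempt succeeded.

```python
from collections import OrderedDict

# Nonidempotent operations, following the examples in the text above.
NONIDEMPOTENT = {"write", "create", "remove"}


class DuplicateRequestCache:
    """Replies to recent nonidempotent requests, keyed by (client, xid)."""

    def __init__(self, capacity=128):  # hypothetical size
        self.capacity = capacity
        self._replies = OrderedDict()

    def handle(self, client, xid, op, execute):
        """Run execute() for a request, replaying the saved reply on retransmission."""
        if op not in NONIDEMPOTENT:
            return execute()  # idempotent: safe to simply run again
        key = (client, xid)
        if key in self._replies:
            self._replies.move_to_end(key)
            return self._replies[key]  # duplicate: do not execute twice
        reply = execute()
        self._replies[key] = reply
        if len(self._replies) > self.capacity:
            self._replies.popitem(last=False)  # evict the oldest entry
        return reply


if __name__ == "__main__":
    removed = []

    def do_remove():
        removed.append("report.txt")
        return "NFS_OK"

    drc = DuplicateRequestCache()
    print(drc.handle("client1", 1001, "remove", do_remove))  # NFS_OK
    print(drc.handle("client1", 1001, "remove", do_remove))  # NFS_OK (replayed)
    print(len(removed))                                       # 1
```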

Daemons

The following daemons play a critical role in NFS service:

  • biod: On the client end, handles asynchronous I/O for blocks of NFS files.
  • nfsd: Listens and responds to client NFS requests.
  • mountd: Handles mount requests.
  • lockd: Network lock manager.
  • statd: Network status manager.
