Friday, April 05, 2013

IPC Issues

Most of the InterProcess Communication (IPC) parameters are reported by sysdef -i. Other parameters can be checked on a running system using adb -k:
adb -k /dev/ksyms /dev/mem
parameter-name/D
^D (to exit)

In Solaris 10, control of the shared memory, semaphore and message queue parameters has been shifted to project-based resource controls. (See the project, prctl and getrctl man pages for detailed information.)

Many of the maximum parameter values discussed below represent 32-bit limits on integer size for Solaris 2.6 and 2.5.1. In Solaris 7+, these limits have been lifted somewhat. In theory, the maximums for Solaris 7+ would be in the 16 EB (exabytes) range rather than 2 GB (for Solaris 2.6 and 2.5.1). In practice, implementation details limit the range to something like 16 TB (terabytes). Because the kernel uses memory to set up space for the structures governed by these parameters, it is important to think about the usage of the resource before tuning it. In most cases, the 32-bit limits provide more than adequate headroom for growth.
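The 2 GB and 16 EB figures follow directly from the integer widths. The sketch below only illustrates that arithmetic; the helper name human_size is made up for this example, and it uses 1024-based units.

```python
def human_size(n_bytes):
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    value = n_bytes
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1024


limit_32bit = 2**31  # range of a signed 32-bit integer
limit_64bit = 2**64  # range of an unsigned 64-bit integer

print(human_size(limit_32bit))
print(human_size(limit_64bit))
```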

Shared memory, semaphores and message queues are only enabled if the appropriate kernel modules are loaded. These are automatically loaded if certain IPC functions are called, but they can also be forced to load via /etc/system forceload commands or modload commands run as root.

Each of these three facilities runs on top of the /kernel/misc/ipc module. Shared memory connects to the ipc module via /kernel/sys/shmsys, semaphores connect via /kernel/sys/semsys and message queues connect via /kernel/sys/msgsys.

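For Solaris 2.5.1-9, the module names will need to be included when setting these parameters in the /etc/system file. The variable itself also carries a shminfo_, seminfo_ or msginfo_ prefix. For example:
set shmsys:parameter=value
set shmsys:shminfo_shmmax=4294967295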

Solaris 10 sets the parameters for these facilities via the project interface.

Other IPC mechanisms exist (such as named pipes), but they are not tunable in the sense of this discussion.

Each IPC resource has at least these attributes: key (identifies this instance of the resource), creator (UID/GID of the creating process), owner (UID/GID of the resource owner), and permissions (similar to filesystem read/write/execute owner/group/other permissions).

Each object is created by calling the appropriate *get function (shmget / semget / msgget) with the desired key. If no object of that type with that key exists, one is created and a resource ID is passed back to the caller.

Once created, the IPC objects can be controlled with the appropriate *ctl function (shmctl / semctl / msgctl).

The ipcs command presents information on IPC services that are currently loaded. It presents a "facility not in system" message if a given module has not been loaded yet.

Shared Memory

Shared memory provides the fastest way for processes to pass large amounts of data to one another. As the name implies, shared memory refers to physical pages of memory that are shared by more than one process.

Of particular interest is the "Intimate Shared Memory" facility, where the translation tables are shared as well as the memory. This enhances the effectiveness of the TLB (Translation Lookaside Buffer), which is a CPU-based cache of translation table information. Since the same information is used for several processes, available buffer space can be used much more efficiently. In addition, ISM-designated memory cannot be paged out, which can be used to keep frequently-used data and binaries in memory.

Database applications are the heaviest users of shared memory. Vendor recommendations should be consulted when tuning the shared memory parameters.

Solaris 10 only uses the shmmax and shmmni parameters. (Other parameters are set dynamically within the Solaris 10 IPC model.)

  • shmmax (max-shm-memory in Solaris 10+): This is the maximum size of a shared memory segment (i.e., the largest value that can be used by shmget). Its theoretical maximum value is 4294967295 (4 GB), but practical considerations usually limit it to less than this. There is no reason not to tune this value as high as possible, since no kernel resources are allocated based on this parameter. Solaris 10 sets max-shm-memory to 1/4 of physical memory by default, vs 512 KB for shmmax in previous versions. Note that max-shm-memory caps the total shared memory for a project rather than the size of a single segment.
  • shmmin: This is the smallest possible shared memory segment size. The default is 1 byte; this parameter should probably not be tuned.
  • shmmni (max-shm-ids in Solaris 10+): Maximum number of shared memory identifiers at any given time. This parameter is used by kernel memory allocation to determine how much space to set aside for shmid_ds structures. Each of these is 112 bytes and requires an additional 8 bytes for a mutex lock; if it is set too high, memory usage can be a problem. The maximum setting for this variable in Solaris 2.5.1 and 2.6 is 2147483648 (2 GB), and the default is 100. For Solaris 10, the default is 128 and the maximum is MAXINT.
  • shmseg: Maximum number of segments per process. It is usually set to shmmni, but it should always be less than 65535. Sun documentation suggests a maximum for this parameter of 32767 and a default of 8 for Solaris 2.5.1 and 2.6.
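To see why an oversized shmmni can hurt, it helps to multiply it out. The following is a rough sketch using only the per-identifier sizes quoted above (112 bytes per shmid_ds plus 8 bytes for the mutex); the function name is invented for illustration, and real kernel allocations may differ by release.

```python
SHMID_DS_BYTES = 112  # size of one shmid_ds structure, per the text above
MUTEX_BYTES = 8       # additional mutex lock per identifier, per the text above


def shmid_overhead_bytes(shmmni):
    """Estimate kernel memory set aside for shmmni shared memory identifiers."""
    return shmmni * (SHMID_DS_BYTES + MUTEX_BYTES)


# Compare the old default, the Solaris 10 default and the 2.5.1/2.6 maximum.
for shmmni in (100, 128, 2147483648):
    print(shmmni, shmid_overhead_bytes(shmmni))
```

At the default of 100 the cost is about 12 KB, which is negligible. At the 2.5.1/2.6 maximum the same arithmetic comes to 240 GB, which is why the value should track what the applications actually need.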

Semaphores

Semaphores are shareable resources that take on non-negative integer values. They are manipulated by the P (wait) and V (signal) functions, which decrement and increment the semaphore, respectively. When a process needs a resource, a "wait" is issued and the semaphore is decremented. When the semaphore contains a value of zero, the resources are not available and the calling process spins or blocks (as appropriate) until resources are available. When a process releases a resource controlled by a semaphore, it increments the semaphore and the waiting processes are notified.
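The wait/signal behavior can be modeled in a few lines. This is a toy, in-process illustration built on Python threads, not the System V semop interface; the class name CountingSemaphore and its methods are assumptions made for this sketch.

```python
import threading


class CountingSemaphore:
    """Toy counting semaphore: wait() is P, signal() is V."""

    def __init__(self, value):
        if value < 0:
            raise ValueError("a semaphore value cannot be negative")
        self._value = value
        self._cond = threading.Condition()

    def wait(self, timeout=None):
        """P: block until the value is positive, then decrement it.

        Returns False if the timeout expires before a resource is free.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: self._value > 0, timeout):
                return False
            self._value -= 1
            return True

    def signal(self):
        """V: increment the value and wake one waiter."""
        with self._cond:
            self._value += 1
            self._cond.notify()


pool = CountingSemaphore(2)                # two units of some resource
pool.wait()
pool.wait()                                # value is now zero
exhausted = not pool.wait(timeout=0.01)    # a third caller has to wait
pool.signal()                              # one unit is released
reacquired = pool.wait(timeout=0.01)       # the waiter can now proceed
print(exhausted, reacquired)
```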

Solaris 10 only uses the semmni, semmsl and semopm parameters. (Other parameters are dynamic within the Solaris 10 IPC model.)

  • semmap: This sets the number of entries in the semaphore map. This should never be greater than semmni. If the number of semaphores per semaphore set used by the application is "n" then set
    semmap = ((semmni + n - 1)/n)+1
    or more. Alternatively, we can set semmap to semmni x semmsl. An undersized semmap leads to "WARNING: rmfree map overflow" errors. The default setting is 10; the maximum for Solaris 2.6 is 2GB. The default for Solaris 9 was 25; Solaris 10 increased the default to 512. The limit is SHRT_MAX.
  • semmni (max-sem-ids in Solaris 10+): Maximum number of systemwide semaphore sets. Each control structure consumes 84 bytes. For Solaris 2.5.1-9, the default setting is 10; for Solaris 10, the default setting is 128. The maximum is 65535.
  • semmns: Maximum number of semaphores in the system. Each structure uses 16 bytes. This parameter should be set to semmni x semmsl. The default is 60; the maximum is 2GB.
  • semmnu: Maximum number of undo structures in the system. This should be set to semmni so that each control structure has an undo structure. The default is 30, the maximum is 2 GB.
  • semmsl (max-sem-nsems in Solaris 10+): Maximum number of semaphores per semaphore set. The default is 25, the maximum is 65535.
  • semopm (max-sem-ops in Solaris 10+): Maximum number of semaphore operations that can be performed in each semop call. The default in Solaris 2.5.1-9 is 10, the maximum is 2 GB. Solaris 10 increased the default to 512.
  • semume: Maximum number of undo structures per process. This should be set to semopm times the number of processes that will be using semaphores at any one time. The default is 10; the maximum is 2 GB.
  • semusz: Number of bytes required for semume undo structures. This should not be tuned; it is set to semume x (1 + sizeof(undo)). The default is 96; the maximum is 2 GB.
  • semvmx: Maximum value of a semaphore. This should never exceed 32767 (default value) unless SEM_UNDO is never used. The default is 32767; the maximum is 65535.
  • semaem: Maximum adjust-on-exit value. This should almost always be left alone. The default is 16384; the maximum is 32767.
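The sizing rules in the list above can be collected into one calculation. This sketch assumes integer division in the semmap formula and uses only the structure sizes quoted above (84 bytes per control structure, 16 bytes per semaphore); the function name and the kernel_bytes estimate are illustrative, not something Sun documents.

```python
def semaphore_sizing(semmni, semmsl, sems_per_set):
    """Derive pre-Solaris 10 semaphore tunables from semmni and semmsl.

    sems_per_set is the "n" in the semmap formula: the number of
    semaphores per set that the application actually uses.
    """
    semmap = (semmni + sems_per_set - 1) // sems_per_set + 1
    semmns = semmni * semmsl        # every set may be fully populated
    semmnu = semmni                 # one undo structure per control structure
    kernel_bytes = semmni * 84 + semmns * 16
    return {
        "semmap": semmap,
        "semmns": semmns,
        "semmnu": semmnu,
        "kernel_bytes": kernel_bytes,
    }


print(semaphore_sizing(semmni=128, semmsl=25, sems_per_set=25))
```

With 128 sets of up to 25 semaphores each, this gives a semmap of 7, a semmns of 3200 and roughly 60 KB of kernel structures.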

Message Queues

Unix uses message queues for asynchronous message passing between processes. Each message has a type field, which can be used for priority messaging or directing a message to a chosen recipient.

Message queues are implemented as FIFO (first-in first-out) mechanisms. They consist of a header pointing to a linked list.
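The type field is what lets a single FIFO queue serve both purposes. The toy model below mimics how the msgtyp argument of msgrcv selects a message: zero takes the oldest message, a positive value takes the oldest message of that type, and a negative value takes the oldest message of the lowest type not exceeding its absolute value. The class is a hypothetical in-memory stand-in, and unlike the real call it returns None instead of blocking when nothing matches.

```python
from collections import deque


class ToyMessageQueue:
    """In-memory model of message selection by type (not a real IPC queue)."""

    def __init__(self):
        self._messages = deque()  # oldest message on the left

    def send(self, mtype, text):
        if mtype <= 0:
            raise ValueError("message type must be positive")
        self._messages.append((mtype, text))

    def receive(self, msgtyp=0):
        if msgtyp == 0:
            matches = list(self._messages)
        elif msgtyp > 0:
            matches = [m for m in self._messages if m[0] == msgtyp]
        else:
            eligible = [m for m in self._messages if m[0] <= -msgtyp]
            lowest = min((m[0] for m in eligible), default=None)
            matches = [m for m in eligible if m[0] == lowest]
        if not matches:
            return None
        chosen = matches[0]  # oldest message among the matches
        self._messages.remove(chosen)
        return chosen


q = ToyMessageQueue()
q.send(3, "low")
q.send(1, "urgent")
q.send(2, "for worker 2")
print(q.receive(2))    # directed delivery: only type 2 is considered
print(q.receive(-3))   # priority delivery: lowest type up to 3 wins
print(q.receive(0))    # plain FIFO: whatever is oldest
```

Using small type numbers for urgent traffic and a negative msgtyp on the receiver gives priority messaging; giving each recipient its own type number gives directed delivery.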

Solaris 2.5.1 and earlier used very coarse-grained mutex locking for message queues, which resulted in unnecessary contention as compared to 2.6 and later versions.

Solaris 10 only uses the msgmni, msgmnb and msgtql parameters. (Other parameters are dynamic within the Solaris 10 IPC model.)

  • msgmap: Number of entries in the msg map. The default is 100, the maximum is 2 GB.
  • msgmax: Maximum size of a message. The default is 2048; the maximum is 2 GB. Shared memory should be considered for moving large messages between processes; it is much more efficient for large data transfers.
  • msgmnb (max-msg-qbytes in Solaris 10+): Maximum number of bytes for the message queue. The default is 4096; the maximum is 2 GB. The default in Solaris 10 was increased to 65536 and the maximum increased to ULONG_MAX.
  • msgmni (max-msg-ids in Solaris 10+): Number of unique message queue identifiers. The default is 50; the maximum is 2 GB. The default in Solaris 10 has been increased to 128. This should be set to 10% above the sum of the recommendations for applications on the system. Kernel resources are allocated based upon this parameter, so it should not be sized arbitrarily large.
  • msgssz: Message segment size. The default is 8; the maximum is 2 GB.
  • msgtql (max-msg-messages in Solaris 10+): Number of message headers. The default is 40; the maximum is 2 GB. Solaris 10 increased the default to 8192 and the maximum to UINT_MAX.
  • msgseg: Number of message segments. The default is 1024; the maximum is 32 KB.

Solaris 10+ IPC Resource Management

The Solaris 10 IPC resource management framework was designed to overcome several shortcomings of the older SVR4-based system, in which the resource limits were system-wide (permitting potential conflicts) and reboots were required for even minor changes. In Solaris 10, several parameters were converted to be dynamically resized, the defaults were increased, and the names were changed to be more human-readable.

The Solaris 10 system allows changes to be associated with a project and monitored via prctl.

Additional information about Solaris 10+ resource management can be found on the Resource Management web page or in Sun's System Administration Guide: Solaris Containers-Resource Management and Solaris Zones on the Sun Documentation Web Site.

For the purposes of IPC resource management, the following are the important parameters:

  • project.max-shm-ids: Maximum shared memory IDs for a project. Replaces shmmni
  • project.max-sem-ids: Maximum semaphore IDs for a project. Replaces semmni
  • project.max-msg-ids: Maximum message queue IDs for a project. Replaces msgmni
  • project.max-shm-memory: Total amount of shared memory allowed for a project. Replaces shmmax
  • process.max-sem-nsems: Maximum number of semaphores allowed per semaphore set. Replaces semmsl
  • process.max-sem-ops: Maximum number of semaphore operations allowed per semop. Replaces semopm
  • process.max-msg-qbytes: Maximum number of bytes of messages on a message queue. Replaces msgmnb
  • process.max-msg-messages: Maximum number of messages on a message queue. Replaces msgtql
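When migrating an old /etc/system file, the list above works as a lookup table. The sketch below encodes exactly those eight replacements; the function name translate_legacy is invented here, and anything not in the table is reported as dropped on the assumption that Solaris 10 manages it dynamically.

```python
# Legacy tunable -> Solaris 10 resource control, as listed above.
LEGACY_TO_RCTL = {
    "shmmni": "project.max-shm-ids",
    "semmni": "project.max-sem-ids",
    "msgmni": "project.max-msg-ids",
    "shmmax": "project.max-shm-memory",
    "semmsl": "process.max-sem-nsems",
    "semopm": "process.max-sem-ops",
    "msgmnb": "process.max-msg-qbytes",
    "msgtql": "process.max-msg-messages",
}


def translate_legacy(settings):
    """Split legacy settings into resource controls and dropped names."""
    converted = {}
    dropped = []
    for name, value in settings.items():
        rctl = LEGACY_TO_RCTL.get(name)
        if rctl is None:
            dropped.append(name)  # no Solaris 10 equivalent in the list
        else:
            converted[rctl] = value
    return converted, dropped


old = {"shmmax": 4294967295, "semmni": 128, "semmns": 3200}
print(translate_legacy(old))
```

The values should not be copied over blindly. In particular, shmmax limited a single segment while project.max-shm-memory limits the total for the project, so the old number is only a starting point.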
