Solaris Troubleshooting: Solaris Processes

The process is one of the fundamental abstractions of Unix. Every object in Unix is represented as either a file or a process. (With the introduction of the /proc structure, there has been an effort to represent even processes as files.)

Processes are usually created with fork or a less resource-intensive alternative such as fork1 or vfork. fork duplicates the entire process context, while fork1 duplicates only the context of the calling thread. This can be useful, for example, when exec will be called shortly afterward.

Solaris, like other Unix systems, provides two modes of operation: user mode and kernel (or system) mode. Kernel mode is a more privileged mode of operation. Processes can be executed in either mode, but user processes usually operate in user mode.

Per-process Virtual Memory

Each process has its own virtual memory space. References to real memory are provided through a process-specific set of address translation maps. The computer's Memory Management Unit (MMU) contains a set of registers that point to the current process's address translation maps. When the current process changes, the MMU must load the translation maps for the new process. This is called a context switch.

The MMU is only addressable in kernel mode, for obvious security reasons.

The kernel text and data structures are mapped in a portion of each process's virtual memory space. This area is called the kernel space (or system space).

In addition, each process contains two important kernel-owned areas in virtual memory: the u area and the kernel stack. The u area contains information about the process, such as information about open files, identification information and process registers. The kernel stack is provided on a per-process basis to allow the kernel to be re-entrant. (That is, several processes can be active in the kernel, and may even be executing the same routine concurrently.) Each process's kernel stack keeps track of its function call sequence when executing in the kernel.

The kernel can access the memory maps for non-current processes by using temporary maps.

The kernel can operate in either process context or system (or interrupt) context. In process context, the kernel has access to the process's memory space (including the u area and kernel stack). It can also block the current process while waiting for a resource. In system context, the kernel cannot access the process's address space, u area or kernel stack. System context is used for handling certain system-wide issues such as device interrupt handling or process priority computation.

Additional information is available on the Process Virtual Memory page.

Process Context

Each process's context contains information about the process, including the following:

Hardware context:
- Program counter: address of the next instruction.
- Stack pointer: address of the last element on the stack.
- Processor status word: information about system state, with bits devoted to things like execution modes, interrupt priority levels, overflow bits, carry bits, etc.
- Memory management registers: Mapping of the address translation tables of the process.
- Floating point unit registers.
User address space: program text, data, user stack, shared memory regions, etc.
Control information: u area, proc structure, kernel stack, address translation maps.
Credentials: user and group IDs (real and effective).
Environment variables: strings of the form variable=value.

During a context switch, the hardware context registers are stored in the Process Control Block in the u area.

The u area includes the following:

Process control block.
Pointer to the proc structure.
Real/effective UID/GID.
Information regarding current system call.
Signal handlers.
Memory management information (text, data, stack sizes).
Table of open file descriptors.
Pointers to the current directory vnode and the controlling terminal vnode.
CPU usage statistics.
Resource limitations (disk quotas, etc.).

The proc structure includes the following:

Identification: process ID and session ID
Kernel address map location.
Current process state.
Pointers linking the process to a scheduler queue or sleep queue.
Pointers linking this process to lists of active, free or zombie processes.
Pointers keeping this structure in a hash queue based on PID.
Sleep channel (if the process is blocked).
Scheduling priority.
Signal handling information.
Memory management information.
Flags.
Information on the relationship of this process and other processes.

Kernel Services

The Solaris kernel may be seen as a bundle of kernel threads. It uses synchronization primitives to prevent priority inversion. These include mutexes, semaphores, condition variables and read/write locks.

The kernel provides service to processes in the following four ways:

System Calls: The kernel executes requests submitted by processes via system calls. The system call interface invokes a special trap instruction.
Hardware Exceptions: The kernel notifies a process that attempts an illegal activity such as dividing by zero or overflowing the user stack.
Hardware Interrupts: Devices use interrupts to notify the kernel of status changes (such as I/O completions).
Resource Management: The kernel manages resources via special processes such as the pagedaemon.

In addition, some system services (such as NFS service) are contained within the kernel in order to reduce overhead from context switching.

Threads

An application's parallelism is the degree of parallel execution achieved. In the real world, this is limited by the number of processors available in the hardware configuration. Concurrency is the maximum achievable parallelism in a theoretical machine that has an unlimited number of processors. Threads are frequently used to increase an application's concurrency.

A thread represents a relatively independent set of instructions within a program. A thread is a control point within a process. It shares global resources within the context of the process (address space, open files, user credentials, quotas, etc). Threads also have private resources (program counter, stack, register context, etc).

The main benefit of threads (as compared to multiple processes) is that the context switches are much cheaper than those required to change current processes. Sun reports that a fork() takes 30 times as long as an unbound thread creation and 5 times as long as a bound thread creation.

Even within a single-processor environment, multiple threads are advantageous because one thread may be able to progress even though another thread is blocked while waiting for a resource.

Interprocess communication also takes considerably less time for threads than for processes, since global data can be shared instantly.

Kernel Threads

A kernel thread is the entity that is scheduled by the kernel. If no lightweight process is attached, it is also known as a system thread. It uses kernel text and global data, but has its own kernel stack, as well as a data structure to hold scheduling and synchronization information.

Kernel threads store the following in their data structure:

Copy of the kernel registers.
Priority and scheduling information.
Pointers to put the thread on the scheduler or wait queue.
Pointer to the stack.
Pointers to associated LWP and proc structures.
Pointers to maintain queues of threads in a process and threads in the system.
Information about the associated LWP (as appropriate).

Kernel threads can be independently scheduled on CPUs. Context switching between kernel threads is very fast because memory mappings do not have to be flushed.

Lightweight Processes

A lightweight process can be considered as the swappable portion of a kernel thread.

Another way to look at lightweight processes is to think of them as "virtual CPUs" which perform the processing for applications. Application threads are attached to available lightweight processes, each of which is attached to a kernel thread, which is scheduled on the system's CPU dispatch queue.

LWPs can make system calls and can block while waiting for resources. All LWPs in a process share a common address space. IPC (interprocess communication) facilities exist for coordinating access to shared resources.

LWPs contain the following information in their data structure:

Saved values of user-level registers (if the LWP is not active)
System call arguments, results, error codes.
Signal handling information.
Data for resource usage and profiling.
Virtual time alarms.
User time/CPU usage.
Pointer to the associated kernel thread.
Pointer to the associated proc structure.

By default, one LWP is assigned to each process; additional LWPs are created if all the process's LWPs are sleeping and there are additional user threads that libthread can schedule. The programmer can specify that threads are bound to LWPs.

Lightweight process information for a process can be examined with ps -elcL.

User Threads

User threads are scheduled on their LWPs via a scheduler in libthread. This scheduler does implement priorities, but does not implement time slicing. If time slicing is desired, it must be programmed in.

Locking issues must also be carefully considered by the programmer in order to prevent several threads from blocking on a single resource.

User threads are also responsible for handling of SIGSEGV (segmentation violation) signals, since the kernel does not keep track of user thread stacks.

Each thread has the following characteristics:

Has its own stack.
Shares the process address space.
Executes independently (and perhaps concurrently with other threads).
Completely invisible from outside the process.
Cannot be controlled from the command line.
No system protection between threads in a process; the programmer is responsible for interactions.
Can share information between threads without IPC overhead.

Priorities

Higher numbered priorities are given precedence. The scheduling page contains additional information on how priorities are set.

Zombie Processes

When a process dies, it becomes a zombie process. Normally, the parent performs a wait() and cleans up the PID. Sometimes, the parent receives too many SIGCHLD signals at once, but can only handle one at a time. It is possible to resend the signal on behalf of the child via kill -18 PPID. Killing the parent or rebooting will also clean up zombies. The correct answer is to fix the buggy parent code that failed to perform the wait() properly.

Aside from their inherent sloppiness, the only problem with zombies is that they take up a place in the process table.

Kernel Tunables

The following kernel tunables are important when looking at processes:

maxusers: By default, this is set to 2 less than the amount of physical memory in MB, up to 1024. It can be set up to 2048 manually in the /etc/system file.
max_nprocs: Maximum number of processes that can be active simultaneously on the system. The default for this is (16 x maxusers) + 10. The minimum setting for this is 138, the maximum is 30,000.
maxuprc: The default setting for this is max_nprocs - 5. The minimum is 133, the maximum is . This is the number of processes a single non-root user can create.
ndquot: This is the number of disk quota structures. The default for this is (maxusers x 10) + max_nprocs. The minimum is 213.
pt_cnt: Sets the number of System V ptys.
npty: Sets the number of BSD ptys. (Should be set to pt_cnt.)
sad_cnt: Sets the number of STREAMS addressable devices. (Should be set to 2 x pt_cnt.)
nautopush: Sets the number of STREAMS autopush entries. (Should be set to pt_cnt.)
ncsize: Sets DNLC size.
ufs_ninode: Sets inode cache size.

proc Commands

The proc tools are useful for tracing attributes of processes. These utilities include:

pflags: Prints the tracing flags, pending and held signals and other /proc status information for each LWP.
pcred: Prints credentials (i.e., EUID/EGID, RUID/RGID, saved UID/GIDs).
pmap: Prints process address space map.
pldd: Lists dynamic libraries linked to the process.
psig: Lists signal actions.
pstack: Prints a stack trace for each LWP in the process.
pfiles: Reports fstat, fcntl information for all open files.
pwdx: Prints each process's working directory.
pstop: Stops process.
prun: Starts stopped process.
pwait: Waits for specified processes to terminate.
ptree: Prints process tree for process.
ptime: Times the command using microstate accounting; does not time children.

These commands can be run against a specific process, but most of them can also be run against all processes on the system. See the proc(1) man page for details.


Friday, May 03, 2013
