My class, "Technology Manager's Survival Guide" is on the Friday afternoon training schedule at LOPSA-East on May 2 in New Brunswick, NJ.
And I'll be presenting a two-part class, "Leader's Survival Guide," in Jacksonville, FL on July 16 and 23.
A common mistake by new monitoring administrators is to alert on everything. This is an ineffective strategy for several reasons. For starters, it may result in higher telecom charges for passing large numbers of alerts. Passing tons of irrelevant alerts will impact team morale. And, no matter how dedicated your team is, you are guaranteed to reach a state where alerts will start being ignored because "they're all garbage anyway."
For example, it is common for non-technical managers to want to send alerts to the systems team when system CPU hits 100%. But, from a technical perspective, this is absurd: a busy CPU is not a problem in itself, and many healthy workloads (batch processing, for example) are designed to drive the CPU to 100% for long stretches.
In order to be effective, a monitoring strategy needs to be thought out. You may end up monitoring a lot of things just to establish baselines or to view growth over time. Some things you monitor will need to be checked out right away. It is important to know which is which.
Historical information should be logged and retained for examination on an as-needed basis. It is wise to set up automated regular reports (distributed via email or web) to keep an eye on historical system trends, but there is no reason to send alerts on this sort of information.
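As a minimal sketch of such a report, a single root crontab entry can mail out each evening's disk activity summary. This assumes that sar data collection is already enabled in the sys crontab, that sar lives in /usr/sbin, and that the recipient address is adapted locally:

0 18 * * * /usr/sbin/sar -d | mailx -s "daily sar -d summary" admin@example.com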
Availability information should be characterized and handled in an appropriate way, probably through a tiered system of notifications. Depending on the urgency, it may show up on a monitoring console, be rolled up in a daily summary report, or paged out to the on-call person. Some common types of information in this category include:
(If possible, configure escalations into your alerting system, so that you are not dependent on a single person's cell phone for the availability of your entire enterprise. A typical escalation procedure would be for an unacknowledged alert to be sent up defined chain of escalation. For example, if the on-call person does not respond in 15 minutes, an alert may go to the entire group. If the alert is not acknowledged 15 minutes after that, the alert may go to the manager.)
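For example, a tiered escalation in a Nagios-style monitoring system might be sketched as follows. (Nagios is assumed purely for illustration, and the host, service, and contact group names are invented.)

define serviceescalation {
        host_name               web01
        service_description     HTTP
        first_notification      2    ; escalate starting with the second notification
        last_notification       0    ; 0 = keep escalating until acknowledged
        notification_interval   15   ; minutes between repeat notifications
        contact_groups          oncall-group,oncall-manager
        }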
In some environments, alerts are handled by a round-the-clock team that is sometimes called the Network Operations Center (NOC). The NOC will coordinate response to the issue, including an evaluation of the alert and any necessary escalations.
Before an alert is configured, the monitoring group should first make sure that the alert meets three important criteria. Checks that typically qualify include:

- By verifying that the uptime command reports a time larger than the interval between monitoring sweeps, you can keep an eye on sudden, unexpected reboots. (A cron-able sketch of this check follows the list.)
- avserv (in Solaris; svctm in Linux) greater than 20 ms for disk devices with more than 100 (r+w)/s, including NFS disk devices. This is a measure of I/O channel exhaustion. 20 ms is a very long time, so you will also want to keep an eye on trends in regular summary reports of sar -d data.
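Here is a minimal sketch of the reboot check described above, meant to be run from cron once per sweep. The 300-second interval, the alert address, and the use of kstat and perl to compute the uptime are assumptions to adapt locally:

#!/bin/sh
# Alert if the system has rebooted since the last monitoring sweep.
SWEEP=300                                    # sweep interval in seconds (assumed)
# On Solaris, kstat reports the boot time as seconds since the epoch.
BOOT=`kstat -p unix:0:system_misc:boot_time | awk '{print $2}'`
NOW=`perl -e 'print time'`
if [ `expr $NOW - $BOOT` -lt $SWEEP ]; then
        echo "`hostname`: unexpected reboot detected" | mailx -s "reboot alert" oncall@example.com
fi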
In the event that the state databases disagree, a majority of configured state databases determines which version of reality is correct. This is why it is important to configure multiple replicas. A minimum of three database replicas must be available in order to boot without human assistance, so it makes sense to create database replicas liberally. They don't take up much space, and there is very little overhead associated with their maintenance. On JBOD (Just a Bunch Of Disks) arrays, I recommend at least two replicas on each disk device.
State database replicas consume between 4 and 16 MB of space, and should ideally be placed on a partition specifically set aside for that purpose. In the event that state database information is lost, it is possible to lose the data stored on the managed disks, so the database replicas should be spread over as much of the disk infrastructure as possible.
State database locations are recorded in /etc/opt/SUNWmd/mddb.cf. Depending on their condition, repair of damaged replicas may or may not be possible.
Metadevices (the objects which Solaris Volume Manager manipulates) may be placed on a partition with a state database if the state database is there first.
The initial state databases can be created by specifying the slices on which they will live as follows:
metadb -a -f -c 2 slice-name1 slice-name2
Because pre-existing partitions are not usable for creating database replicas, it is frequently the case that we will steal space from swap to create a small partition for the replicas. To do so, we boot to single-user mode, use swap -d to remove all swap devices, and use format to re-partition the swap slice, freeing up space for a separate partition for the database replicas. Since the replicas are small, very few cylinders are required. (A sketch of this sequence follows.)
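Here is a minimal sketch of that sequence; the device names and slice numbers are examples only:

swap -l                        # list the active swap devices
swap -d /dev/dsk/c0t0d0s1      # remove the swap area
format                         # shrink slice 1; create a small slice (s4) for the replicas
swap -a /dev/dsk/c0t0d0s1      # re-add the (now smaller) swap slice
metadb -a -f -c 2 c0t0d0s4     # create replicas on the freed-up slice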
Solaris Volume Manager can build metadevices either by using partitions as the basic building blocks, or by dividing a single large partition into soft partitions. Soft partitions are a way that SVM allows us to carve a single disk into more than 8 slices. We can either build soft partitions directly on a disk slice, or we can mirror (or RAID) slices, then carve up the resulting metadevice into soft partitions to build volumes.
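For example, to carve soft partitions out of an existing mirror metadevice (device names and sizes are invented for illustration):

metainit d100 -p d10 2g        # 2 GB soft partition on top of mirror d10
metainit d101 -p d10 4g        # a second soft partition on the same mirror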
Disksets are collections of disks that are managed together, in the same way that a Veritas Volume Manager (VxVM) disk group is managed together. Unlike in VxVM, SVM does not require us to explicitly specify a disk group. If Disksets are configured, we need to specify the set name for monitoring or management commands with a -s setname option. Disksets may be created as shared disksets, where multiple servers may be able to access them. (This is useful in an environment like Sun Cluster, for example.) In that case, we specify some hosts as mediators who determine who owns the diskset. (Note that disks added to shared disksets are re-partitioned in the expectation that we will use soft partitions.)
When metadevices need to be addressed by OS commands (like mkfs), we can reference them with device links of the form /dev/md/rdsk/d# or /dev/md/disksetname/rdsk/d#.
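For example, to build and mount a UFS file system on a metadevice d10 (an example name):

newfs /dev/md/rdsk/d10
mount /dev/md/dsk/d10 /mnt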
Here are the main command line commands within SVM:
Command | Description |
---|---|
metaclear | Deletes active metadevices and hot spare pools. |
metadb | Manages state database replicas. |
metadetach | Detaches a metadevice from a mirror or a logging device from a trans-metadevice. |
metahs | Manages hot spares and hot spare pools. |
metainit | Configures metadevices. |
metaoffline | Takes submirrors offline. |
metaonline | Places submirrors online. |
metaparam | Modifies metadevice parameters. |
metarename | Renames and switches metadevice names. |
metareplace | Replaces slices of submirrors and RAID5 metadevices. |
metaroot | Sets up system files for mirroring root. |
metaset | Administers disksets. |
metastat | Checks metadevice health and state. |
metattach | Attaches a metadevice to a mirror or a log to a trans-metadevice. |
Here is how to perform several common types of operations in Solaris Volume Manager:
Operation | Procedure |
---|---|
Create state database replicas. | metadb -a -f -c 2 c#t0d#s# c#t1d#s# |
Mirror the root partition. | |
Create a metadevice for the root partition: | metainit -f d0 1 1 c#t0d#s# |
Create a metadevice for the root mirror partition. | metainit d1 1 1 c#t1d#s# |
Set up a 1-sided mirror | metainit d2 -m d0 |
Edit the vfstab and system files. | metaroot d2 |
Attach the root mirror. | metattach d2 d1 |
Mirror the swap partition. | |
Create metadevices for the swap partition and mirror. | metainit -f d5 1 1 c#t0d#s# ; metainit d6 1 1 c#t1d#s# ; metainit d7 -m d5 |
Attach submirror to mirror. | metattach d7 d6 |
Edit vfstab to mount swap mirror as a swap device. | Use root entry as a template. |
Create a striped metadevice. | metainit d# stripes slices c#t#d#s#... |
Create a striped metadevice with a non-default interlace size. | Add an -i interlace-size option (for example, -i 32k) |
Concatenate slices. | metainit d# #slices 1 c#t#d#s# 1 c#t#d#s#... |
Create a soft partition metadevice. | metainit dnew# -p dsource# size |
Create a RAID5 metadevice. | metainit d# -r c#t#d#s# c#t#d#s# c#t#d#s#... |
Manage Hot Spares | |
Create a hot spare pool. | metainit hsp001 c#t#d#s#... |
Add a slice to a pool. | metahs -a hsp### /dev/dsk/c#t#d#s# |
Add a slice to all pools. | metahs -a all /dev/dsk/c#t#d#s# |
Diskset Management | |
Deport a diskset. | metaset -s setname -r |
Import a diskset. | metaset -s setname -t -f |
Add hosts to a shared diskset. | metaset -s setname -a -h hostname1 hostname2 |
Add mediators to a shared diskset. | metaset -s setname -a -m hostname1 hostname2 |
Add devices to a shared diskset. | metaset -s setname -a /dev/did/rdsk/d# /dev/did/rdsk/d# |
Check diskset status. | metaset |
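As a worked example, the root-mirroring rows above combine into the following sequence (device names are examples; the lockfs and reboot between metaroot and metattach follow the standard practice of rebooting onto the metadevice before attaching the second submirror):

metadb -a -f -c 2 c0t0d0s3 c0t1d0s3    # state database replicas on both disks
metainit -f d0 1 1 c0t0d0s0            # submirror on the existing root slice
metainit d1 1 1 c0t1d0s0               # submirror on the mirror disk
metainit d2 -m d0                      # one-sided mirror
metaroot d2                            # edit /etc/vfstab and /etc/system
lockfs -fa ; init 6                    # flush file systems and reboot
metattach d2 d1                        # attach the second submirror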
The metadb command monitors the database replicas, and the metastat command monitors the metadevices and hot spares.
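A quick health check suitable for a cron job might look like this (the assumption being that a W flag on a replica or a maintenance state on a metadevice always merits attention):

metadb | grep -w W > /dev/null && echo "`hostname`: state database replica errors"
metastat | egrep -i "maintenance|unavailable" > /dev/null && echo "`hostname`: metadevice needs attention"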
Status messages that may be reported by metastat for a disk mirror include Okay, Resyncing, and Needs maintenance. Hot spare status messages reported by metastat are Available, In-use, and Broken.
A copy of the SVM configuration is preserved in /etc/opt/SUNWmd/md.cf. If the configuration is lost, it can be rebuilt from that file:

1. Re-create the state database replicas: metadb -a -f -c 2 c#t#d#s# c#t#d#s#
2. Copy md.cf to md.tab.
3. Edit md.tab so that all mirrors are one-way mirrors and RAID5 devices are recreated with -k (to prevent re-initialization).
4. Verify the md.tab syntax without making changes: metainit -n -a
5. Re-create the configuration: metainit -a
6. Re-attach the remaining submirrors: metattach dmirror# dsubmirror#
7. Verify the result: metastat
More frequently, Solaris Volume Manager will be needed to deal with replacing a failed piece of hardware. To replace a disk which is spitting errors, but has not failed yet (as in Example 10-2):

1. Add database replicas to unaffected disks until at least three exist outside of the failing disk.
2. Remove any replicas from the failing disk: metadb ; metadb -d c#t#d#s#
3. Detach and remove submirrors and hot spares on the failing disk from their mirrors and pools: metadetach dmirror# dsubmirror# ; metaclear -r dsubmirror# ; metahs -d hsp# c#t#d#s#
4. If the boot disk is being replaced, find the /devices name of the boot disk mirror: ls -l /dev/rdsk/c#t#d#s0
5. If the removed disk is a fibre channel disk, remove the /dev/dsk and /dev/rdsk links for the device.
6. Physically replace the disk. This may involve shutting down the system if the disk is not hot-swappable.
7. Re-build any /dev and /devices links: drvconfig; disks (or boot -r)
8. Format and re-partition the disk appropriately.
9. Re-add any removed database replicas: metadb -a -c #databases c#t#d#s#
10. Re-create and re-attach any removed submirrors: metainit dsubmirror# 1 1 c#t#d#s# ; metattach dmirror# dsubmirror#
11. Re-create any removed hot spares.

Replacing a failed disk (as in Example 10-3) is a similar procedure. The differences are:

- Remove database replicas and hot spares as above; submirrors will not be removable.
- After replacing the disk as above, replace the submirrors with metareplace: metareplace -e dmirror# c#t#d#s#

Barring a misconfiguration, Solaris Volume Manager is a tremendous tool for increasing the reliability and redundancy of a server. More important, it allows us to postpone maintenance for a hard drive failure until the next maintenance window. The metastat tool is quite useful for identifying and diagnosing problems. Along with iostat -Ee, we can often catch problems before they reach a point where the disk has actually failed.

Example 10-2 shows how to replace a failing (but not yet failed) mirrored disk. (In this case, we were able to hot-swap the disk, so no reboot was necessary. Since the disks were SCSI, we also did not need to remove or rebuild any /dev links.)
# metastat
d0: Mirror
Submirror 0: d1
State: Okay
Submirror 1: d2
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 20484288 blocks
d1: Submirror of d0
State: Okay
Size: 20484288 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s0 0 No Okay
d2: Submirror of d0
State: Okay
Size: 20484288 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t1d0s0 0 No Okay
...
# iostat -E
sd0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST373307LSUN72G Revision: 0707 Serial No: 3HZ...
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd1 Soft Errors: 593 Hard Errors: 28 Transport Errors: 1
Vendor: SEAGATE Product: ST373307LSUN72G Revision: 0707 Serial No: 3HZ...
Size: 73.40GB <73400057856 bytes>
Media Error: 24 Device Not Ready: 0 No Device: 1 Recoverable: 593
Illegal Request: 0 Predictive Failure Analysis: 1
# metadb
flags first blk block count
a m p luo 16 1034 /dev/dsk/c0t0d0s3
a p luo 1050 1034 /dev/dsk/c0t0d0s3
a p luo 2084 1034 /dev/dsk/c0t0d0s3
a p luo 16 1034 /dev/dsk/c0t1d0s3
a p luo 1050 1034 /dev/dsk/c0t1d0s3
a p luo 2084 1034 /dev/dsk/c0t1d0s3
# metadb -d c0t1d0s3
# metadb
flags first blk block count
a m p luo 16 1034 /dev/dsk/c0t0d0s3
a p luo 1050 1034 /dev/dsk/c0t0d0s3
a p luo 2084 1034 /dev/dsk/c0t0d0s3
# metadetach d40 d42
d40: submirror d42 is detached
# metaclear -r d42
d42: Concat/Stripe is cleared
...
# metadetach d0 d2
d0: submirror d2 is detached
# metaclear -r d2
d2: Concat/Stripe is cleared
...
[Disk hot-swapped. No reboot or device reconfiguration necessary for this replacement]
...
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0
/pci@1c,600000/scsi@2/sd@0,0
1. c0t1d0
/pci@1c,600000/scsi@2/sd@1,0
Specify disk (enter its number): 0
selecting c0t0d0
[disk formatted]
FORMAT MENU:
...
format> part
PARTITION MENU:
0 - change `0' partition
...
print - display the current table
label - write partition map and label to the disk
! - execute , then return
quit
partition> pr
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 2012 9.77GB (2013/0/0) 20484288
...
partition> q
FORMAT MENU:
...
format> di
AVAILABLE DISK SELECTIONS:
0. c0t0d0
/pci@1c,600000/scsi@2/sd@0,0
1. c0t1d0
/pci@1c,600000/scsi@2/sd@1,0
Specify disk (enter its number)[0]: 1
selecting c0t1d0
[disk formatted]
format> part
...
[sd1 partitioned to match sd0's layout]
...
partition> 7
Part Tag Flag Cylinders Size Blocks
7 unassigned wm 0 0 (0/0/0) 0
Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 4835
Enter partition size[0b, 0c, 0.00mb, 0.00gb]: 9252c
partition> la
Ready to label disk, continue? y
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 14087 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 2012 9.77GB (2013/0/0) 20484288
...
partition> q
...
# metadb -a -c 3 c0t1d0s3
# metadb
flags first blk block count
a m p luo 16 1034 /dev/dsk/c0t0d0s3
a p luo 1050 1034 /dev/dsk/c0t0d0s3
a p luo 2084 1034 /dev/dsk/c0t0d0s3
a u 16 1034 /dev/dsk/c0t1d0s3
a u 1050 1034 /dev/dsk/c0t1d0s3
a u 2084 1034 /dev/dsk/c0t1d0s3
# metainit d2 1 1 c0t1d0s0
d2: Concat/Stripe is setup
# metattach d0 d2
d0: submirror d2 is attached
[Re-create and attach the remainder of the submirrors.]
...
# metastat
d0: Mirror
Submirror 0: d1
State: Okay
Submirror 1: d2
State: Resyncing
Resync in progress: 10 % done
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 20484288 blocks
d1: Submirror of d0
State: Okay
Size: 20484288 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s0 0 No Okay
d2: Submirror of d0
State: Resyncing
Size: 20484288 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t1d0s0 0 No Okay
It is important to format the replacement disk to match the cylinder layout of the disk that is being replaced. If this is not done, mirrors and stripes will not rebuild properly.
When you replace a disk that has already failed, the submirrors cannot be detached and removed. Instead, the metareplace -e command is used to re-sync the mirror onto the new disk.
# iostat -E
...
sd1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 5
Vendor: SEAGATE Product: ST373307LSUN72G Revision: 0507 Serial No: 3HZ7Z3CJ00007505
Size: 73.40GB <73400057856 bytes>
...
# metadb
flags first blk block count
a m p luo 16 1034 /dev/dsk/c1t0d0s3
a p luo 1050 1034 /dev/dsk/c1t0d0s3
W p l 16 1034 /dev/dsk/c1t1d0s3
W p l 1050 1034 /dev/dsk/c1t1d0s3
a p luo 16 1034 /dev/dsk/c1t2d0s3
a p luo 1050 1034 /dev/dsk/c1t2d0s3
a p luo 16 1034 /dev/dsk/c1t3d0s3
a p luo 1050 1034 /dev/dsk/c1t3d0s3
# metadb -d /dev/dsk/c1t1d0s3
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@1c,600000/scsi@2/sd@0,0
1. c1t1d0
/pci@1c,600000/scsi@2/sd@1,0
2. c1t2d0
/pci@1c,600000/scsi@2/sd@2,0
3. c1t3d0
/pci@1c,600000/scsi@2/sd@3,0
Specify disk (enter its number): 0
selecting c1t0d0
[disk formatted]
FORMAT MENU:
...
partition - select (define) a partition table
...
format> part
PARTITION MENU:
...
print - display the current table
...
partition> pr
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 2012 9.77GB (2013/0/0) 20484288
1 swap wu 2013 - 2214 1003.69MB (202/0/0) 2055552
2 backup wm 0 - 14086 68.35GB (14087/0/0) 143349312
3 unassigned wm 2215 - 2217 14.91MB (3/0/0) 30528
4 unassigned wm 2218 - 5035 13.67GB (2818/0/0) 28675968
5 unassigned wm 5036 - 12080 34.18GB (7045/0/0) 71689920
6 var wm 12081 - 12684 2.93GB (604/0/0) 6146304
7 home wm 12685 - 14086 6.80GB (1402/0/0) 14266752
partition> q
FORMAT MENU:
disk - select a disk
...
format> di
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@1c,600000/scsi@2/sd@0,0
1. c1t1d0
/pci@1c,600000/scsi@2/sd@1,0
2. c1t2d0
/pci@1c,600000/scsi@2/sd@2,0
3. c1t3d0
/pci@1c,600000/scsi@2/sd@3,0
Specify disk (enter its number)[0]: 1
format> part
PARTITION MENU:
...
partition> pr
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 25 129.19MB (26/0/0) 264576
1 swap wu 26 - 51 129.19MB (26/0/0) 264576
2 backup wu 0 - 14086 68.35GB (14087/0/0) 143349312
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 usr wm 52 - 14086 68.10GB (14035/0/0) 142820160
7 unassigned wm 0 0 (0/0/0) 0
...
partition> 7
Part Tag Flag Cylinders Size Blocks
7 unassigned wm 0 0 (0/0/0) 0
Enter partition id tag[unassigned]: home
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 12685
Enter partition size[0b, 0c, 0.00mb, 0.00gb]: 1402c
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 14087 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 2012 9.77GB (2013/0/0) 20484288
1 swap wu 2013 - 2214 1003.69MB (202/0/0) 2055552
2 backup wu 0 - 14086 68.35GB (14087/0/0) 143349312
3 unassigned wm 2215 - 2217 14.91MB (3/0/0) 30528
4 unassigned wm 2218 - 5035 13.67GB (2818/0/0) 28675968
5 unassigned wm 5036 - 12080 34.18GB (7045/0/0) 71689920
6 var wm 12081 - 12684 2.93GB (604/0/0) 6146304
7 home wm 12685 - 14086 6.80GB (1402/0/0) 14266752
partition> la
Ready to label disk, continue? y
partition> q
...
# metastat
...
d19: Mirror
Submirror 0: d17
State: Okay
Submirror 1: d18
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 14266752 blocks
d17: Submirror of d19
State: Okay
Size: 14266752 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t0d0s7 0 No Okay
d18: Submirror of d19
State: Needs maintenance
Invoke: metareplace d19 c1t1d0s7
Size: 14266752 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t1d0s7 0 No Maintenance
...
# metareplace -e d19 c1t1d0s7
d19: device c1t1d0s7 is enabled
# metareplace -e d16 c1t1d0s6
d16: device c1t1d0s6 is enabled
# metareplace -e d13 c1t1d0s5
d13: device c1t1d0s5 is enabled
# metareplace -e d10 c1t1d0s4
d10: device c1t1d0s4 is enabled
# metareplace -e d2 c1t1d0s0
d2: device c1t1d0s0 is enabled
# metastat
...
d19: Mirror
Submirror 0: d17
State: Okay
Submirror 1: d18
State: Resyncing
Resync in progress: 10 % done
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 14266752 blocks
d17: Submirror of d19
State: Okay
Size: 14266752 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t0d0s7 0 No Okay
d18: Submirror of d19
State: Resyncing
Size: 14266752 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t1d0s7 0 No Resyncing
...
# metadb -a -c 2 c1t1d0s3
# metadb
flags first blk block count
a m p luo 16 1034 /dev/dsk/c1t0d0s3
a p luo 1050 1034 /dev/dsk/c1t0d0s3
a u 16 1034 /dev/dsk/c1t1d0s3
a u 1050 1034 /dev/dsk/c1t1d0s3
a p luo 16 1034 /dev/dsk/c1t2d0s3
a p luo 1050 1034 /dev/dsk/c1t2d0s3
a p luo 16 1034 /dev/dsk/c1t3d0s3
a p luo 1050 1034 /dev/dsk/c1t3d0s3
The /etc/inittab file plays a crucial role in the boot sequence.
For versions of Solaris prior to version 10, the /etc/inittab was edited manually. Solaris 10+ manages the /etc/inittab through SMF, and the Solaris 10 inittab should not be edited directly. The default Solaris 10 inittab contains only a handful of entries: they set up STREAMS autopush and socket configuration, hand control of the boot process to svc.startd, and define the response to a power failure.
In particular, the initdefault keyword is not used any more in Solaris 10. Instead, the default run level is determined within the SMF profile.
When the init process is started, it first sets the environment variables defined in the /etc/default/init file; by default, only TIMEZONE is set. Then init executes process entries from the inittab that have sysinit set, and transfers control of the startup process to svc.startd.
The line entries in the inittab file have the following format:
id:runlevel:action:process
Here the id is a two-character unique identifier, runlevel indicates the run level involved, action indicates how the process is to be run, and process is the command to be executed.
At boot time, all entries whose action field is sysinit are run. Once these processes are run, the system moves towards the init level indicated by the initdefault line. For a default inittab, the line is:

is:3:initdefault:

(This indicates a default runlevel of 3.)
By default, the first script run from the inittab file is /sbin/bcheckrc, which checks the state of the root and /usr filesystems. The line controlling this script has the following form:
fs::sysinit:/sbin/bcheckrc >/dev/console 2>&1 </dev/console
The inittab also controls what happens at each runlevel.
For example, the default entry for runlevel 2 is:
s2:23:wait:/sbin/rc2 >/dev/console 2>&1 </dev/console
The action field of each entry contains a keyword (such as initdefault, sysinit, boot, bootwait, wait, once, respawn, powerfail, or off) that tells init how and when to run the process.
Veritas has long since been purchased by Symantec, but its products continue to be sold under the Veritas name. Over time, we can expect that some of the products will have name changes to reflect the new ownership.
Veritas produces volume and file system software that allows for extremely flexible and straightforward management of a system's disk storage resources. Now that ZFS is providing much of this same functionality from inside the OS, it will be interesting to see how well Veritas is able to hold on to its installed base.
In Veritas Volume Manager (VxVM) terminology, physical disks are assigned a diskname and imported into collections known as disk groups. Physical disks are divided into a potentially large number of arbitrarily sized, contiguous chunks of disk space known as subdisks. These subdisks are combined into volumes, which are presented to the operating system in the same way as a slice of a physical disk is.
Volumes can be striped, mirrored or RAID-5'ed. Mirrored volumes are made up of equally-sized collections of subdisks known as plexes. Each plex is a mirror copy of the data in the volume. The Veritas File System (VxFS) is an extent-based file system with advanced logging, snapshotting, and performance features.
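To make the terminology concrete, here is a minimal sketch that builds a mirrored VxFS volume from two disks; the disk, disk group, volume, and mount point names are all invented:

vxdisksetup -i c2t0d0                              # initialize the disks for VxVM use
vxdisksetup -i c2t1d0
vxdg init datadg datadg01=c2t0d0 datadg02=c2t1d0   # create a disk group
vxassist -g datadg make vol01 2g layout=mirror     # mirrored volume (two plexes)
mkfs -F vxfs /dev/vx/rdsk/datadg/vol01             # VxFS file system on the volume
mount -F vxfs /dev/vx/dsk/datadg/vol01 /data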
VxVM provides dynamic multipathing (DMP) support, which means that it takes care of path redundancy where it is available. If new paths or disk devices are added, one of the steps to be taken is to run vxdctl enable to scan the devices, update the VxVM device list, and update the DMP database. In cases where we need to override DMP support (usually in favor of an alternate multipathing product like EMC PowerPath), we can run vxddladm addforeign.
Here are some procedures to carry out several common VxVM operations. VxVM has a Java-based GUI interface as well, but I always find it easiest to use the command line.
Operation | Procedure |
---|---|
Create a volume: (length specified in sectors, KB, MB or GB) | vxassist -g dg-name make vol-name length(skmg) |
Create a striped volume (add options for a stripe layout): | layout=stripe diskname1 diskname2 ... |
Remove a volume (after unmounting and removing from vfstab): | vxvol stop vol-name, then vxassist -g dg-name remove volume vol-name (or vxedit -rf rm vol-name) |
Create a VxFS file system: | mkfs -F vxfs -o largefiles /dev/vx/rdsk/dg-name/vol-name |
Snapshot a VxFS file system to an empty volume: | mount -F vxfs -o snapof=orig-vol empty-vol mount-point |
Display disk group free space: | vxdg -g dg-name free |
Display the maximum size volume that can be created: | vxassist -g dg-name maxsize [attributes] |
List physical disks: | vxdisk list |
Print VxVM configuration: | vxprint -ht |
Add a disk to VxVM: | vxdiskadm (follow menu prompts) or vxdiskadd disk-name |
Bring newly attached disks under VxVM control (it may be necessary to use format or fmthard to label the disk before running vxdiskconfig): | drvconfig; disks |
Scan devices, update VxVM device list, reconfigure DMP: | vxdctl enable |
Scan devices on OS device tree, initiate dynamic reconfig of multipathed disks. | vxdisk scandisks |
Reset a disabled vxconfigd daemon: | vxconfigd -kr reset |
Manage hot spares: | vxdiskadm (follow menu options and prompts) or vxedit set spare=[on|off] vxvm-disk-name |
Rename disks: | vxedit rename old-disk-name new-disk-name |
Rename subdisks: | vxsd mv old-subdisk-name new-subdisk-name |
Monitor volume performance: | vxstat |
Re-size a volume (but not the file system): | vxassist growto|growby|shrinkto|shrinkby volume-name length[s|m|k|g] |
Resize a volume, including the file system: | vxresize -F vxfs volume-name new-size[s|m|k|g] |
Change a volume's layout: | vxassist relayout volume-name layout= layout |
The progress of many VxVM tasks can be tracked by setting the -t flag at the time the command is run: utility -t tasktag. If the task tag is set, we can use vxtask to list, monitor, pause, resume, or abort the tagged task.
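For example (disk group, volume, and tag names invented):

vxassist -b -g datadg -t relayout01 relayout vol01 layout=stripe-mirror
vxtask list                    # show active tasks and percent complete
vxtask monitor relayout01      # watch the tagged task's progress
vxtask pause relayout01        # suspend it temporarily
vxtask resume relayout01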
Physical disks which are added to VxVM control can either be initialized (made into a native VxVM disk) or encapsulated (disk slice/partition structure is preserved). In general, disks should only be encapsulated if there is data on the slices that needs to be preserved, or if it is the boot disk. (Boot disks must be encapsulated.) Even if there is data currently on a non-boot disk, it is best to back up the data, initialize the disk, create the file systems, and restore the data.
When a disk is initialized, the VxVM-specific information is placed in a reserved location on the disk known as a private region. The public region is the portion of the disk where the data will reside.
VxVM disks can be added as one of several different categories of disks:
If there is a VxFS license for the system, as many file systems as possible should be created as VxFS file systems to take advantage of VxFS's logging, performance and reliability features.
At the time of this writing, ZFS is not an appropriate file system for use on top of VxVM volumes. Sun warns that running ZFS on VxVM volumes can cause severe performance penalties, and that it is possible that ZFS mirrors and RAID sets would be laid out in a way that compromises reliability.
After a failure, a volume's plexes may be left in a DISABLED or DETACHED state. A volume recovery can be attempted with the vxrecover -s volume-name command.

If the plexes are STALE, place the volume in maintenance mode, view the plexes, and decide which plex to use for the recovery:

- vxvol maint volume-name (The volume state will be DETACHED.)
- vxprint -ht volume-name
- vxinfo volume-name (Display additional information about unstartable plexes.)
- vxmend off plex-name (Offline bad plexes.)
- vxmend on plex-name (Online a plex as STALE rather than DISABLED.)
- vxvol start volume-name (Revive stale plexes.)
- vxplex att volume-name plex-name (Recover a stale plex.)

If the plexes are in a RECOVER state and the volume will not start, even with a -f option on the vxvol command:

- vxmend fix clean plex-name
- vxvol start volume-name
- vxplex att volume-name plex-name

Sometimes a deport and re-import of the disk group will clear lingering errors: vxdg deport dgname; vxdg import dgname

To remove a disk from VxVM control:

- vxvol stop volume-name (Stop volumes on the disk.)
- vxdg -g dg-name rmdisk disk-name (Remove disk from its disk group.)
- vxdisk offline disk-name (Offline the disk.)
- vxdiskunsetup c#t#d# (Remove the disk from VxVM control.)

To replace a failed or removed disk:

- Run vxdiskadm and choose option 4: Remove a disk for replacement. When prompted, choose "none" for the disk to replace it.
- Physically replace the disk, then rebuild the device links: drvconfig; disks
- Run vxdiskadm and choose option 5: Replace a failed or removed disk. Follow the prompts and replace the disk with the appropriate disk.

To replace a failed boot disk, use the eeprom command at the root prompt or the printenv command at the ok> prompt to make sure that the nvramrc devalias entries and the boot-device parameter are set to allow a boot from the mirror of the boot disk. If the boot paths are not set up properly for both mirrors of the boot disk, it may be necessary to move the mirror disk physically to the boot disk's location. Alternatively, the devalias command at the ok> prompt can set the mirror disk path correctly; then use nvstore to write the change to the nvram. (It is sometimes necessary to nvunalias aliasname to remove an alias from the nvramrc, then nvalias aliasname devicepath and nvstore to add the new one.)

To mark a failing disk healthy again once its problems have been addressed: vxedit set failing=off disk-name

To recover a plex stuck in an IOFAIL state (demonstrated in the session below):

- vxmend -g dgname -o force off plexname
- vxmend -g dgname on plexname
- vxmend -g dgname fix clean plexname
- vxrecover -s volname
soltest/etc/vx > vxprint -ht vol53
Disk group: testdg
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE
v vol53 - DISABLED ACTIVE 20971520 SELECT - fsgen
pl vol53-01 vol53 DISABLED IOFAIL 20971520 CONCAT - RW
sd disk141-21 vol53-01 disk141 423624704 20971520 0 EMC0_2 ENA
soltest/etc/vx > vxmend -g testdg -o force off vol53-01
soltest/etc/vx > vxprint -ht vol53
Disk group: testdg
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
v vol53 - DISABLED ACTIVE 20971520 SELECT - fsgen
pl vol53-01 vol53 DISABLED OFFLINE 20971520 CONCAT - RW
sd disk141-21 vol53-01 disk141 423624704 20971520 0 EMC0_2 ENA
soltest/etc/vx > vxmend -g testdg on vol53-01
soltest/etc/vx > vxprint -ht vol53
Disk group: testdg
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
v vol53 - DISABLED ACTIVE 20971520 SELECT - fsgen
pl vol53-01 vol53 DISABLED STALE 20971520 CONCAT - RW
sd disk141-21 vol53-01 disk141 423624704 20971520 0 EMC0_2 ENA
soltest/etc/vx > vxmend -g testdg fix clean vol53-01
soltest/etc/vx > !vxprint
vxprint -ht vol53
Disk group: testdg
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
v vol53 - DISABLED ACTIVE 20971520 SELECT - fsgen
pl vol53-01 vol53 DISABLED CLEAN 20971520 CONCAT - RW
sd disk141-21 vol53-01 disk141 423624704 20971520 0 EMC0_2 ENA
soltest/etc/vx > vxrecover -s vol53
soltest/etc/vx > !vxprint
vxprint -ht vol53
Disk group: testdg
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
v vol53 - ENABLED ACTIVE 20971520 SELECT - fsgen
pl vol53-01 vol53 ENABLED ACTIVE 20971520 CONCAT - RW
sd disk141-21 vol53-01 disk141 423624704 20971520 0 EMC0_2 ENA
In particular, the boot device must be mirrored; it cannot be part of a RAID-5 configuration. To mirror the boot disk:

- Set use-nvramrc? to true in the EEPROM settings: eeprom use-nvramrc?=true. If you forget, you will have to go in and manually set up the boot path for your boot mirror disk. (See "To replace a failed boot disk" in the "VxVM Maintenance" section for the procedure.) It is much easier if you set the parameter properly before mirroring the disk!
- Run vxdiskadm and choose option 6: Mirror Volumes on a Disk. Follow the prompts from the utility. It will call vxrootmir under the covers to take care of the boot disk setup portion of the operation.
Creating a mirrored-stripe volume (a mirrored-stripe volume mirrors several striped plexes; it is generally better to set up a striped-mirror volume instead):

vxassist -g dg-name make volume length layout=mirror-stripe

Creating a striped-mirror volume (striped-mirror volumes are layered volumes that stripe across underlying mirror volumes):

vxassist -g dg-name make volume length layout=stripe-mirror

Removing a plex from a mirror:

vxplex -g dg-name -o rm dis plex-name

Removing a mirror from a volume:

vxassist -g dg-name remove mirror volume-name

Removing a mirror and all associated subdisks:

vxplex -o rm dis plex-name

Dissociating a plex from a mirror (to provide a snapshot):

vxplex dis plex-name
vxmake -U gen vol new-volume-name plex=plex-name (Create a new volume with the dissociated plex.)
vxvol start new-volume-name

To re-associate the plex with the old volume:

vxvol stop new-volume-name
vxplex dis plex-name
vxplex att old-volume-name plex-name
vxedit rm new-volume-name

Removing a root disk mirror:

vxplex -o rm dis rootvol-02 swapvol-02 [other root disk volumes]
/etc/vx/bin/vxunroot
It is probably easiest to think of RPO in terms of the amount of allowable data loss. The RPO is frequently expressed in terms of its relation to the time at which replication stops, as in "less than 5 minutes of data loss." For example, if we replicate asynchronously every 15 minutes, the best RPO we can promise is 15 minutes of data loss.
The costs associated with different RPO and RTO values will be determined by the type of application and its business purpose. Some applications may be able to tolerate unplanned outages of up to days without incurring substantial costs. Other applications may cause significant business-side problems with even minor amounts of unscheduled downtime.
Different applications and environments have different tolerances for RPO and RTO. Some applications might be able to tolerate a potential data loss of days or even weeks; some may not be able to tolerate any data loss at all. Some applications can remain unavailable long enough for us to purchase a new system and restore from tape; some cannot.
Many types of replication solutions can be implemented at a server, disk storage, or storage network level. Each has unique advantages and disadvantages. Server replication tends to be cheapest, but also involves using server cycles to manage the replication. Storage network replication is extremely flexible, but can be more difficult to configure. Disk storage replication tends to be rock solid, but is usually limited in terms of supported hardware for the replication target.
Regardless of where we choose to implement our data replication solution, we will still face a lot of the same issues. One issue that needs to be addressed is re-silvering of a replication solution that has been partitioned for some amount of time. Ideally, only the changed sections of the disks will need to be re-replicated. Some less sophisticated solutions require a re-silvering of the entire storage area, which can take a long time and soak up a lot of bandwidth. Re-silvering is an issue that needs to be investigated during the product evaluation.
The type of recovery that is appropriate for each service will depend on the importance of the service and what the tolerance for downtime is for that service.
There are five generally-recognized approaches to recovery architecture.
Most (but not all) PROM environment variables can be set with the /usr/sbin/eeprom command. When invoked by itself, it prints out the current environment variables. To use eeprom to set a variable, use the syntax:

/usr/sbin/eeprom variable_name=value
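For example, to view and then change the boot device from a running system (mirrordisk is an invented device alias):

/usr/sbin/eeprom boot-device
/usr/sbin/eeprom boot-device="disk mirrordisk"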
All PROM environment variables can be set at the ok> prompt. The printenv command prints out the current settings. The syntax for setting a variable is:

setenv variable_name value
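For example, to examine and reset the boot device list at the ok> prompt (the values shown are typical defaults, not a recommendation):

ok> printenv boot-device
ok> setenv boot-device disk net

Most changes made with setenv take effect at the next system reset.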