Sig-I/O

bonding bridging network redhat vlan

Configuring bridging, bonding and VLANs on CentOS/RedHat 6.x

/etc/sysconfig/network-scripts/ifcfg-em1 and em2

DEVICE=<device-name>
ONBOOT=yes
HWADDR=XX:XX:XX:XX:XX:XX

/etc/sysconfig/network-scripts/ifcfg-em1.<vlanid> (example: ifcfg-em1.20)

DEVICE=<device>.<vlanid>
ONBOOT=yes
VLAN=yes
BOOTPROTO=none
SLAVE=yes
MASTER=bond<vlanid>

/etc/sysconfig/network-scripts/ifcfg-bond<vlanid> (example: ifcfg-bond20)

DEVICE=bond<vlanid>
ONBOOT=yes
BONDING_OPTS="mode=active-backup primary=em1 miimon=100"
BOOTPROTO=none
BRIDGE=br<vlanid>
MACADDR=<RANDOM_MAC>

/etc/sysconfig/network-scripts/ifcfg-br<vlanid> (example: ifcfg-br20)

DEVICE=br<vlanid>
ONBOOT=yes
TYPE=Bridge
DELAY=0
BOOTPROTO=static
IPADDR=X.X.X.X
NETMASK=X.X.X.X

So, creating three VLANs (20, 200, 250) on a two-interface bond (em1, em2) results in the following set of config files:

ifcfg-em1: Base configuration for em1
ifcfg-em2: Base configuration for em2
ifcfg-em1.20: VLAN 20 interface on em1
ifcfg-em1.200: VLAN 200 interface on em1
ifcfg-em1.250: VLAN 250 interface on em1
ifcfg-em2.20: VLAN 20 interface on em2
ifcfg-em2.200: VLAN 200 interface on em2
ifcfg-em2.250: VLAN 250 interface on em2
ifcfg-bond20: Bonding em1.20 with em2.20
ifcfg-bond200: Bonding em1.200 with em2.200
ifcfg-bond250: Bonding em1.250 with em2.250
ifcfg-br20: Create a bridge on bond20
ifcfg-br200: Create a bridge on bond200
ifcfg-br250: Create a bridge on bond250
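
As a concrete illustration, the VLAN-20 slice of that setup could look like this with the placeholders filled in (the MAC address and IP below are made up):

/etc/sysconfig/network-scripts/ifcfg-em1.20

DEVICE=em1.20
ONBOOT=yes
VLAN=yes
BOOTPROTO=none
SLAVE=yes
MASTER=bond20

/etc/sysconfig/network-scripts/ifcfg-bond20

DEVICE=bond20
ONBOOT=yes
BONDING_OPTS="mode=active-backup primary=em1 miimon=100"
BOOTPROTO=none
BRIDGE=br20
MACADDR=02:00:00:20:00:01

/etc/sysconfig/network-scripts/ifcfg-br20

DEVICE=br20
ONBOOT=yes
TYPE=Bridge
DELAY=0
BOOTPROTO=static
IPADDR=192.168.20.10
NETMASK=255.255.255.0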
icinga monitor.sh monitoring nagios nrpe

A simple NRPE alternative, based on bash, cron and NSCA

Monitoring remote hosts with Nagios can be done with various methods, ranging from SNMP, SSH and NRPE to a custom solution. On some ‘black-box’ appliances with a very minimal OS environment it wasn’t possible to install or run the NRPE agent. Since I seem to be using more and more passive Nagios checks with the Nagios Service Check Acceptor (NSCA), it seemed like a good idea to try and use that here as well.

I copied most of the checks to the system and set up the NSCA configuration (/etc/send_nsca.cfg), then created a simple bash script, scheduled from cron, which loops through a list of service checks to execute.

The check results are then fed into send_nsca to finally arrive at the monitoring system. This way you only need to allow incoming traffic on one port on the Nagios monitoring host, and no connections go into the device being monitored.
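
The heart of the script is just a loop over the checks; a minimal sketch could look like this (the server name, plugin paths and service names are assumptions, the real script is in the repository mentioned in the update below):

#!/bin/bash
# Sketch: run local Nagios plugins and submit results as passive checks via NSCA
NAGIOS_HOST="nagios.example.com"    # assumption: your monitoring server
HOST=$(hostname -s)

# "service|check-command" pairs; adjust to your own checks
CHECKS="
disk|/usr/lib/nagios/plugins/check_disk -w 10% -c 5% /
load|/usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
"

echo "$CHECKS" | while IFS='|' read -r service command; do
    [ -z "$service" ] && continue
    output=$($command)
    rc=$?
    # send_nsca expects: host<TAB>service<TAB>return-code<TAB>plugin-output
    printf '%s\t%s\t%s\t%s\n' "$HOST" "$service" "$rc" "$output" \
        | send_nsca -H "$NAGIOS_HOST" -c /etc/send_nsca.cfg
done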

Update: The code has been updated and moved to my github account. You can find it at: https://github.com/sigio/sysadmin in the files monitor.sh and monitor.rc

bug leap-second linux

Linux Leap-Second problem

So, this weekend was quite an interesting one, as on July 1st 02:00 local time (00:00 UTC) a leap second was added via NTP. This caused serious problems for all my Java virtual machines and MySQL databases.

If your system has printed the following line (in dmesg), a leap-second has been added recently:

Clock: inserting leap second 23:59:60 UTC

On most of my systems the JVMs would spike to 100% CPU load across all cores; MySQL seems to do the same.

The work-around/fix at this time is to run:

date `date +"%m%d%H%M%C%y.%S"`
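
If you want to script this, the check and the workaround can be combined along these lines (a sketch; the grep pattern matches the dmesg line shown above, and setting the date requires root):

# Apply the workaround only if a leap second was actually inserted
if dmesg | grep -q 'inserting leap second'; then
    date `date +"%m%d%H%M%C%y.%S"`   # set the clock to itself to reset the kernel's timekeeping
fi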

Hopefully this will be fixed in the kernel before the next leap-second is added, which could be as soon as 2013/01/01, though probably later.

Update 2018:

It was indeed fixed before the next leap-second, which occurred not too long after, and I haven't encountered this problem since.

drive linux mdadm raid reset sata

Resetting failed drive in linux mdadm raid array

Today I was greeted by a failed drive in an mdadm RAID array. The drive had some transient errors and was kicked out of the array, but testing showed that it still seemed to work just fine.

The following procedure will remove the drive from the array, remove it from the system, re-probe for disks, and then re-add the drive to the array(s); a script combining these steps follows the list.

  • Remove the failed drive from the array, in this case, it was /dev/sdb:
    • mdadm --manage --set-faulty /dev/md0 /dev/sdb1
  • Make sure nothing on this disk is being used (mounts, other arrays, etc)
  • Reseat the drive, either physically or by using the following commands:
    • echo 1 > /sys/block/sdb/device/delete
    • echo "- - -" > /sys/class/scsi_host/host1/scan
  • Check whether the drive is detected again, and verify that it works correctly:
    • check dmesg output, or look at /proc/partitions
    • try running: ‘pv < /dev/sdb > /dev/null’
  • Re-add the drive to the array(s)
    • mdadm /dev/md0 -a /dev/sdb1
    • cat /proc/mdstat
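
For convenience, the whole sequence can be wrapped in a small script (the drive, partition, array and SCSI host below are assumptions; adjust them before use):

#!/bin/bash
# Sketch: fail, detach, rescan and re-add a flaky mdadm member
DISK=sdb              # assumption: the failed drive
PART=/dev/${DISK}1    # assumption: the array member partition
ARRAY=/dev/md0        # assumption: the affected array
HOST=host1            # assumption: the SCSI host the drive hangs off

mdadm --manage --set-faulty $ARRAY $PART
mdadm --manage $ARRAY --remove $PART             # drop the faulty member from the array

echo 1 > /sys/block/$DISK/device/delete          # detach the drive from the kernel
echo "- - -" > /sys/class/scsi_host/$HOST/scan   # re-probe the bus for the drive

sleep 5                                          # give the kernel a moment to find it
grep -q "${DISK}1" /proc/partitions || { echo "$DISK did not come back" >&2; exit 1; }

mdadm $ARRAY -a $PART                            # re-add; the array starts resyncing
cat /proc/mdstat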

That should do the trick…

kernel limit linux proc resource ulimit

Run-time editing of limits in Linux

On CentOS and RHEL Linux (with kernels >= 2.6.32) you can modify resource limits (ulimit) at run time, using the /proc/<pid>/limits functionality. On older kernels this file is read-only and can only be used to inspect the limits in effect for a process. On newer systems you can modify the limits with echo:

cat /proc/<pid>/limits
echo -n "Max open files=soft:hard" > /proc/<pid>/limits
cat /proc/<pid>/limits
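
For example, to raise the open-file limits of process 1234 to 4096 (soft) and 8192 (hard), with the PID and values purely illustrative:

echo -n "Max open files=4096:8192" > /proc/1234/limits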

On older systems you will have to modify limits before starting a process.

(See also the post on serverfault)

If you are not running CentOS/RHEL, you can use the ‘prlimit’ command, which does the same thing, but doesn’t rely on a patch that’s no longer present in current kernels.
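
For example, with prlimit (part of util-linux; PID and values again illustrative):

prlimit --pid 1234 --nofile=4096:8192    # set soft:hard open-file limits
prlimit --pid 1234 --nofile              # verify the new limits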

dmsetup iscsi linux lvm multipath resize

Online resizing of multipath devices in Linux dm-multipath

Linux doesn’t automatically resize multipath devices, so the following procedure must be used to resize a multipath device online. (Offline resizing is automatic: just remove the mapping and reload it.)

  • This example uses multipath device /dev/mpath/testdisk, backed by SCSI disks /dev/sdx and /dev/sdy
  • Resize the lun on the underlying storage layer (iscsi / san)
  • Check which sd? devices are relevant, and re-scan these:
    • multipath -ll testdisk
    • blockdev --rereadpt /dev/sdx
    • blockdev --rereadpt /dev/sdy
    • blockdev --getsz /dev/sdx
    • blockdev --getsz /dev/sdy
  • Take note of the new size returned by getsz.
  • Dump the dmsetup table to a file (and a backup)
    • dmsetup table testdisk | tee mapping.bak mapping.cur
  • Edit the table stored in ‘mapping.cur’
    • vi mapping.cur and replace field 2 (the size, in 512-byte sectors) with the new size from getsz; see the example after this list
  • Suspend I/O, reread the table, and resume I/O
    • dmsetup suspend testdisk; dmsetup reload testdisk mapping.cur; dmsetup resume testdisk
  • The multipath device should now be resized:
    • multipath -ll
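
For illustration, a multipath table line could look like the one below (device numbers and path-group arguments are made up); only the second field, the size in 512-byte sectors, changes:

0 209715200 multipath 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:112 1000 8:128 1000

After growing the LUN from 100 GiB to 200 GiB, field 2 becomes the new value reported by blockdev --getsz:

0 419430400 multipath 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:112 1000 8:128 1000

Instead of hand-editing, a one-liner such as dmsetup table testdisk | awk -v sz=419430400 '{$2=sz; print}' > mapping.cur (new size chosen for illustration) achieves the same.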

You can now resize the filesystem on the multipath device, or the LVM-PV if you use LVM on the LUN.