Tenshi log monitor now supports Redis inputs

Sig-I/O has been using Tenshi for quite a while, as it’s one of the easier and more flexible log monitoring tools available. It’s also quite lightweight and requires only a few Perl modules.

However, Tenshi has been showing its age, as it only supported syslog, flat files or FIFOs as inputs. These days JSON-based logging with graylog2, logstash or other tools seems to be all the rage.

Since we needed to set up a new log monitoring solution for a customer who didn’t have a central syslog server but was using logstash and Redis, it was a perfect time to add Redis support to Tenshi.

Adding a Redis input to Tenshi turned out to be quite easy using the Perl Redis module. The patch has been sent to the upstream developer and will most likely be included in a future release.

For those who can’t wait, and want to try out the Redis support, the code can be found at my github repository in the ‘redis’ branch of tenshi.

Installing php 5.4 or 5.5 on CentOS 6.x / RHEL 6.x / SL 6.x

There are many posts on the internet about people wanting to install a newer PHP release on their EL6 boxes. Most of these posts will tell you to either install the ‘remi’ repository, or packages from ‘webtatic’. However, there is now a newer and, in my opinion, better method.

Software Collections

Red Hat has created the concept of software collections, through which it can provide newer or additional packages on top of the base OS. These packages come with more limited support, but they are at least a somewhat standardised way of installing additional functionality without impacting the base OS.

Red Hat Enterprise Linux 6

In RHEL systems, collections can be enabled with:

  • Enable the Red Hat collections channel

    • rhn-channel --add --channel=rhel-x86_64-server-6-rhscl-1

  • Then install software from it:

    • yum install php54-php

More info can be found at http://developerblog.redhat.com/2013/08/01/php-5-4-on-rhel-6-using-rhscl/

CentOS 6 / SL 6 / OEL 6

For the community EL6 systems, the following procedure can be used:

The list of available collections and their package-url’s can be found on https://www.softwarecollections.org/en/scls/

Testing Red Hat Enterprise Linux 7 and CentOS 7 (preview)

Last week, Red Hat released the final version of Red Hat Enterprise Linux 7. A few days later the CentOS project made a first preview version of CentOS 7 available. Since many of our customers are running RHEL 6 and/or CentOS 6, now was a good time to look into the newly released 7.0 version.

Both the CentOS-7 and RHEL-7 installations completed without any problems, something that was still giving more than enough issues during the beta and release-candidate stages of RHEL-7. We tested the ‘default’ graphical install, the text-based install and kickstart installs in both graphical and text modes. Currently we’re fine-tuning our kickstart configuration for the 7 releases, so installs can be fully automated and fast.

Kickstart Configuration

At this time, our kickstart looks somewhat like this (censored to protect sensitive data):

#version=RHEL7
# System authorization information
auth --enableshadow --passalgo=sha512

# Use network installation
url --url="http://buildlogs.centos.org/centos/7/os/x86_64-20140614/"
# Use text mode install
text
# Keyboard layouts
keyboard --vckeymap=us --xlayouts='us'
# System language
lang en_US.UTF-8

# Network information
network  --bootproto=dhcp --device=eth0 --ipv6=auto --activate
network  --hostname=centos7previewkickstarttest
# Root password
rootpw some-password
# Do not configure the X Window System
skipx
# System timezone
timezone Europe/Amsterdam --isUtc
#user --groups=wheel --name=useraccount --password=some-password --gecos="User"
# Skip EULA
eula --agreed
# Disable firewall
firewall --disabled
# Don't run the Setup Agent on first boot
firstboot --disabled
# Selinux (ENFORCING|permissive|disabled)
selinux --enforcing
# Reboot the machine when the installation is finished, eject CD
reboot --eject
# Enable SSH services
services --enabled sshd
# Include auto-generated disk-config
%include /tmp/include.me

%packages
# Core and base are default, just specify them anyway to make this clear
# Then unselect the 'default' marked packages from core and base, which we don't need
@core
@base
# default from core
-aic94xx-firmware
-alsa-firmware
-bfa-firmware
-dracut-config-rescue
-ivtv-firmware
-iwl1000-firmware
-iwl100-firmware
-iwl105-firmware
-iwl135-firmware
-iwl2000-firmware
-iwl2030-firmware
-iwl3160-firmware
-iwl3945-firmware
-iwl4965-firmware
-iwl5000-firmware
-iwl5150-firmware
-iwl6000-firmware
-iwl6000g2a-firmware
-iwl6000g2b-firmware
-iwl6050-firmware
-iwl7260-firmware
-kernel-tools
-libertas-sd8686-firmware
-libertas-sd8787-firmware
-libertas-usb8388-firmware
-microcode_ctl
-NetworkManager
-NetworkManager-tui
-ql2100-firmware
-ql2200-firmware
-ql23xx-firmware
postfix
linux-firmware
# default from base
-abrt-addon-ccpp
-abrt-addon-python
-abrt-cli
-abrt-console-notification
bash-completion
-blktrace
bridge-utils
bzip2
chrony
-cryptsetup
-dmraid
-dosfstools
ethtool
-fprintd-pam
-gnupg2
-hunspell-en
-hunspell
-kpatch
-ledmon
-libaio
-libreport-plugin-mailx
-libstoragemgmt
lvm2
man-pages-overrides
man-pages
mdadm
mlocate
mtr
nano
ntpdate
-pinfo
-plymouth
pm-utils
-rdate
-rfkill
rng-tools
rsync
-scl-utils
-setuptool
smartmontools
-sos
-sssd-client
strace
sysstat
-systemtap-runtime
tcpdump
-tcsh
-teamd
time
unzip
usbutils
vim-enhanced
virt-what
wget
which
-words
xfsdump
xz
-yum-langpacks
-yum-plugin-security
yum-utils
zip
acpid
redhat-lsb-core
%end

%pre
#!/bin/bash

# Check physical and virtio disks
for disk in /sys/block/sd* /sys/block/vd*
do
        dsk=$(basename $disk)

        if [[ `cat $disk/ro` -eq 1 ]];
        then
                echo "Skipping disk $dsk: READONLY"
                continue;
        fi

        if [[ `cat $disk/removable` -eq 1 ]];
        then
                echo "Skipping disk $dsk: REMOVABLE"
                continue;
        fi

        if [[ `cat $disk/size` -lt 20971520 ]];
        then
                echo "Skipping disk $dsk: Smaller than 10G"
                continue;
        else
                echo "Using disk $dsk"
                chosen=$dsk;
                break;
        fi
done

incfile=/tmp/include.me
> $incfile

if [[ -n $chosen ]];
then
        echo "zerombr" >> $incfile
        echo "bootloader --location=mbr --driveorder=$chosen --append=\"nomodeset console=tty0 console=ttyS0,115200n8\"" >> $incfile
        echo "ignoredisk --only-use=$chosen" >> $incfile
        echo "clearpart --all --initlabel --drives=$chosen" >> $incfile
        echo "part /boot --fstype=ext3 --asprimary --size=256" >> $incfile
        echo "part pv.$chosen --grow --size=15000" >> $incfile
        echo "volgroup vg00 --pesize=32768 pv.$chosen" >> $incfile
        echo "logvol / --fstype=xfs --name=root --vgname=vg00 --size=4096" >> $incfile
        echo "logvol swap --name=swap --vgname=vg00 --size=256" >> $incfile
else
        echo "" > $incfile
fi

%end

# PostInstall stuff
%post --log=/root/anaconda-postinstall.log
#!/bin/sh
cd /
echo "GRUB_TERMINAL=\"serial console\"" >> /etc/default/grub
echo "GRUB_TERMINAL_OUTPUT=\"serial console\"" >> /etc/default/grub
echo "GRUB_SERIAL_COMMAND=\"serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1\"" >> /etc/default/grub

echo "#!/bin/sh" > /usr/local/sbin/update-grub
echo "grub2-mkconfig -o /boot/grub2/grub.cfg" >> /usr/local/sbin/update-grub
chmod +x /usr/local/sbin/update-grub

/usr/local/sbin/update-grub

echo "[base-c7-preview]" > /etc/yum.repos.d/c7-preview.repo
echo "name=CentOS-7-Preview" >> /etc/yum.repos.d/c7-preview.repo
echo "baseurl=http://buildlogs.centos.org/centos/7/os/x86_64-20140614/" >> /etc/yum.repos.d/c7-preview.repo
echo "enabled=1" >> /etc/yum.repos.d/c7-preview.repo
echo "gpgcheck=0" >> /etc/yum.repos.d/c7-preview.repo
%end

This kickstart configuration will try to install a minimal CentOS 7 (preview) system on the first available disk that is non-removable and bigger than 10GB. It will use LVM and create a 4GB root filesystem, leaving the remaining disk space free for later use.
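The 20971520 threshold in the %pre script is easier to read once you know that /sys/block/<disk>/size reports the size in 512-byte sectors. A minimal sketch of the conversion follows; the function name sectors_to_gib is only an illustrative helper and not part of the kickstart itself.

```shell
#!/bin/sh
# Convert a /sys/block/<disk>/size value (512-byte sectors) to whole GiB.
# sectors_to_gib is a hypothetical helper name used for illustration only.
sectors_to_gib() {
    echo $(( $1 * 512 / 1024 / 1024 / 1024 ))
}

# The threshold used in the %pre script
sectors_to_gib 20971520
```

So the script skips every disk smaller than 10 GiB and picks the first one at or above that size.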

It will also configure grub to respond to both serial input and the VGA console, and configure the kernel and gettys to work on both serial and VGA consoles. It will configure the centos7-preview repository, so installing extra software should be easy. The current install is optimized for virtual machines, and doesn’t install anything related to sound, wifi, network-manager etc.

The entire install will end up being about 840MB, and includes all requirements for running ansible playbooks, so ansible can be used to further configure the system after initial installation.

Most notable changes

The major changes that you will run into, if you are used to CentOS/RHEL 5 and 6 are:

  • Systemd is now used as the init system, with all the changes that come with it

  • BTRFS and XFS are supported, and the system can use either as a root filesystem (xfs was available in centos/redhat 6, but not usable as root or boot filesystem)

  • On installs with X and a desktop, you will now get a Gnome 3 Classic based desktop. Luckily a tweak-tool is available and installed by default, which will allow you to tune the desktop somewhat.
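To make the systemd change in the list above a bit more concrete: the familiar service and chkconfig invocations map fairly directly onto systemctl. The sketch below only prints the rough equivalent and runs nothing; systemd_equiv is a hypothetical helper name, and it covers just the most common cases.

```shell
#!/bin/sh
# Print the systemctl command that roughly corresponds to an EL5/EL6-style
# "service" or "chkconfig" invocation. systemd_equiv is a hypothetical helper
# for illustration; it only prints, it does not run anything.
systemd_equiv() {
    case "$1" in
        service)
            # service NAME ACTION  ->  systemctl ACTION NAME.service
            echo "systemctl $3 $2.service" ;;
        chkconfig)
            case "$3" in
                on)  echo "systemctl enable $2.service" ;;
                off) echo "systemctl disable $2.service" ;;
                *)   echo "unsupported chkconfig action: $3" >&2
                     return 1 ;;
            esac ;;
        *)
            echo "unknown command: $1" >&2
            return 1 ;;
    esac
}

systemd_equiv service sshd restart
systemd_equiv chkconfig sshd on
```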

Post heartbleed, looking at alternatives to OpenSSL

With the bug of the year (Heartbleed) patched on all my systems, I decided to look into alternative SSL implementations. I mostly use Lighttpd and Nginx, and try to stay away from Apache, unless I really need something that will only work with it.


Apache seems to have 2 possible SSL implementations available. The first one, mod_ssl, which is based on OpenSSL, is used by ~99% of the users. mod_nss is the second SSL implementation for Apache, based on the netscape/mozilla NSS library.

A quick google-search for alternative SSL implementations or modules for Nginx and Lighttpd returned no actual working code, only some requests for functionality, so it seems that this is (at this time) a dead end. Looking further on google and wikipedia I looked at the alternative SSL implementations, specifically axTLS, PolarSSL and MatrixSSL.

axTLS

AxTLS was the first library I looked at. The source distribution includes a minimalistic webserver with SSL/TLS support and a tool called axtlswrap, a simple stunnel-like wrapper for the AxTLS library. I quickly configured the minimalistic webserver (axhttpd) and gave it a temporary StartSSL certificate. The configuration was quite simple after configuring the library to actually use my provided certificate and key instead of the built-in certificate. I then decided to run the SSLLabs tests against this webserver, to see how it would compare against my regular Lighttpd/OpenSSL secured websites.

The library provided the minimal required features, but at this time doesn’t support TLS1.2, and had only limited support for the various popular ciphers. This would result in a usable webserver, but with limited options in ciphers and limited support for strong encryption. Since I didn’t feel like dropping back to a ‘B’ score in SSLLabs tests, AxTLS was not a sufficient solution.

PolarSSL

PolarSSL is another open-source SSL library, which is available under the GNU GPLv2 or a commercial license. It’s currently being used in various well-known products (PowerDNS and OpenVPN-NL, a Dutch-government approved version of OpenVPN). PolarSSL seemed to support all the latest ciphers and TLS standards and there were at least 2 httpd servers that supported using PolarSSL.

Hiawatha

Hiawatha was the first webserver software I tried with PolarSSL. The code for PolarSSL is actually included in the downloads from Hiawatha, so you don’t need to download this separately. The hiawatha source-tree includes a script to download a newer version of PolarSSL when needed.

Hiawatha was also quite easy to configure, and I had it running with my certificate in a matter of minutes. Another test-run of the Qualys SSLLabs tests gave a very positive result. The only downside was a choice for a 1024 bit Diffie–Hellman key exchange. I struggled a bit to get this to 2048 or 4096 bits, but after a few failed attempts found a Hiawatha configuration option to set this (DHsize = 4096).

With this setting I got my beloved A score in Qualys SSLLabs, with support for TLS versions 1.0, 1.1 and 1.2, and with SSLv2 and v3 completely disabled.

I didn’t do much further testing with Hiawatha, as I only configured it to do SSL/TLS and then reverse-proxy everything to a lighttpd server running plain http.

Monkey

The monkey webserver is another http daemon with support for PolarSSL. It’s focussed and optimized for running only on Linux systems, and aims to have good performance while also having all the standard and required features (IPv6, virtual hosting, FastCGI).

To enable the SSL module with Monkey, ‘configure’ has to be run with the ‘--enable-plugins=polarssl’ option. This doesn’t seem to be documented in configure or in the INSTALL or README files. It’s also funny to see that compiling jemalloc.c takes about as much time as all the other files combined.

My first build of Monkey was against the system’s PolarSSL, which resulted in an A- score in the SSLLabs tests. I spotted that the Debian polarssl package was still stuck at version 1.2.9, so I downloaded the latest PolarSSL (1.3.6) and built that version (make lib SHARED=1; make install).

When running with PolarSSL 1.3.6, the Monkey webserver also got an ‘A’ score in SSLLabs. The only ‘red’ is the session resumption, which for some reason isn’t working. This is still something I need to look into. Also, it appears that monkey can only run over a single transport, so either HTTP or HTTPS. This would mean running two separate monkey instances for http+https support. This might be something that will be fixed in a future monkey version.

For as long as my testing system is still running, you can check the live SSLLabs score on it. Please ignore the ‘axtls’ in the hostname, it’s currently running Monkey, but the certificate I created for testing was first used for testing axtls.

Feature-wise monkey seems like a nice alternative to Apache, Nginx and Lighttpd, and looking at the http performance, it also gives some very nice results. My testing with HTTPS however quickly triggered various bugs and crashes, so it seems that the PolarSSL support for Monkey is still somewhat buggy. I could successfully run my benchmarks against the server in HTTP mode, at up to 18000 requests per second (on a dual core VM), but running with SSL enabled would crash the monkey process when running with more than a handful of threads. The system would be stable for hours with only 2-6 threads, but when running with 10+ threads it would crash within seconds. I hope these bugs will be fixed soon, so Monkey with PolarSSL will prove itself to be a worthy competitor in the SSL server space.

Update 2014/04/22:

It seems my libpolarssl was compiled with some incorrect settings. I’ve reconfigured and recompiled polarssl and both monkey and hiawatha. The servers are now stable as expected; the performance data will be updated as soon as my new tests have completed.

Basically: make sure you enable POLARSSL_THREADING_PTHREAD and disable POLARSSL_DEBUG_C
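In the PolarSSL 1.3 source tree these are compile-time switches in include/polarssl/config.h. A rough sketch of flipping them with sed follows. It assumes the stock layout, where a disabled option appears as a commented-out //#define NAME line, and it also assumes that POLARSSL_THREADING_C has to be enabled for the pthread option to take effect. The function name and the sample file are illustrative, so check the result against your own config.h.

```shell
#!/bin/sh
# set_polarssl_threading FILE: enable the threading options and disable the
# debug module in a PolarSSL-style config.h (hypothetical helper name).
set_polarssl_threading() {
    cfg="$1"
    sed -e 's|^//[[:space:]]*\(#define POLARSSL_THREADING_C\)[[:space:]]*$|\1|' \
        -e 's|^//[[:space:]]*\(#define POLARSSL_THREADING_PTHREAD\)[[:space:]]*$|\1|' \
        -e 's|^\(#define POLARSSL_DEBUG_C\)[[:space:]]*$|//\1|' \
        "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}

# Demonstration on a throw-away sample instead of a real source tree.
sample=$(mktemp)
cat > "$sample" <<'EOF'
//#define POLARSSL_THREADING_C
//#define POLARSSL_THREADING_PTHREAD
#define POLARSSL_DEBUG_C
#define POLARSSL_SSL_TLS_C
EOF

set_polarssl_threading "$sample"
cat "$sample"
```

After editing the real config.h, the library needs to be rebuilt (make lib SHARED=1; make install), followed by the webservers that link against it.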

Configuring bridging, bonding, vlans on CentOS/RedHat 6.x

/etc/sysconfig/network-scripts/ifcfg-em1 and em2

DEVICE=<device-name>
ONBOOT=yes
HWADDR=XX:XX:XX:XX:XX:XX

/etc/sysconfig/network-scripts/ifcfg-em1.vlanid (example ifcfg-em1.20)

DEVICE=<device>.<vlanid>
ONBOOT=yes
VLAN=yes
BOOTPROTO=none
SLAVE=yes
MASTER=bond<vlanid>

/etc/sysconfig/network-scripts/ifcfg-bond<vlanid> (example ifcfg-bond20)

DEVICE=bond<vlanid>
ONBOOT=yes
BONDING_OPTS="mode=active-backup primary=em1 miimon=100"
BOOTPROTO=none
BRIDGE=br<vlanid>
MACADDR=<RANDOM_MAC>

/etc/sysconfig/network-scripts/ifcfg-br<vlanid> (example ifcfg-br20)

DEVICE=br<vlanid>
ONBOOT=yes
TYPE=Bridge
DELAY=0
BOOTPROTO=static
IPADDR=X.X.X.X
NETMASK=X.X.X.X

So, creating 3 vlans (20, 200, 250) on a 2 interface bond (em1, em2) creates the following set of configfiles:

ifcfg-em1:

Base configuration for em1

ifcfg-em2:

Base configuration for em2

ifcfg-em1.20:

Vlan 20 interface on em1 interface

ifcfg-em1.200:

Vlan 200 interface on em1 interface

ifcfg-em1.250:

Vlan 250 interface on em1 interface

ifcfg-em2.20:

Vlan 20 interface on em2 interface

ifcfg-em2.200:

Vlan 200 interface on em2 interface

ifcfg-em2.250:

Vlan 250 interface on em2 interface

ifcfg-bond20:

Bonding em1.20 with em2.20

ifcfg-bond200:

Bonding em1.200 with em2.200

ifcfg-bond250:

Bonding em1.250 with em2.250

ifcfg-br20:

Create a bridge on bond20

ifcfg-br200:

Create a bridge on bond200

ifcfg-br250:

Create a bridge on bond250
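Writing fourteen nearly identical files by hand is error-prone, so the file set listed above can also be generated. The following is a minimal sketch that follows the templates in this post. It writes to a scratch directory rather than /etc/sysconfig/network-scripts; the OUTDIR variable is an illustrative name, and HWADDR, MACADDR and the IP settings are deliberately left as placeholders to fill in.

```shell
#!/bin/sh
# Sketch: generate the ifcfg-* file set for a two-NIC bond carrying three VLANs.
# Files go to a scratch directory (OUTDIR is an illustrative variable), not to
# /etc/sysconfig/network-scripts, so the result can be reviewed first.
outdir="${OUTDIR:-$(mktemp -d)}"
nics="em1 em2"
vlans="20 200 250"

for nic in $nics; do
    # Base configuration for the physical interface (add HWADDR as needed)
    printf 'DEVICE=%s\nONBOOT=yes\n' "$nic" > "$outdir/ifcfg-$nic"
done

for vlan in $vlans; do
    for nic in $nics; do
        # VLAN interface on each NIC, enslaved to the per-VLAN bond
        cat > "$outdir/ifcfg-$nic.$vlan" <<EOF
DEVICE=$nic.$vlan
ONBOOT=yes
VLAN=yes
BOOTPROTO=none
SLAVE=yes
MASTER=bond$vlan
EOF
    done

    # Bond of the two VLAN interfaces, attached to the per-VLAN bridge.
    # BONDING_OPTS is copied from the template above; MACADDR is left out.
    cat > "$outdir/ifcfg-bond$vlan" <<EOF
DEVICE=bond$vlan
ONBOOT=yes
BONDING_OPTS="mode=active-backup primary=em1 miimon=100"
BOOTPROTO=none
BRIDGE=br$vlan
EOF

    # Bridge on top of the bond; the address values are placeholders
    cat > "$outdir/ifcfg-br$vlan" <<EOF
DEVICE=br$vlan
ONBOOT=yes
TYPE=Bridge
DELAY=0
BOOTPROTO=static
IPADDR=X.X.X.X
NETMASK=X.X.X.X
EOF
done

ls "$outdir"
```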

A simple NRPE alternative, based on bash, cron and NSCA

Monitoring remote hosts with Nagios can be done with various methods, ranging from SNMP, SSH and NRPE to a custom solution. To monitor some ‘black-box’ appliances with a very minimal OS environment, installing or running the NRPE agent wasn’t possible. Since I seem to be using more and more passive nagios checks with the nagios service check acceptor (NSCA), it seemed like a good idea to try and use that.

I copied most of the checks to the system and set up the NSCA configuration (/etc/send_nsca.cfg). Then I created a simple bash script which is scheduled to run from cron and loops through a list of service checks to execute.

The check-results are then fed into send_nsca to finally arrive at the monitoring system. This way you only need to allow incoming traffic on 1 port to the nagios monitoring host and have no connections going into the device being monitored.
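The general shape of such a script can be sketched as follows. This is not the actual monitor.sh, just a minimal illustration of the idea: MONITORED_HOST, NAGIOS_SERVER and the dummy checks are made-up names, and without NAGIOS_SERVER set the script only prints what it would send. The tab-separated line format is the one send_nsca reads on standard input.

```shell
#!/bin/sh
# Minimal sketch of a cron-driven passive-check runner. All names below
# (MONITORED_HOST, NAGIOS_SERVER, the example checks) are illustrative.
MONITORED_HOST="${MONITORED_HOST:-appliance01}"

# run_check SERVICE COMMAND [ARGS...]
# Runs one plugin and prints a line in the format send_nsca reads on stdin:
#   host<TAB>service<TAB>return-code<TAB>plugin-output
run_check() {
    service="$1"; shift
    output=$("$@" 2>&1)
    rc=$?
    # Keep only the first line of plugin output
    output=$(printf '%s\n' "$output" | head -n 1)
    printf '%s\t%s\t%s\t%s\n' "$MONITORED_HOST" "$service" "$rc" "$output"
}

run_all_checks() {
    # Stand-ins for real plugins such as check_disk or check_load
    run_check "Dummy OK"       sh -c 'echo "OK - everything fine"; exit 0'
    run_check "Dummy CRITICAL" sh -c 'echo "CRITICAL - something broke"; exit 2'
}

if [ -n "${NAGIOS_SERVER:-}" ]; then
    # Submit all results over a single connection to the NSCA daemon
    run_all_checks | send_nsca -H "$NAGIOS_SERVER" -c /etc/send_nsca.cfg
else
    # Dry run: show what would be sent
    run_all_checks
fi
```

Scheduled from cron every few minutes, a script like this gives Nagios a steady stream of passive results without any agent listening on the monitored device.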

Update: The code has been updated and moved to my github account. You can find it at: https://github.com/sigio/sysadmin in the files monitor.sh and monitor.rc

Linux Leap-Second problem

So, this weekend was quite an interesting one, as on July 1st 02:00 local time (00:00 UTC) a leap-second was added via NTP. This caused serious problems for all my Java Virtual Machines and mysql databases.

If your system has printed the following line (in dmesg), a leap-second has been added recently:

Clock: inserting leap second 23:59:60 UTC

On most of my systems, the JVMs would spike to 100% CPU load on all cores; MySQL seems to do this as well.

The work-around/fix at this time is to run:

date `date +"%m%d%H%M%C%y.%S"`
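The work-around sets the clock to the time it already has: the inner date call prints the current time in the MMDDhhmmCCYY.ss form that date accepts for setting the clock. The commonly cited explanation is that the act of setting the clock makes the kernel refresh the timer state that the leap second left inconsistent, but treat that as background rather than something verified here. The sketch below only prints the generated timestamp and does not change the clock.

```shell
#!/bin/sh
# Print (but do not set) the timestamp the work-around feeds back into date.
# Format: MMDDhhmmCCYY.ss, e.g. 070102002012.00 for 2012-07-01 02:00:00
stamp=$(date +"%m%d%H%M%C%y.%S")
echo "$stamp"
```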

Hopefully this will be fixed in the kernel before the next leap-second is added, which could be as soon as 2013/01/01, though probably later.

Update 2018:

It was indeed fixed before the next leap-second, which occurred somewhere not too long after. I haven't encountered this problem since.

Resetting failed drive in linux mdadm raid array

Today I was greeted with a failed drive in a mdadm raid array. The drive had some transient errors and was kicked out of the array, but testing showed that the drive still seemed to work just fine.

Image: Harddisks, by Martin Abegglen (https://www.flickr.com/photos/twicepix/3333710952)

The following procedure will remove the drive from the array, remove it from the system, re-probe for disks, and then re-add the drive back into the array(s).

  • Mark the failed drive as faulty and remove it from the array; in this case it was /dev/sdb:

    • mdadm --manage --set-faulty /dev/md0 /dev/sdb1

    • mdadm --manage /dev/md0 --remove /dev/sdb1

  • Make sure nothing on this disk is being used (mounts, other arrays, etc)

  • Reseat the drive, either physically or by using the following commands:

    • echo 1 > /sys/block/sdb/device/delete

    • echo "- - -" > /sys/class/scsi_host/host1/scan

  • Check if the drive is found again, and check if it works correctly

    • check dmesg output, or look at /proc/partitions

    • try reading the whole disk: ‘pv /dev/sdb > /dev/null’

  • Re-add the drive to the array(s)

    • mdadm /dev/md0 -a /dev/sdb1

    • cat /proc/mdstat

That should do the trick…
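To confirm afterwards that no array is left degraded, the [UU] status field in /proc/mdstat can be checked by a small script instead of by eye. A minimal sketch follows; degraded_arrays is a hypothetical helper name, and the sample input is made up to show the format.

```shell
#!/bin/sh
# degraded_arrays [FILE]: print md arrays whose status shows a missing member,
# i.e. an underscore in the [UU] field. FILE defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[^ ]* :/         { name = $1 }
        /\[[U_]*_[U_]*\]/    { print name }
    ' "${1:-/proc/mdstat}"
}

# Demonstration on made-up sample data rather than the live system
sample=$(mktemp)
cat > "$sample" <<'EOF'
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1](F)
      1048512 blocks [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[1]
      9436160 blocks [2/2] [UU]

unused devices: <none>
EOF

degraded_arrays "$sample"
```

An array that is still rebuilding keeps showing an underscore until the resync finishes, so it will be listed until then.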

Run-time editing of limits in Linux

On CentOS and RHEL Linux (with kernels >= 2.6.32) you can modify resource-limits (ulimit) run-time. This can be done using the /proc/<pid>/limits functionality. On older kernels this file is read-only and can be used to inspect the limits that are in effect on the process. On newer systems you can modify the limits with echo:

cat /proc/<pid>/limits
echo -n "Max open files=soft:hard" > /proc/<pid>/limits
cat /proc/<pid>/limits

On older systems you will have to modify limits before starting a process.

(See also the post on serverfault)

If you are not running CentOS/RHEL, you can use the ‘prlimit’ command, which does the same thing, but doesn’t rely on a patch that’s no longer present in current kernels.
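As a rough illustration of the prlimit route, the sketch below starts a throw-away process, lowers or raises its soft open-files limit to 512 and reads the value back from /proc. It assumes prlimit from util-linux is installed and that the hard limit of the process is at least 512; the value 512 itself is arbitrary.

```shell
#!/bin/sh
# Start a throw-away process, then change its open-files soft limit in place.
sleep 60 &
pid=$!

# "512:" sets only the soft limit and leaves the hard limit untouched
prlimit --pid "$pid" --nofile=512:

# Read the value back from /proc to confirm it took effect
soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
echo "pid $pid now has a soft open-files limit of $soft"

kill "$pid"
```

Changing both limits at once uses the same soft:hard notation as the echo variant, for example --nofile=1024:4096.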

Online resizing of multipath devices in Linux dm-multipath

Linux doesn’t automatically re-size multipath devices, so this procedure must be used to have online re-sizing of multipath. (Offline re-size is automatic, just remove the mapping and reload)

  • this example will use multipath device /dev/mpath/testdisk, with scsi disks /dev/sdx and /dev/sdy

  • Resize the lun on the underlying storage layer (iscsi / san)

  • Check which sd? devices are relevant, and re-scan these:

    • multipath -ll testdisk

    • blockdev --rereadpt /dev/sdx

    • blockdev --rereadpt /dev/sdy

    • blockdev --getsz /dev/sdx

    • blockdev --getsz /dev/sdy

  • Take note of the new size returned by getsz.

  • Dump the dmsetup table to a file (and a backup)

    • dmsetup table testdisk | tee mapping.bak mapping.cur

  • Edit the table stored in ‘mapping.cur’

    • vi mapping.cur, replace field 2 (size) with the new size from getsz

  • Suspend I/O, reread the table, and resume I/O

    • dmsetup suspend testdisk; dmsetup reload testdisk mapping.cur; dmsetup resume testdisk

  • The multipath device should now be resized:

    • multipath -ll

You can now resize the filesystem on the multipath device, or the LVM-PV if you use LVM on the LUN.
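The manual edit of field 2 can also be done with awk, which avoids typos in a table that is about to be loaded into the kernel. This is a minimal sketch that assumes the table is a single line, as is usual for a multipath map; resize_table is a hypothetical helper name and the table below is made up for the demonstration.

```shell
#!/bin/sh
# resize_table NEW_SIZE: read a dm table on stdin, replace field 2 (the length
# in 512-byte sectors) and write the result to stdout.
resize_table() {
    awk -v size="$1" '{ $2 = size; print }'
}

# Demonstration with a made-up single-line multipath table; on a real system the
# input would be mapping.bak and NEW_SIZE the output of blockdev --getsz.
old='0 20971520 multipath 0 0 1 1 round-robin 0 2 1 8:16 1000 8:32 1000'
echo "$old" | resize_table 41943040
```

On a real system this would be used as: resize_table "$(blockdev --getsz /dev/sdx)" < mapping.bak > mapping.cur, followed by the suspend, reload and resume step from the procedure above.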