Sig-I/O has been using Tenshi for quite a while, as it’s one of the easier and more flexible log monitoring tools available. It’s also quite light-weight and has only a few perl modules as requirements.
However, Tenshi has been showing its age, as it only supported syslog, flat files or FIFOs as inputs. These days JSON-based logging with graylog2, logstash or other tools seems to be all the rage.
Since we needed to set up a new log monitoring solution for a customer and they didn’t have a central syslog server, but were using logstash and Redis, it was a perfect time to add Redis support to Tenshi.
Patching a Redis input into Tenshi turned out to be quite easy using the Perl Redis module. The patch has been sent to the upstream developer and will most likely be included in a future release.
For those who can’t wait and want to try out the Redis support, the code can be found at my github repository in the ‘redis’ branch of tenshi.
There are many posts on the internet about people wanting to install a newer PHP release on their EL6 boxes. Most of these posts will tell you to either install the ‘remi’ repository, or packages from ‘webtatic’. However, there is a newer, and in my opinion better, method now.
Software Collections
Red Hat has created the concept of software collections, in which they can provide newer or additional packages to the base OS. These packages come with a more limited support package, but they are at least a somewhat standardised way of installing additional functionality without impacting the base OS.
Red Hat Enterprise Linux 6
In RHEL systems, collections can be enabled with:
- Enable the Red Hat collections channel:
  - rhn-channel --add --channel=rhel-x86_64-server-6-rhscl-1
- Then install software from it:
  - yum install php54-php
More info can be found on http://developerblog.redhat.com/2013/08/01/php-5-4-on-rhel-6-using-rhscl/
CentOS 6 / SL 6 / OEL 6
For the community EL6 systems, the following procedure can be used:
- Install the collection rpm for the collection you wish to use
- Install the packages from the collection:
  - yum install php54
The list of available collections and their package URLs can be found on https://www.softwarecollections.org/en/scls/
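Packages from a collection are not on the default PATH after installation; they are normally run through `scl enable` from the scl-utils package. The snippet below is a minimal sketch of that, assuming the `php54` collection from the steps above. The helper name `scl_cmd` is made up for this example, and it only prints the command instead of running it.

```shell
# Hypothetical helper: print the command that runs "$@" inside a collection.
# It only prints; nothing is executed here.
scl_cmd() {
    collection=$1
    shift
    printf "scl enable %s '%s'\n" "$collection" "$*"
}

scl_cmd php54 php -v
```

Running the printed command on a system with the collection installed should report the collection's PHP version rather than the base OS one.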
Last week, Red Hat released the final version of Red Hat Enterprise Linux 7. A few days later the CentOS project made a first preview version of CentOS 7 available. Since many of our customers are running on RHEL 6 and/or CentOS 6, now was a good time to look into the newly released 7.0 version.
Both the CentOS-7 and RHEL-7 installations completed without any problems, something that was still giving more than enough issues during the beta and release-candidate stages of RHEL-7. We tested the ‘default’ graphical install, the text-based install and kickstart installs in both graphical and text modes. Currently we’re fine-tuning our kickstart configuration for the 7 releases, so installs can be fully automated and fast.
Kickstart Configuration
At this time, our kickstart looks somewhat like this (censored to protect sensitive data):
    #version=RHEL7
    # System authorization information
    auth --enableshadow --passalgo=sha512

    # Use network installation
    url --url="http://buildlogs.centos.org/centos/7/os/x86_64-20140614/"
    # Use text mode install
    text
    # Keyboard layouts
    keyboard --vckeymap=us --xlayouts='us'
    # System language
    lang en_US.UTF-8

    # Network information
    network --bootproto=dhcp --device=eth0 --ipv6=auto --activate
    network --hostname=centos7previewkickstarttest
    # Root password
    rootpw some-password
    # Do not configure the X Window System
    skipx
    # System timezone
    timezone Europe/Amsterdam --isUtc
    #user --groups=wheel --name=useraccount --password=some-password --gecos="User"

    # Skip EULA
    eula --agreed
    # Disable firewall
    firewall --disabled
    # Don't run the Setup Agent on first boot
    firstboot --disabled
    # Selinux (ENFORCING|permissive|disabled)
    selinux --enforcing
    # Reboot the machine when the installation is finished, eject CD
    reboot --eject
    # Enable SSH services
    services --enabled sshd
    # Include auto-generated disk-config
    %include /tmp/include.me

    # Core and base are default, just specify them anyway to make this clear
    # Then unselect the 'default' marked packages from core and base, which we don't need
    %packages
    @core
    @base
    # default from core
    -aic94xx-firmware
    -alsa-firmware
    -bfa-firmware
    -dracut-config-rescue
    -ivtv-firmware
    -iwl1000-firmware
    -iwl100-firmware
    -iwl105-firmware
    -iwl135-firmware
    -iwl2000-firmware
    -iwl2030-firmware
    -iwl3160-firmware
    -iwl3945-firmware
    -iwl4965-firmware
    -iwl5000-firmware
    -iwl5150-firmware
    -iwl6000-firmware
    -iwl6000g2a-firmware
    -iwl6000g2b-firmware
    -iwl6050-firmware
    -iwl7260-firmware
    -kernel-tools
    -libertas-sd8686-firmware
    -libertas-sd8787-firmware
    -libertas-usb8388-firmware
    -microcode_ctl
    -NetworkManager
    -NetworkManager-tui
    -ql2100-firmware
    -ql2200-firmware
    -ql23xx-firmware
    postfix
    linux-firmware
    # default from base
    -abrt-addon-ccpp
    -abrt-addon-python
    -abrt-cli
    -abrt-console-notification
    bash-completion
    -blktrace
    bridge-utils
    bzip2
    chrony
    -cryptsetup
    -dmraid
    -dosfstools
    ethtool
    -fprintd-pam
    -gnupg2
    -hunspell-en
    -hunspell
    -kpatch
    -ledmon
    -libaio
    -libreport-plugin-mailx
    -libstoragemgmt
    lvm2
    man-pages-overrides
    man-pages
    mdadm
    mlocate
    mtr
    nano
    ntpdate
    -pinfo
    -plymouth
    pm-utils
    -rdate
    -rfkill
    rng-tools
    rsync
    -scl-utils
    -setuptool
    smartmontools
    -sos
    -sssd-client
    strace
    sysstat
    -systemtap-runtime
    tcpdump
    -tcsh
    -teamd
    time
    unzip
    usbutils
    vim-enhanced
    virt-what
    wget
    which
    -words
    xfsdump
    xz
    -yum-langpacks
    -yum-plugin-security
    yum-utils
    zip
    acpid
    redhat-lsb-core
    %end

    %pre
    #!/bin/bash
    # Check physical and virtio disks
    for disk in /sys/block/sd* /sys/block/vd*
    do
        dsk=$(basename $disk)
        if [[ `cat $disk/ro` -eq 1 ]]; then
            echo "Skipping disk $dsk: READONLY"
            continue;
        fi
        if [[ `cat $disk/removable` -eq 1 ]]; then
            echo "Skipping disk $dsk: REMOVABLE"
            continue;
        fi
        # size is in 512-byte sectors: 20971520 sectors = 10G
        if [[ `cat $disk/size` -lt 20971520 ]]; then
            echo "Skipping disk $dsk: Smaller than 10G"
            continue;
        else
            echo "Using disk $dsk"
            chosen=$dsk;
            break;
        fi
    done

    incfile=/tmp/include.me
    > $incfile
    if [[ -n $chosen ]]; then
        echo "zerombr" >> $incfile
        echo "bootloader --location=mbr --driveorder=$chosen --append=\"nomodeset console=tty0 console=ttyS0,115200n8\"" >> $incfile
        echo "ignoredisk --only-use=$chosen" >> $incfile
        echo "clearpart --all --initlabel --drives=$chosen" >> $incfile
        echo "part /boot --fstype=ext3 --asprimary --size=256" >> $incfile
        echo "part pv.$chosen --grow --size=15000" >> $incfile
        echo "volgroup vg00 --pesize=32768 pv.$chosen" >> $incfile
        echo "logvol / --fstype=xfs --name=root --vgname=vg00 --size=4096" >> $incfile
        echo "logvol swap --name=swap --vgname=vg00 --size=256" >> $incfile
    else
        echo "" > $incfile
    fi
    %end

    # PostInstall stuff
    %post --log=/root/anaconda-postinstall.log
    #!/bin/sh
    cd /
    echo "GRUB_TERMINAL=\"serial console\"" >> /etc/default/grub
    echo "GRUB_TERMINAL_OUTPUT=\"serial console\"" >> /etc/default/grub
    echo "GRUB_SERIAL_COMMAND=\"serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1\"" >> /etc/default/grub

    echo "#!/bin/sh" > /usr/local/sbin/update-grub
    echo "grub2-mkconfig -o /boot/grub2/grub.cfg" >> /usr/local/sbin/update-grub
    chmod +x /usr/local/sbin/update-grub
    /usr/local/sbin/update-grub

    echo "[base-c7-preview]" > /etc/yum.repos.d/c7-preview.repo
    echo "name=CentOS-7-Preview" >> /etc/yum.repos.d/c7-preview.repo
    echo "baseurl=http://buildlogs.centos.org/centos/7/os/x86_64-20140614/" >> /etc/yum.repos.d/c7-preview.repo
    echo "enabled=1" >> /etc/yum.repos.d/c7-preview.repo
    echo "gpgcheck=0" >> /etc/yum.repos.d/c7-preview.repo
    %end
This kickstart configuration will try to install a minimal CentOS 7 (preview) system on the first available disk that is writable, non-removable and at least 10GB in size. It will use LVM and create a 4GB root filesystem, leaving the remaining disk space free for later use.
It will also configure grub to react to input on both the serial and the VGA console, and configure the kernel and gettys to work on both consoles as well. It will configure the centos7-preview repository, so installing extra software should be easy. The current install is optimized for virtual machines, and doesn’t install anything related to sound, wifi, NetworkManager etc.
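The disk-selection rule from the %pre script can be tried outside the installer. The sketch below restates the three checks as a function that takes the values as arguments instead of reading them from /sys/block; the function name `disk_usable` and the sample sizes are made up for illustration. The threshold works because sysfs reports disk sizes in 512-byte sectors.

```shell
# Minimum size in 512-byte sectors: 10 GiB = 10 * 1024 * 1024 * 1024 / 512
MIN_SECTORS=$((10 * 1024 * 1024 * 1024 / 512))

# disk_usable RO REMOVABLE SECTORS
# Prints the decision; returns 0 only for a writable, fixed disk >= 10 GiB.
disk_usable() {
    if [ "$1" -eq 1 ]; then echo "skip: read-only"; return 1; fi
    if [ "$2" -eq 1 ]; then echo "skip: removable"; return 1; fi
    if [ "$3" -lt "$MIN_SECTORS" ]; then echo "skip: smaller than 10G"; return 1; fi
    echo "use"
}

echo "$MIN_SECTORS"
disk_usable 0 0 41943040 || true   # 20 GiB fixed disk
disk_usable 0 1 41943040 || true   # removable
disk_usable 0 0 1048576  || true   # 512 MiB, too small
```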
The entire install will end up being about 840MB, and includes all requirements for running ansible playbooks, so ansible can be used to further configure the system after initial installation.
Most notable changes
The major changes that you will run into, if you are used to CentOS/RHEL 5 and 6 are:
- systemd is now used as the init system, with all the changes that come with it
- BTRFS and XFS are supported, and the system can use either as a root filesystem (XFS was available in CentOS/RHEL 6, but not usable as root or boot filesystem)
- On installs with X and a desktop, you will now get a Gnome 3 Classic based desktop. Luckily a tweak-tool is available and installed by default, which will allow you to tune the desktop somewhat.
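For the systemd change in the list above, most day-to-day service commands have a direct `systemctl` counterpart. The sketch below shows that mapping for a few common cases; the function name `to_systemctl` is made up for this example, and it only prints the equivalent command without running anything.

```shell
# to_systemctl TOOL NAME ACTION -> prints the systemctl equivalent
to_systemctl() {
    case "$1:$3" in
        chkconfig:on)  echo "systemctl enable $2.service" ;;
        chkconfig:off) echo "systemctl disable $2.service" ;;
        service:*)     echo "systemctl $3 $2.service" ;;
        *) echo "no mapping for: $*" >&2; return 1 ;;
    esac
}

to_systemctl service sshd restart
to_systemctl chkconfig sshd on
```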
With the bug of the year (Heartbleed) patched on all my systems, I decided to look into alternative SSL implementations. I mostly use Lighttpd and Nginx, and try to stay away from Apache, unless I really need something that will only work with it.
Apache seems to have 2 possible SSL implementations available. The first one, mod_ssl, which is based on OpenSSL, is used by ~99% of the users. mod_nss is the second SSL implementation for Apache, based on the netscape/mozilla NSS library.
A quick google-search for alternative SSL implementations or modules for Nginx and Lighttpd returned no actual working code, only some requests for functionality, so it seems that this is (at this time) a dead end. Looking further on google and wikipedia I looked at the alternative SSL implementations, specifically axTLS, PolarSSL and MatrixSSL.
AxTLS was the first library I looked at. The source distribution includes a minimalistic webserver with SSL/TLS support and a tool called axtlswrap, a simple stunnel-like wrapper for the AxTLS library. I quickly configured the minimalistic webserver (axhttpd) and gave it a temporary StartSSL certificate. The configuration was quite simple after configuring the library to actually use my provided certificate and key instead of the built-in certificate. I then decided to run the SSLLabs tests against this webserver, to see how it would compare against my regular Lighttpd/OpenSSL secured websites.
The library provided the minimal required features, but at this time doesn’t support TLS1.2, and had only limited support for the various popular ciphers. This would result in a usable webserver, but with limited options in ciphers and limited support for strong encryption. Since I didn’t feel like dropping back to a ‘B’ score in SSLLabs tests, AxTLS was not a sufficient solution.
PolarSSL is another open-source SSL library, which is available under the GNU GPLv2 or a commercial license. It’s currently being used in various well-known products (PowerDNS and OpenVPN-NL, a Dutch-government approved version of OpenVPN). PolarSSL seemed to support all the latest ciphers and TLS standards and there were at least 2 httpd servers that supported using PolarSSL.
Hiawatha was the first webserver software I tried with PolarSSL. The code for PolarSSL is actually included in the downloads from Hiawatha, so you don’t need to download it separately. The Hiawatha source tree includes a script to download a newer version of PolarSSL when needed.
Hiawatha was also quite easy to configure, and I had it running with my certificate in a matter of minutes. Another test-run of the Qualys SSLLabs tests gave a very positive result. The only downside was a choice for a 1024 bit Diffie–Hellman key exchange. I struggled a bit to get this to 2048 or 4096 bits, but after a few failed attempts found a Hiawatha configuration option to set this (DHsize = 4096).
With this setting I got my beloved A score in Qualys SSLLabs. Support for TLS versions 1.0, 1.1 and 1.2, with support for SSLv2 and v3 completely disabled.
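Protocol support can also be spot-checked locally with `openssl s_client` before running a full SSLLabs scan. The sketch below only prints one command per protocol flag; `www.example.com` is a placeholder host, the function name `tls_probe_cmds` is made up, and which of these flags are accepted depends on how the local OpenSSL was built.

```shell
# Print one s_client command per protocol flag; nothing is executed here.
tls_probe_cmds() {
    host=$1
    port=${2:-443}
    for flag in ssl3 tls1 tls1_1 tls1_2; do
        echo "openssl s_client -connect $host:$port -$flag < /dev/null"
    done
}

tls_probe_cmds www.example.com
```

A handshake failure for a given flag suggests that protocol version is disabled on the server.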
I didn’t do much further testing with Hiawatha, as I only configured it to do SSL/TLS and then reverse-proxy everything to a lighttpd server running plain http.
The Monkey webserver is another http daemon with support for PolarSSL. It’s focussed on and optimized for running only on Linux systems, and aims to have good performance while also having all the standard and required features (IPv6, virtual hosting, FastCGI).
To enable the SSL module with Monkey, you have to run ‘configure’ with the ‘--enable-plugins=polarssl’ option. This doesn’t seem to be documented in either configure or the INSTALL or README files. It’s also funny to see that compiling jemalloc.c takes about as much time as all the other files combined.
My first build of Monkey was against the system’s PolarSSL, which resulted in an A- score in the SSLLabs tests. I spotted that the Debian polarssl package was still stuck at version 1.2.9, so I downloaded the latest PolarSSL and built that version (1.3.6) (make lib SHARED=1; make install).
When running with PolarSSL 1.3.6, the Monkey webserver also got an ‘A’ score in SSLLabs. The only ‘red’ is the session resumption, which for some reason isn’t working. This is still something I need to look into. Also, it appears that Monkey can only run over a single transport, so either HTTP or HTTPS. This would mean running two separate Monkey instances for http+https support. This might be something that will be fixed in a future Monkey version.
For as long as my testing system is still running, you can check the live SSLLabs score on it. Please ignore the ‘axtls’ in the hostname, it’s currently running Monkey, but the certificate I created for testing was first used for testing axtls.
Feature-wise Monkey seems like a nice alternative to Apache, Nginx and Lighttpd, and looking at the http performance, it also gives some very nice results. My testing with HTTPS however quickly triggered various bugs and crashes, so it seems that the PolarSSL support for Monkey is still somewhat buggy. I could successfully run my benchmarks against the server in HTTP mode, at up to 18000 requests per second (on a dual core VM), but running with SSL enabled would crash the Monkey process when running with more than a handful of threads. The system would be stable for hours with only 2-6 threads, but when running with 10+ threads it would crash within seconds. I hope these bugs will be fixed soon, so Monkey with PolarSSL will prove itself to be a worthy competitor in the SSL server space.
It seems my libpolarssl was compiled with some incorrect settings. I’ve reconfigured and recompiled PolarSSL and both Monkey and Hiawatha. The servers are now stable as expected; performance data will be updated as soon as my new tests have completed.
Basically: make sure you enable POLARSSL_THREADING_PTHREAD and disable POLARSSL_DEBUG_C
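These two options are compile-time defines in PolarSSL's config header. The sketch below toggles them with sed on a scratch file; the helper names are made up, and it assumes (not confirmed by this post) that the header lives at include/polarssl/config.h in the source tree and that disabled options are written as `//#define NAME`. The threading layer itself is, as far as I can tell, behind a separate define in the 1.3 series, so check that one too.

```shell
# Toggle "#define NAME" lines in a PolarSSL-style config.h.
# Assumption: disabled options are commented out as "//#define NAME".
enable_define() {
    sed -E "s|^[[:space:]]*//[[:space:]]*(#define[[:space:]]+$2)[[:space:]]*\$|\1|" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
disable_define() {
    sed -E "s|^[[:space:]]*(#define[[:space:]]+$2)[[:space:]]*\$|//\1|" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Demo on a scratch file standing in for the real config header.
cfg=$(mktemp)
printf '%s\n' '#define POLARSSL_DEBUG_C' '//#define POLARSSL_THREADING_PTHREAD' > "$cfg"
enable_define "$cfg" POLARSSL_THREADING_PTHREAD
disable_define "$cfg" POLARSSL_DEBUG_C
cat "$cfg"
```

After editing the real header, the library and both webservers need to be rebuilt against it.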
/etc/sysconfig/network-scripts/ifcfg-em1 and em2
/etc/sysconfig/network-scripts/ifcfg-em1.vlanid (example ifcfg-em1.20)
/etc/sysconfig/network-scripts/ifcfg-bond<vlanid> (example ifcfg-bond20)
    DEVICE=bond<vlanid>
    ONBOOT=yes
    BONDING_OPTS="mode=active-backup primary=em1 miimon=100"
    BOOTPROTO=none
    BRIDGE=br<vlanid>
    MACADDR=<RANDOM_MAC>
/etc/sysconfig/network-scripts/ifcfg-br<vlanid> (example ifcfg-br20)
So, creating 3 vlans (20, 200, 250) on a 2 interface bond (em1, em2) creates the following set of configfiles:
- ifcfg-em1: Base configuration for em1
- ifcfg-em2: Base configuration for em2
- ifcfg-em1.20: Vlan 20 interface on em1 interface
- ifcfg-em1.200: Vlan 200 interface on em1 interface
- ifcfg-em1.250: Vlan 250 interface on em1 interface
- ifcfg-em2.20: Vlan 20 interface on em2 interface
- ifcfg-em2.200: Vlan 200 interface on em2 interface
- ifcfg-em2.250: Vlan 250 interface on em2 interface
- ifcfg-bond20: Bonding em1.20 with em2.20
- ifcfg-bond200: Bonding em1.200 with em2.200
- ifcfg-bond250: Bonding em1.250 with em2.250
- ifcfg-br20: Create a bridge on bond20
- ifcfg-br200: Create a bridge on bond200
- ifcfg-br250: Create a bridge on bond250
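With more VLANs or interfaces the number of files grows quickly, so it helps to generate the names instead of typing them. The sketch below prints the file list for the example above and fills in the bond template for one VLAN id. The function names are made up, it only prints to stdout, and the contents of the other ifcfg files are not shown in this post, so they are left out here as well.

```shell
# Print the ifcfg file names for a bond of $NICS carrying $VLANS.
NICS="em1 em2"
VLANS="20 200 250"

list_ifcfg_files() {
    for nic in $NICS; do echo "ifcfg-$nic"; done
    for nic in $NICS; do
        for vlan in $VLANS; do echo "ifcfg-$nic.$vlan"; done
    done
    for vlan in $VLANS; do echo "ifcfg-bond$vlan"; done
    for vlan in $VLANS; do echo "ifcfg-br$vlan"; done
}

# Fill in the bond template for one VLAN id; the MAC stays a placeholder.
bond_config() {
    cat <<EOF
DEVICE=bond$1
ONBOOT=yes
BONDING_OPTS="mode=active-backup primary=em1 miimon=100"
BOOTPROTO=none
BRIDGE=br$1
MACADDR=<RANDOM_MAC>
EOF
}

list_ifcfg_files
bond_config 20
```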
Monitoring remote hosts with Nagios can be done with various methods, ranging from SNMP, SSH and NRPE to a custom solution. To monitor some ‘black-box’ appliances with a very minimal OS environment, installing or running the NRPE agent wasn’t possible. Since I seem to be using more and more passive Nagios checks with the Nagios service check acceptor (NSCA), it seemed like a good idea to try and use that.
I copied most of the checks to the system and set up the NSCA configuration (/etc/send_nsca.cfg). Then I created a simple bash script, scheduled to run from cron, which loops through a list of service checks to execute.
The check-results are then fed into send_nsca to finally arrive at the monitoring system. This way you only need to allow incoming traffic on 1 port to the nagios monitoring host and have no connections going into the device being monitored.
Update: The code has been updated and moved to my github account. You can find it at: https://github.com/sigio/sysadmin in the files monitor.sh and monitor.rc
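To illustrate the idea (this is not the monitor.sh from the repository), here is a minimal sketch of turning one check into a passive result. send_nsca reads tab-separated lines on stdin: host name, service description, return code and plugin output. The function name, host names and the stand-in check below are all made up for this example.

```shell
# nsca_line HOST SERVICE COMMAND...
# Runs a check command and prints one tab-separated result line:
# host, service, return code, plugin output.
nsca_line() {
    host=$1; service=$2; shift 2
    if output=$("$@" 2>&1); then rc=0; else rc=$?; fi
    printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$rc" "$output"
}

# Stand-in for a real plugin, so the sketch runs anywhere.
fake_check() { echo "OK - all good"; return 0; }

nsca_line appliance01 demo fake_check
# Real use would pipe the lines into something like:
#   send_nsca -H nagios.example.com -c /etc/send_nsca.cfg
```

Plugins that print more than one line of output would need their output trimmed to the first line before being sent this way.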
So, this weekend was quite an interesting one, as on July 1st 02:00 local time (00:00 UTC) a leap-second was added via NTP. This caused serious problems for all my Java Virtual Machines and mysql databases.
If your system has printed the following line (in dmesg), a leap-second has been added recently:
Clock: inserting leap second 23:59:60 UTC
On most of my systems, the JVMs would spike to 100% CPU load on all cores, and MySQL seems to do the same.
The work-around/fix at this time is to run:
date $(date +"%m%d%H%M%C%y.%S")
Hopefully this will be fixed in the kernel before the next leap-second is added, which could be as soon as 2013/01/01, though probably later.
It was indeed fixed before the next leap-second, which occurred somewhere not too long after. I haven't encountered this problem since.
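The dmesg check described above is easy to script. The sketch below reads kernel log text on stdin and reports whether the leap-second message is present; the function name is made up, and the demo feeds it a sample line so it runs without access to dmesg.

```shell
# leap_second_logged: reads kernel log text on stdin and succeeds if the
# leap-second insertion message is present.
leap_second_logged() {
    grep -q 'inserting leap second'
}

# Sample input; on a real system use: dmesg | leap_second_logged
if echo 'Clock: inserting leap second 23:59:60 UTC' | leap_second_logged; then
    echo 'leap second was inserted; re-setting the clock is the work-around:'
    echo '  date $(date +"%m%d%H%M%C%y.%S")'
fi
```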
Today I was greeted with a failed drive in an mdadm RAID array. The drive had some transient errors and was kicked out of the array, but testing showed that the drive still seemed to work just fine.
The following procedure will remove the drive from the array, remove it from the system, re-probe for disks, and then re-add the drive back into the array(s).
- Mark the failed drive as faulty and remove it from the array, in this case, it was /dev/sdb:
  - mdadm --manage --set-faulty /dev/md0 /dev/sdb1
  - mdadm --manage /dev/md0 --remove /dev/sdb1
- Make sure nothing on this disk is being used (mounts, other arrays, etc)
- Reseat the drive, either physically, or using the following commands (host1 is the SCSI host in this example):
  - echo 1 > /sys/block/sdb/device/delete
  - echo "- - -" > /sys/class/scsi_host/host1/scan
- Check if the drive is found again, and check if it works correctly
  - check dmesg output, or look at /proc/partitions
  - try reading the whole disk: pv < /dev/sdb > /dev/null
- Re-add the drive to the array(s)
  - mdadm /dev/md0 -a /dev/sdb1
  - cat /proc/mdstat
That should do the trick…
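To spot this situation without reading /proc/mdstat by eye, the status field can be parsed. The sketch below prints the name of every array whose member status (for example `[U_]`) shows a missing member. The function name is made up, and the sample text stands in for /proc/mdstat so the snippet runs anywhere.

```shell
# degraded_arrays: reads /proc/mdstat-style text on stdin and prints the
# name of every array whose status field shows a missing member ("_").
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    '
}

# Sample text; on a real system use: degraded_arrays < /proc/mdstat
degraded_arrays <<'EOF'
md0 : active raid1 sda1[0] sdb1[1](F)
      104320 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
      8385856 blocks [2/2] [UU]
EOF
```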
On CentOS and RHEL Linux (with kernels >= 2.6.32) you can modify resource limits (ulimit) at run time. This can be done using the /proc/<pid>/limits functionality. On older kernels this file is read-only and can only be used to inspect the limits that are in effect for the process. On newer systems you can modify the limits with echo:
    cat /proc/pid/limits
    echo -n "Max open files=soft:hard" > /proc/pid/limits
    cat /proc/pid/limits
On older systems you will have to modify limits before starting a process.
(See also the post on serverfault)
If you are not running CentOS/RHEL, you can use the ‘prlimit’ command, which does the same thing, but doesn’t rely on a patch that’s no longer present in current kernels.
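A small sketch of the prlimit route: one helper reads the current open-files limits from a limits file, the other prints (but does not run) the matching prlimit call. The function names and the PID and limit values are made up for illustration, and the demo parses sample text so it runs on any system.

```shell
# nofile_limits FILE: print "soft hard" for "Max open files" from a
# /proc/<pid>/limits-style file ("-" reads stdin).
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# prlimit_cmd PID SOFT HARD: print (not run) the matching prlimit call.
prlimit_cmd() {
    echo "prlimit --pid $1 --nofile=$2:$3"
}

# Demo on sample text; on Linux, try: nofile_limits /proc/$$/limits
printf '%s\n' \
    'Limit                     Soft Limit           Hard Limit           Units' \
    'Max open files            1024                 4096                 files' |
    nofile_limits -
prlimit_cmd 1234 4096 8192
```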
Linux doesn’t automatically re-size multipath devices, so this procedure must be used to have online re-sizing of multipath. (Offline re-size is automatic, just remove the mapping and reload)
- This example will use multipath device /dev/mpath/testdisk, with scsi disks /dev/sdx and /dev/sdy
- Resize the lun on the underlying storage layer (iscsi / san)
- Check which sd? devices are relevant, and re-scan these:
  - multipath -ll testdisk
  - blockdev --rereadpt /dev/sdx
  - blockdev --rereadpt /dev/sdy
  - blockdev --getsz /dev/sdx
  - blockdev --getsz /dev/sdy
- Take note of the new size returned by getsz.
- Dump the dmsetup table to a file (and a backup)
  - dmsetup table testdisk | tee mapping.bak mapping.cur
- Edit the table stored in ‘mapping.cur’
  - vi mapping.cur, replace field 2 (size) with the new size from getsz
- Suspend I/O, reread the table, and resume I/O
  - dmsetup suspend testdisk; dmsetup reload testdisk mapping.cur; dmsetup resume testdisk
- The multipath device should now be resized:
  - multipath -ll
You can now resize the filesystem on the multipath device, or the LVM-PV if you use LVM on the LUN.
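The manual edit of field 2 in the procedure above can also be done with awk, which avoids typos in a table that is about to be loaded into the kernel. This is a minimal sketch; the function name is made up and the sample table line contains invented values, not output from a real system.

```shell
# set_table_size NEW_SIZE: reads a dmsetup table on stdin and prints it with
# field 2 (the length in 512-byte sectors) replaced by NEW_SIZE.
set_table_size() {
    awk -v size="$1" '{ $2 = size; print }'
}

# Sample single-line multipath table (made-up values).
echo '0 20971520 multipath 0 0 1 1 round-robin 0 2 1 8:16 1000 8:32 1000' |
    set_table_size 41943040
```

In the procedure above this could replace the vi step, for example: dmsetup table testdisk | tee mapping.bak | set_table_size "$(blockdev --getsz /dev/sdx)" > mapping.cur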