Monitoring – it’s all about integration and automation – OSMC 2017 Hackathon

OSMC 2017

As in previous years, we organized a hackathon as a follow-up and got about 50 people working on actual code. We again started with a short round of introductions, so everyone had the chance to find people with the same interests or with the knowledge they needed. Afterwards people started hacking on Icinga 2, Icinga Web 2, various modules, OpenNMS, Zabbix, mgmt, NSClient++, Docker containers, Ansible and Puppet code, or simply helped others with configuration and other tasks in their environments.

Here is a list of some of the things developed, or at least designed, today:
* Tom accepted and improved some of my pull requests, so the Director got more property modifiers
* He also worked on improving notifications to allow managing them via a custom attribute on hosts and services
* Markus improved the Icinga packaging, resulting in new package releases for SLES and support for Fedora 27
* Bodo worked on moving the Ruby library for Icinga 2 to its 1.0.0 release and got valuable input from Gunnar on displaying API coverage
* Thomas improved his diagnostics script for Icinga 2 to help with troubleshooting
* Nicola worked on a graphical geolocation picker in the Director for his awesome Map module and collected several other ideas and requests along the way
* David started a Single Sign-On module for Icinga Web 2
* mgmt got some improvements from Julien, Toshaan and James
* Michael worked on Elastic integration and a web-based installer for NSClient++
* Gunnar and Michael discussed so many features that they did not actually find time for hacking, but keep your eyes open for Elastic 6 support and data types for arguments
* Steffen, Blerim and Michael discussed how to fix a problem with running two Icingabeat instances, which can now probably be solved
* Stephan finally solved the management issue of red alerts in Icinga Web 2 😉

Furthermore, an impressive amount of knowledge was transferred, user questions were answered and problems were solved. One thing I am really happy about was seeing a user apply the URL encode property modifier, only minutes after Tom had accepted it, to create host groups including membership assignment from PuppetDB. But I want to end this blog post with one really cool thing Dave from the Australian Icinga partner Sol1 showed us: a map of all the pubs in Australia. It shows the pubs because Sol1 monitors the satellite receivers of Sky Racing Australia there, which makes any large outage visible at a glance.

Map of Australian Pubs by Sol1

So have a nice weekend and keep on hacking.

Author: Dirk Götz

Dirk is a Red Hat specialist and works at NETWAYS in consulting for Icinga, Nagios, Puppet and other systems management solutions. Previously he worked as a senior administrator at a statutory pension insurance provider, where he was also responsible for training apprentices.

Monitoring – it’s all about integration and automation – OSMC 2017 Day 2

OSMC 2017

The second day started with “Monitoring – dos and don’ts” presented by Markus Thiel. The room was already full for the first talk, which was not expected after people had moved from the evening event to the late lounge and then, at 5 o’clock in the morning, to the hotel. The event was great, with good food, drinks and chat, but Julia already wrote about that, so I will focus on the talks. Markus’ talk nicely showed “don’ts” I also recognize from my daily work as a consultant and gave tips on how to avoid them. He went deep into the details, so I cannot repeat everything, but to summarize: the biggest problem is always communication between people or systems, as you probably already know from your daily business.

The second talk I attended was Bodo Schulz on automated and distributed monitoring of a continuous integration platform. He created his own service discovery named Brain, which discovers services and puts them into Redis, from where they are read to create the configuration for Icinga 2 and Grafana. Pinky is his simple, container-based visualization stack. Both are integrated into the platform: one Brain for every pipeline, one Pinky for every team. If you did not get the reference, watch the intro on YouTube. His workarounds for features he missed were also quite interesting, like implementing his own certificate signing service for Icinga 2 or displaying license data in Grafana. And of course he had a live demo showing all this fancy stuff, which was great to see.

Tom gave the third talk of the day, about automated monitoring in heterogeneous environments, showing real-life scenarios using the Director‘s capabilities. He started with the basics, explaining how import, synchronization and jobs work, followed by importing from an old Icinga environment via SQL and the IDO database. In the typical scenario of importing from a CMDB, Tom showed common problems like poor quality of the input data and how to work around them with the Director to get good-quality output. Another scenario explained how to get data from Active Directory for the Windows part of your environment. For VMware users he showed the already released vSphere module and the prototype of the vSphereDB module, which adds more visualization, and for AWS users the corresponding module. The last one showed how to import Excel files using the Fileshipper. And of course he explained how easy it is to create your own import source.

Right after the excellent lunch and the even better massage at the event, Marianne Spiller‘s talk “Ich sehe was, was du nicht siehst (… und das ist CRITICAL!)” (in English “I spy with my little eye something CRITICAL!”) focused on how to get a monitoring environment with high user acceptance up and running. Being realistic and showing everyone what they gain from it were the best tips she gave, but she could not provide a one-size-fits-all solution either. For more of her tips, ranging from technical to organizational, I can recommend her blog.

Lennart and Janina Tritschler talked about distributed Icinga 2 environments automated by Puppet. I was really happy to see this talk, because Janina adopted Icinga 2 after a fundamentals training I gave about a year ago. They started with a basic introduction to distributed monitoring with Icinga 2 as master, satellite and agent, and to configuration management with Puppet including exported resources. Afterwards they dove deeper into the Puppet module for Icinga 2 and how to use it to install and configure the environment. Their demos included several virtual machines to show how easily this can be done.

In the last break, the winner of the gambling at the evening event received his prize, a retro game console.

Last but not least, I chose Kevin Honka‘s talk “Icinga 2 + Director, flexible Thresholds with Ansible” over Thomas’ talk about troubleshooting Icinga 2. But I am sure his talk was great, as troubleshooting is his daily business as our Lead Support Engineer. Kevin was unhappy with the static thresholds configured in their monitoring environment, so he started developing a Python script for his Ansible workflow which modifies thresholds using the Director API. On his roadmap are turning it into an Icinga 2 Python library usable by others, using this library in a real Ansible module, and extending the functionality.

Thanks to all speakers, attendees and sponsors leaving today for a great conference. Safe travels, and see you next year on November 5th – 8th at the next OSMC. And of course a nice dinner and happy hacking to everyone staying for tomorrow’s hackathon; I will keep our readers informed about the crazy things we manage to build.

Author: Dirk Götz

Monitoring – it’s all about integration and automation – OSMC 2017 Day 1

OSMC 2017

For the 12th OSMC we again started on Tuesday with a couple of workshops on Icinga, Ansible, graphing and Elastic, which were as popular as always, followed by a meet and greet at the evening dinner. But the real start was, as always, a warm welcome from Bernd introducing this year’s small changes, like having so many great talks that we ran three in parallel on the first day. This was also the first time we had more English talks than German ones; we are getting more international from year to year, which is also the reason I am blogging in English.

The first talk of the day I attended was James Shubin’s “Next Generation Config Mgmt: Monitoring”, as he is a great entertainer and mgmt is really a great tool. mgmt is primarily a configuration management solution, but in his demos James managed to build a bridge to monitoring, as mgmt is event-driven and very fast. For example, he showed mgmt recreating deleted files faster than a user could notice they were gone. Another demo of mgmt’s reactivity visualized the noise in the room; perhaps not the most practical one, but it showed what you can do with flexible inputs and outputs. In his hysteresis demo, mgmt monitored the system load and scaled the number of virtual machines up and down depending on it. As always, James is looking for people to join the project and help hacking, so have a look at mgmt (or the recording of one of his talks) and perhaps join what could really be the next generation of configuration management.

Second was Alba Ferri Fitó, who talked about how the community helped her with monitoring at Vodafone in her talk “With a little help from…the community”. She showed several use cases, e.g. VMware monitoring, which she changed from passively collecting SNMP traps to proactively monitoring the infrastructure with check_vmware_esx. She also helped integrate monitoring into the provisioning process with vRealize using the Icinga 2 API, created a corporate theme to get better acceptance, implemented log monitoring using the sticky option of check_logfiles, wrote her own scripts to monitor things she had been told could only be monitored by SCOM, and used expect for things that only had an interactive “API”. It was a great talk, sharing knowledge and crediting the community for all the code and help.

Carsten Köbke and our Michael presented “Ops and dev stories: Integrate everything into your monitoring stack”. Carsten, as the developer of the Icinga Web 2 module for Grafana, started the talk with his motivation for developing this module and the experience he gained doing so. Afterwards Michael showed more integrations, like the Map module placing hosts on an OpenStreetMap map, dashboards, ticket systems, log and event management solutions like Graylog and Elastic including the Icingabeat, and a very early prototype (created the day before) of a module for Graylog.

After lunch, which was great as always, I attended “Icinga 2 Multi Zone HA Setup using Ansible” by Toshaan Bharvani. He is a self-employed consultant with a long history in monitoring, starting with Nagios, using Icinga and Shinken for a while and now using Icinga 2 to monitor his customers’ environments. His Ansible playbooks and roles were a good practical example of how to get such a distributed setup up and running, and he explained it in a way that even people not using Ansible at all could understand.

Afterwards Tobias Kempf, the monitoring admin, and Michael Kraus, the consultant supporting him, talked about highly automated monitoring for Europe’s biggest logistics company. They used OMD to build a multi-level distributed monitoring environment with centralized configuration managed via a custom web interface, coshsh as configuration generator and Git, load distribution with mod_gearman, and patch management with Ansible.

As every year, the last talk was Bernd (representing the Icinga team) presenting the “Current State of Icinga”. Bernd briefly introduced the project and team members before showing some case studies, like Icinga being deployed on the International Space Station. He also promoted the Icinga Camps and our effort to help people run more Icinga Meetups. Afterwards he dove into technical topics like the new incarnation of Icinga Exchange including full GitHub sync, the documentation, and the package repository with its download numbers: a crazy 50,000 downloads just for CentOS on a single day. Diving even deeper into Icinga itself, he showed the new CA proxy feature, which allows multi-level certificate signing and automatic renewal and was sponsored by Volkswagen, like some other features. An explanation of the project’s work on configuration management and of which API to use in an Icinga 2 environment for different use cases followed, before he moved on to logging. For logging, the Icinga project now provides Logstash and Elasticsearch output in Icinga 2, the Icingabeat, the Logstash output which can create monitoring objects in Icinga 2 on the fly, and last but not least the Elasticsearch module for Icinga Web 2. In his demos he also showed the new, improved Icinga Web 2, which adds even more eye candy. Speaking of eye candy, the latest version of the Graphite module, which will be released soon, looks quite nice too. Another pending release is the Icinga Graphite installer, which uses Ansible and packaging to provide an easy way to set up Graphite. So keep an eye on the release blog posts in the coming weeks.

It is nice to see topics shift over the years. While automation and integration were already quite present in past years, they were the main focus of many talks this year. This fits nicely with my opinion that, as a software developer, you should care about APIs to allow easy integration, and as an administrator you should provide a single interface, which I sometimes call a “single point of administration”.

Colleagues have collected some pictures for you; if you want to see more, follow us or #osmc on Twitter. So enjoy these while I enjoy the evening event, and I will be back tomorrow to keep you updated on the talks of the second day.

Author: Dirk Götz

Replace spaces with tabs in Visual Studio 2017

Visual Studio has several source code editing settings. By default it indents with 4 spaces and no tabs, which differs from what we use in Icinga 2, where our code style relies on tabs.

Editing the Icinga 2 source code on Windows with Visual Studio therefore requires adjusting the editor settings. Navigate to Tools > Options > Text Editor and set the tab settings for both C# and C++ to “Keep tabs”.

I accidentally forgot to apply these settings for C# too, and ended up with half of the Icinga 2 setup wizard code indented with 4 spaces instead of tabs. Luckily I found a blog post whose comments shed some light on how to fix this.

Press Ctrl+H to open the Replace window. Click the icon to enable regular expressions and search for “((\t)*)([ ]{4})”. Enter “\t” as the replacement text.
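To see what this search and replace does, here is a small C++ sketch that applies the same pattern with std::regex. It uses ECMAScript regex syntax rather than the .NET engine behind Visual Studio, so treat it as an approximation; the function names are hypothetical. It shows runs of four spaces turning into tabs, but also that on a line which already starts with a tab, the captured tabs are swallowed because the replacement is just “\t”. A replacement of “$1\t” keeps them, and in Visual Studio’s .NET-style replace box “$1” should likewise refer to the first captured group. Note that the pattern is not anchored to the start of a line, so four spaces in the middle of a line are converted as well.

```cpp
#include <regex>
#include <string>

// The pattern from the post: optional tabs (group 1) followed by exactly
// four spaces. It is not anchored, so it also matches inside a line.
const std::regex kFourSpaces("((\\t)*)([ ]{4})");

// Replacement as posted: each match (tabs + four spaces) becomes one tab.
std::string ReplaceAsPosted(const std::string& text) {
    return std::regex_replace(text, kFourSpaces, "\t");
}

// Variant that re-inserts the captured tabs ($1) in front of the new tab.
std::string ReplaceKeepingTabs(const std::string& text) {
    return std::regex_replace(text, kFourSpaces, "$1\t");
}
```

For a file indented purely with spaces both variants give the same result; they only differ on lines that mix leading tabs and spaces.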

Happy coding for Icinga 2 v2.8 – ready for OSMC 🙂

Author: Michael Friedrich

Michael has been an Icinga developer for many years and ventured into the NETWAYS adventure at the end of 2012. He moved from Vienna to Nuremberg with a fondness for importing Austrian delicacies; more than one colleague despairs over the addictive Dragee-Keksi. Or simply over the Austrian dialect, which he likes to intensify with Thomas in the office ("Jo eh."). When Michael isn't helping out on the Monitoring-Portal, he works on his next LEGO project or enjoys beautiful Nuremberg. Or: at an Icinga Camp near you 😉

Flapping in Icinga 2.8.0

The author viewing the code for the first time

Flapping detection is a feature many monitoring suites offer. It is mainly used to detect unfortunately chosen thresholds, but can also help detect network issues and similar problems. In most cases two thresholds are used, high and low. If the flapping value, which is the percentage of state changes over a given time span, rises above the high threshold, the object is considered flapping. It then stays flapping until the value drops below the low threshold.

Naturally Icinga 2 had such a feature, except that it used a different approach and didn’t work. For 2.8.0 we decided it was time to finally fix flapping, so I went to investigate. As mentioned, flapping in Icinga 2 worked differently from Icinga 1, Shinken, etc.: instead of two thresholds there was just one, instead of one flapping value there were two, and they changed based on the time since the last check. Broken down, it looks like this:

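positive;           // accumulated weight of state changes
negative;           // accumulated weight of stable results (no state change)
FLAPPING_INTERVAL;  // compile-time constant used to smooth the values
flapping_threshold; // the single configurable threshold, in percent

OnCheckResult() {
  // scale both values down proportionally once their sum exceeds FLAPPING_INTERVAL
  if (positive + negative > FLAPPING_INTERVAL) {
    pct = (positive + negative - FLAPPING_INTERVAL) / FLAPPING_INTERVAL;
    positive -= pct * positive;
    negative -= pct * negative;
  }

  // weight each result by the time elapsed since the previous check
  weight = now - timeOfLastCheck;
  if (stateChange)
    positive += weight;
  else
    negative += weight;
}

IsFlapping() {
  // weighted percentage of state changes, compared against the single threshold
  return 100 * positive / (negative + positive) > flapping_threshold;
}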

The idea was that with every check result one of the two flapping values grows by the time since the last check result: positive for state changes, negative for results that were not state changes. This is where the first problem arises: while the check interval is relatively stable in most cases, after a prolonged Icinga outage one of the values could be extremely inflated. Another problem is how slowly the result degrades: in my tests it took 17 consecutive stable results for a flapping value to calm down.

After some tweaking here and there, I decided it would be wisest to go with the old and proven approach Icinga 1 used: save the last 20 check results, count the state changes and divide them by 20. Taking inspiration from the way Shinken handles flapping, I added weights to the state changes, with the most recent one weighted 1.2 and the 20th (oldest) one 0.8. The concern about wasting memory on storing the state changes was resolved by using a single integer as a bit array. This way we actually use slightly less memory now \o/

The example above would then have a value of 39.1%, which counts as flapping with the default thresholds. More details on the usage and calculation of flapping in Icinga 2 can be found in the documentation once version 2.8.0 is released.
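To make the new approach more concrete, here is a minimal C++ sketch of a weighted 20-result window stored in a single integer, combined with the high/low threshold hysteresis described at the start of this post. It is not the Icinga 2 implementation: the class and method names are hypothetical, storing the newest result in bit 0 is an assumption, and interpolating the weights linearly between 1.2 and 0.8 is our reading of the description above. Dividing the weighted sum by 20 still works because these weights average exactly 1.0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of weighted flapping detection over the last 20
// check results; not the actual Icinga 2 source.
class FlappingDetector {
public:
    // high/low are the hysteresis thresholds in percent (example parameters).
    FlappingDetector(double high, double low) : high_(high), low_(low) {
        assert(low <= high && "low threshold must not exceed high threshold");
    }

    // Record one check result. Bit 0 of history_ holds the newest result,
    // bit 19 the oldest; older results are shifted out by the mask.
    void AddResult(bool stateChange) {
        history_ = ((history_ << 1) | (stateChange ? 1u : 0u)) & kMask;

        const double value = Value();
        if (!flapping_ && value > high_) {
            flapping_ = true;   // start flapping above the high threshold
        } else if (flapping_ && value < low_) {
            flapping_ = false;  // stop flapping only below the low threshold
        }
    }

    // Weighted percentage of state changes in the window. Weights fall
    // linearly from 1.2 (newest) to 0.8 (oldest) and average 1.0, so a
    // window full of state changes yields 100%.
    double Value() const {
        double weighted = 0.0;
        for (int age = 0; age < kWindow; ++age) {
            if (history_ & (std::uint32_t{1} << age)) {
                weighted += 1.2 - 0.4 * age / (kWindow - 1);
            }
        }
        return 100.0 * weighted / kWindow;
    }

    bool IsFlapping() const { return flapping_; }

private:
    static constexpr int kWindow = 20;
    static constexpr std::uint32_t kMask = (std::uint32_t{1} << kWindow) - 1;

    std::uint32_t history_ = 0;  // one bit per check result: 1 = state change
    double high_;
    double low_;
    bool flapping_ = false;
};
```

With example thresholds of 30% and 25%, six consecutive state changes give roughly 34.4% and start flapping. After ten stable results the value is about 28.1%, which is below the high threshold but still flapping because it has not dropped below the low one; five more stable results push the oldest change out of the window, bring the value to about 21.1%, and end the flapping state.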

Author: Jean-Marcel Flach

Born and raised in Bamberg, Jean (the "-Marcel" is silent) came to NETWAYS as an apprentice after a detour to university. He has been part of the Icinga 2 core development team there since 2014.

Trick 42 with the Director – Running Jobs in Order

After publishing our Trick 17 with the Director, I am following up right away with Trick 42. As described in Markus’ blog post, intersections of multiple import sources are a brilliant way to, for example, enrich hosts with information from several sources. If you configure a large number of such import sources to build the intersection, you may run into problems with the order in which they are processed.

To explain this in detail, here is our starting scenario:

  • CMDB 1: source for the host’s basic data (name, IP, FQDN, …)
  • CMDB 2: source for the OS type (CentOS, OpenSuSE, Debian, …)
  • CMDB 3: source for the contact person (Mr. Müller, Mr. Maier, …)

For the hosts from CMDB 1 to be created in enriched form (import + sync), CMDB 2 and CMDB 3 have to be processed first. That makes sense: if the Director does not know the host’s OS type and contact person, Trick 17 won’t be able to enrich the host from CMDB 1 with that data either.

This problem mainly shows up during the initial import + sync of the data. Depending on how often your import sources change, it can be “not bad at all” (Mr. Müller is responsible for the server for 3 years) or “very unfortunate” (you import the contact data of a constantly changing on-call rotation).

If the order of the import sources matters, there is a remarkably simple solution.

Create a job in the Director for each import and each sync…

…and note the ID of each Director job (the number at the end of the URL).

With this ID you can now run the jobs configured in the Director from the command line. The job with ID 1 (the import job for CMDB 1) can be started with the command “icingacli director jobs run 1”.

Finally, we put this into a small script:

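#!/bin/bash
# Run the Director import and sync jobs in a fixed order.
# set -e stops the script at the first job that exits with a non-zero status,
# so later jobs do not run on top of a failed one.
set -e

# set paths and vars
ICINGA_CLI=$(command -v icingacli)
JOB_CMD=("${ICINGA_CLI}" director jobs run)

# execute jobs (IDs as noted from the Director job URLs)
echo "import and sync..."

echo -e "\tcmdb1"
"${JOB_CMD[@]}" 1   # import job for CMDB 1
"${JOB_CMD[@]}" 2   # corresponding sync job

echo -e "\tcmdb2"
"${JOB_CMD[@]}" 3
"${JOB_CMD[@]}" 4

echo -e "\tcmdb3"
"${JOB_CMD[@]}" 5
"${JOB_CMD[@]}" 6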

And voilà, our imports and syncs now run in a defined order 🙂

Author: Tobias Redel

After completing his training as an IT specialist (Fachinformatiker) at Deutsche Telekom, Tobias worked at T-Systems. He has been with NETWAYS since August 2008, where he supports our customers as part of the consulting team in matters of open source, monitoring and systems management. Secretly, however, he leads a double life as a travel hacker, is working on his third million euros (nothing came of the first two) and is trying to seize world domination.