Flapping in Icinga 2.8.0

The author viewing the code for the first time

Flapping detection is a feature many monitoring suites offer. It is mainly used to detect unfortunately chosen thresholds, but can also help in detecting network issues or similar. In most cases two thresholds are used, high and low. If the flapping value, which is the percentage of state changes over a set time, gets higher than the high threshold, it is considered flapping. It will then stay flapping until the value drops below the low threshold.

Naturally Icinga 2 had such a feature, just that it implemented a different approach and didn’t work. For 2.8.0 we decided it was time to finally fix flapping, so I went to investigate. As I said the flapping was working differently from Icinga 1, Shinken, etc. Instead of two thresholds there was just one, instead of one flapping value there were two and they change based on the time since the last check. Broken down it looks like this:

positive; //value for state changes
negate; //value for stable changes
FLAPPING_INTERVAL; //Compile time constant to smoothen the values

OnCheckResult() {
  if (positive + negative > FLAPPING_INTERVAL) {
    pct = (positive + negative - FLAPPING_INTERVAL) / FLAPPING_INTERVAL;
    positive -= pct * positive;
    negative -= pct * negative;
  }

  weight = now - timeOfLastCheck;
  if (stateChange)
    positive += weight;
  else
    negative += weight;
}

IsFlapping() {
  return 100 * positive / (negative + positive);
}

The idea was to have the two flapping values (positive & negative) increase one or the other with every checkresult. Positive for state changes and negative for results which were not state changes, by the time since the last check result. The first problem which arises here, while in most cases the check interval is relatively stable, after a prolonged Icinga outage one of the values could be extremely inflated. Another problem is the degradation of the result, in my tests it took 17 consecutive stable results for a flapping value to calm down.

After some tweaking here and there, I decided it would be wisest to go with the old and proven style Icinga 1 was using. Save the last 20 checkresults, count the state changes and divide them by 20. I took inspiration in the way Shinken handles flapping and added weight to the sate changes, with the most recent one having a value of 1.2 and the 20th (oldest) one of 0.8. The issue of possibly wasting memory on saving the state changes could be resolved by using one integer as a bit array. This way we are actually using slightly less memory now \o/

The above example would then have a value of 39.1%, flapping in the case of default thresholds. More details on the usage and calculation of flapping in Icinga 2 can be found in the documentation once version 2.8.0 is released.

Jean-Marcel Flach

Autor: Jean-Marcel Flach

Geboren und aufgewachsen in Bamberg, kam Jean (das “-Marcel” ist still) nach einem Ausflug an die Uni, als Azubi zu NETWAYS. Dort sitzt er seit 2014 im Icinga 2 core Entwicklungsteam.

A naturally grown .vimrc

Vim is pretty great, honestly. But since I started using vim after growing tired of nano, a lot in my .vimrc changed… To the point where colleagues who use vim on their machine rather use nano on mine before trying to wrap their head around my workflow. Every .vimrc is different and a lot of them exist out there, over twelve thousand repositories of shared vim configurations on GitHub alone and many more on private laptops and computers.

Generally creating a .vimrc works like creating a Makefile: Copy and paste from different sources with small changes until you have an approximation of the desired result and then need to decide whether you want to go the extra mile of dealing with your configurations’ and Vims’ quirks to get it working properly or you just leave it be. This is followed by years of incremental tweaking until you have a Vim environment which works perfectly but you don’t know why. At least that’s how I do it ¯\_(ツ)_/¯

So I took a dive into my own to see what kind of madness lurks down there…

set nocompatible
set t_Co=16
set shiftwidth=4
set tabstop=4
set autoindent
set smartindent
filetype plugin on
filetype indent on

So far, so good. Nothing special. Other than that indentation is set and overwritten three times, this has not lead to any problems (yet). The next lines are a bit more interesting:

call pathogen#infect()
call pathogen#helptags()
syntax on
set background=dark " dark | light "
colorscheme solarized

 

pathogen is a runtime manipulator which allows you to add additional plugins to Vim easier, in my case these are vim-solarized, a popular colourscheme, and vim-fugitive, a plugin that adds git commands within Vim. #infect() loads these plugins and #helptags() generates documentation. The following lines make use of solarized and add syntax highlighting.

set nu
set listchars=eol:$,tab:>-,trail:.,extends:>,precedes:<,nbsp:_
set list
let &colorcolumn="121"
set splitbelow
set splitright
set viminfo='20,<1000

This controls numbering and control characters, colorcolumn adds a ugly line over the 121’st char to keep up with coding styles. split tells vim where to add new views when splitting (I rarely use these) and viminfo sets the copy buffer to up to 1000 lines.
Now we get to the really interesting bits, key remaps!

Artist rendition of the authors .vimrc

"See :help map-commands
"n        = Normal only
"v        = Visual only
"i        = Insert only
"         = Normal+Insert+Select only
" nore    = disable recusiveness
"     map = Recursive map

"split window switching
nnoremap <C-J> <C-W><C-J>
nnoremap <C-K> <C-W><C-K>
nnoremap <C-L> <C-W><C-L>
nnoremap <C-H> <C-W><C-H>

"Make Up and Down Arrows move half a screen
"instead of 1 line
noremap <Up> <C-U>
noremap <Down> <C-D>

"BUFFERS Used to be F6,F7. Now used by flake
set hidden
noremap :bprevious
noremap :bnext

"Because buffers change Flake to use F3 instead of F7
"autocmd FileType python map :call Flake8()
" DOES NOT WORK DAMN

"Smart Home key
noremap <expr> <Home> (col('.') == matchend(getline('.'), '^\s*')+1 ? '0' : '^')
imap <Home> <C-o><Home>

Switching windows can be hassle without the first four remaps, but again, I rarely use split windows. Mapping the up and down arrows to jump half a screen instead just one line I added to stop myself from using them instead of j and k, I have grown quite used to it, so it gets to stay ^_^
Buffer remaps are a must! Because I do use them a lot. There have been problems with Flake, a python lint tool, which I tried to avoid by remapping flakes key… Didn’t work out, so I got rid of Flake and call pylint when required instead.

The smart home key makes the <home>-key jump to the first non-whitespace char in a line instead of the begining of the line, quite handy!

"Access commandwindow with ctrl+f or :cedit

"For dorks
map q: :q
command Wsudo w !sudo tee % > /dev/null
"Jump to next line > 120 chars
command Warnl :/\%>120v./+
"For easy copy-paste
command Cpm :set nonu paste cc=  nolist
"Catch regular caps failure
command WQ :wq

These are just a few quality of life improvements, I tend to mistype :q as q: and :wq as :WQ. The :Wsudo command lets me edit read-only files and :Cpm is for easy mouse copy and paste. :Warnl jumps to the next line with more than 120 chars, this again is to check for style problems.

Alright, that’s it. My current .vimrc. There were a few commented out lines I omitted because I have no clue what I was thinking at the time anymore, but I hope there were a few little bits someone else might find useful to feed their own beast of a .vimrc with.

Image sources:
Vim logo from vim.sexy
Shoggoth by twitter.com/nottsuo

Jean-Marcel Flach

Autor: Jean-Marcel Flach

Geboren und aufgewachsen in Bamberg, kam Jean (das “-Marcel” ist still) nach einem Ausflug an die Uni, als Azubi zu NETWAYS. Dort sitzt er seit 2014 im Icinga 2 core Entwicklungsteam.

Redis vs MongoDB as message queue and config proxy

Benchmarks and comparisons for these two NoSQL databases exist a plenty. But in all the blogs and whitepapers I found the use-case was quite different from ours, we were trying to find a tool to queue Icinga 2 events and speed up config updates. The queue would use Redis Pub/Sub or MongoDBs capped collections + tailable cursors. From an implementation standpoint MongoDB looks better, it offers good libraries for most programming languages and the document-based approach allows for filtering based on individual attributes while values in Redis are just opaque blobs. Redis Pub/Sub model also has the problem that it loses its queued items when restarted.

From these naive benchmarks (PHP scripts run with time), it looks like MongoDB likes to take their time with creating objects.

Inserting 250.000 objects:

  • Redis: 11.1s
  • MongoDB: InsertOne: 53.3s
  • MongoDB: InsertMany: 25.4s

Inserting 250.000 objects into list capped to 50.000

  • Redis: 12s
  • MongoDB: InsertOne: 57.6s

Redis speed advantage comes from keeping created objects in memory, while MongoDB writes them to the hard drive right away. This is a huge advantage for Redis, as we rotate our events from the queue, writing them to disk would be a waste of io time. But sadly MongoDB does not offer the possibility of keeping the data volatile in memory.

Another concern of ours is security, Redis does not offer any security features and for many application this is not a problem, but in our case, this means we have to finds ways to make it secure. There have been many attempts to do so, e.g using tools like stunnel, but it’s still an additional topic that would need to be tackled. MongoDB on the other hand offers ACLs on document level and native SSL, which is great! Yet is has been rightfully criticized for its insecure default configurations, which resulted in a flood of hacked MongoDB instances earlier this year.

Conclusion: We will have to investigate further. Especially integration and security will need a closer look. MongoDBs performance seems lackluster, but may just be enough as our tests may have been much bigger than what we will be met with in a real setup.

Jean-Marcel Flach

Autor: Jean-Marcel Flach

Geboren und aufgewachsen in Bamberg, kam Jean (das “-Marcel” ist still) nach einem Ausflug an die Uni, als Azubi zu NETWAYS. Dort sitzt er seit 2014 im Icinga 2 core Entwicklungsteam.

Monitoring Powershell scripts with Icinga 2


The need to to monitor arbitrary Powershell scripts comes up now and then and often there are some workarounds or alternatives, NSClient for example, named. However in order to have something I can link refer people to when the topic comes up again, I’ll try to provide a quick and simple to adapt solution. Keep in mind that this assumes you have Icinga 2 up and running on your Windows host, Powershell installed and are reasonably sane.

First the the check script I used for demonstration purposes in this case, all it does is check whether a process is running and returning OK or CRITICAL based on that.

if ($args.Length -lt 1) {
	Write-Output "Script requires one argument (Process)"
	Exit(3)
}

$proc=$args[0]

$state = Get-Process $proc -ErrorAction SilentlyContinue
if ($state) {
	Write-Output "PROCESS OK '$proc' is running"
	Exit(0)
} else {
	Write-Output "PROCESS CRITICAL '$proc' is not running"
	Exit(2)
}

Safe it as check_proccess.ps1 somewhere you can find it again. In this case I put next to the other check plugins.

The following are the check_command object and Service apply. And as it turns out it’s not that easy of a task as I thought, it’s mostly Windows fault really… Getting the exit code of a script from Powershell returned to icinga2 required some trickery (credit goes to NSClient for that one). The result is a bit of weird CheckCommand, which can and should be improved.

object CheckCommand "powershell" {
    import "plugin-check-command"
    command = [ "cmd" ]
    arguments = { 
        "Weird command" = { 
            value = "/c echo $powershell_script$ $powershell_args$ ; exit ($$lastexitcode) | powershell.exe -command -"
            description = "This is needed because powershell would not tell us the exit code otherwise"
            skip_key = true
        }
    }   
}

apply Service "check_powerpoint" {
    import "generic-service"
    check_command = "powershell"
    vars.powershell_script = PluginDir + "\check_process.ps1"
    vars.powershell_args = "POWERPNT"
    assign where host.vars.os == "Windows"
}
Jean-Marcel Flach

Autor: Jean-Marcel Flach

Geboren und aufgewachsen in Bamberg, kam Jean (das “-Marcel” ist still) nach einem Ausflug an die Uni, als Azubi zu NETWAYS. Dort sitzt er seit 2014 im Icinga 2 core Entwicklungsteam.

Der Icinga Buildserver, Version 3

Letztes verbliebene Bild des alten Icinga Jenkins

Der Icinga-Buildserver, erreichbar unter https://build.icinga.com, läuft in dieser Form jetzt etwa ein Jahr. Doch gibt es noch immer ein paar Probleme die mit diesem Setup bestehen: So ist das Hinzufügen neuer Jobs noch etwas umständlich, das provisionieren dauert länger als uns lieb ist und besonders übersichtlich ist die Konfiguration auch nicht. Um diese Probleme anzugehen haben wir uns noch einmal mit Puppet und Jenkins auseinandergesetzt.

Wie vorher verwenden wir ein jenkins puppet-modul, nur diesmal haben wir es mit einem speziellen icinga-jenkins Modul erweitert. Dieses Modul erlaubt es uns spezielle pipeline-jobs mit geringem Konfigurationsaufwand zu erstellen. So ist das unterstehende Beispiel alles was zu konfigurieren ist um eine komplette Pipeline zu erstellen. Selbst der spezielle Umgang mir RPM und Deb ist zu großen Teilen vereinheitlicht und funktioniert für alle Projekte gleich.

icinga_build::pipeline:
  icinga2-snapshot:
    control_repo: https://github.com/Icinga/icinga-packaging.git
    control_branch: snapshot
    matrix_deb:
      'debian-jessie': {}

Die Pipeline erstellt dabei nicht nur einen, sondern gleich vier Jobs: “source”, “binary”, “test” und “publish”. Diese verarbeiten die specfiles, bauen das Paket, testen es und veröffentlichen es auf https://packages.icinga.com

In Produktion ist unser Modul noch nicht, aber um den testen und konfigurieren zu können haben wir Vagrant Boxen gebaut. Mit Hilfe derer bauen wir zur Zeit das icinga-jenkins Modul weiter aus um den bestehenden Buildserver komplett mit den neuen Pipelinejobs abbilden zu können. Wir hoffen unseren Buildprozess damit noch einfacher für Entwickler zu machen und dank der neuen Testsphase für Pakete Problemen in Zukunft besser vorzubeugen zu können

Jean-Marcel Flach

Autor: Jean-Marcel Flach

Geboren und aufgewachsen in Bamberg, kam Jean (das “-Marcel” ist still) nach einem Ausflug an die Uni, als Azubi zu NETWAYS. Dort sitzt er seit 2014 im Icinga 2 core Entwicklungsteam.