Alerting on JIRA Problems using Nagios

So I ran into an interesting situation this past Monday. Apparently my Primary DNS had been down for at least a week. I went to look at my network monitoring tool (LibreNMS) – and THAT was down too – for what I can only guess was at least two weeks! Granted, I haven't been doing as much on my Homelab since early March when I went into the hospital, but this was still not a good state of affairs.

So I decided to stand up a Nagios instance to monitor and alert when I have critical systems down. After getting it stood up, it didn’t take me long to start thinking about how I could use this with JIRA, which is now the topic we are going to cover today!

A bit of history

As you know, when I started my Atlassian journey, I was in charge of more than just JIRA; Nagios was one of the boxes I inherited as well. So I'm already somewhat familiar with the tool and how to configure it. I've had to modify things during that time, but I'd never done a full setup. However, I knew I wanted to do more than just monitor whether JIRA was listening for web traffic. So as part of the whole installation, I decided to dive in and see what she can do.

How to select what to alert on.

Selecting what I want to be alerted for has always been a balancing act for me. You don’t want to have so many emails that they become worthless, but you don’t want to have so few that you won’t be alerted to a real problem.  

The goal of alerting is to clue you in to problems so you can be proactive: fix back-end problems before they become a user ticket. So I always try to take the approach of "What does a user care about?"

They care that the system is up and accessible, so I always monitor the service ports (HTTP and HTTPS), as well as my access port (SSH). So that's three.

A user also cares that their integrations work. If your integrations depend on SSL and your clock drifts too far out of alignment, those integrations can fail – so I want to check that the system is in sync with the NTP server.

A feature that users love is the ability to attach files to issues. This feature will eventually chew up your disk space, so I’ll also want to monitor the disk JIRA’s home directory lives on. 

Considering I'm using a proxy, I'll want to be sure the JVM itself is up, so I'll need to look at that. I'll also want to be sure that JIRA is performing at its best and isn't taking too long to respond, so I'll want an alert for that as well.

Do you see what I'm doing? I'm looking at what can go wrong with JIRA when I'm not looking and setting up alerts for those things. The idea here is that I care about what my users care about, so I want Nagios to tell me what is wrong before my users get a chance to.

So…configurations.  

Now comes the fun part. Nagios' configuration files are a bit much to take in at first. However, I will be isolating the Atlassian-specific configurations to make things a bit easier on all of us. First, let's start with some new commands I had to add.

###############################################################################
# atlassian_commands.cfg
#
#
# NOTES: This config file provides you with some commands tailored to monitoring
#        JIRA nodes from Nagios
# AUTHOR: Rodney Nissen <rnissen@thejiraguy.com>
#
###############################################################################


define command {
    command_name    check_jira_status
    command_line    $USER1$/check_http -S -H $HOSTADDRESS$ -u /status -s '{"state":"RUNNING"}'
}

define command {
    command_name    check_jira_restapi
    command_line    $USER1$/check_http -S -H $HOSTADDRESS$ -u /rest/api/latest/issue/$ARG3$ -s "$ARG3$" -k 'Authorization: Basic $ARG4$' -w $ARG1$ -c $ARG2$
}

define command {
    command_name    check_jira_disk
    command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -l nagios -C "/usr/lib64/nagios/plugins/check_disk -w $ARG2$ -c $ARG3$ -p $ARG1$"
}

define command {
    command_name    check_jira_load
    command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/usr/lib64/nagios/plugins/check_load -w $ARG1$ -c $ARG2$" -l nagios
}

The first two commands here are VERY tailored to JIRA. The first one checks that the JVM is running, all with a handy HTTP request. If you browse to /status on your JIRA instance, the JVM will respond with a simple JSON document telling you the state of the node. You'd normally use this feature in JIRA Data Center, so your load balancer can determine which nodes are up and ready for traffic. Buuut…it's on JIRA Server too, and we can use it for active monitoring. So I did. If JIRA returns anything other than {"state":"RUNNING"}, the check will fail and you will get an alert.
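
If you want to see exactly what Nagios will see, you can hit the endpoint yourself with curl (the hostname here is a placeholder for your own):

curl -s https://jira.example.com/status
# A healthy node answers:
# {"state":"RUNNING"}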

The second is a check on the REST API. This one will exercise your JIRA instance to make sure it's working without too much load time for users. The idea here is to search for a known issue key and see if it returns valid information within a reasonable time. $ARG1$ is how long JIRA has before Nagios will issue a warning that it's too slow (in seconds), and $ARG2$ is how long JIRA has before Nagios considers it a critical problem. $ARG3$ is your known-good issue key. $ARG4$ is a set of credentials for JIRA encoded in Base64. If you are not comfortable just leaving your actual credentials encoded as such, I'd suggest you check out the API Token Authentication app for JIRA. Using the app will allow you to use a token for authentication and not expose your password.
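
For reference, the Base64 string that goes into $ARG4$ is just the standard HTTP Basic credential – username:password (or username:token) run through base64. A quick way to generate it (the user and secret here are made up):

# Produce the value to paste into $ARG4$
echo -n 'jirauser:supersecret' | base64
# => amlyYXVzZXI6c3VwZXJzZWNyZXQ=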

The third command here is for checking the disk JIRA's home directory lives on ($ARG1$). $ARG2$ and $ARG3$ are percentages for the warning and critical thresholds, respectively.

The fourth is for checking the system load. This one is relatively straightforward: $ARG1$ is the system load that will trigger a warning, and $ARG2$ is the system load that signals a critical problem.
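
If you want to sanity-check your thresholds before wiring them into Nagios, you can run the same plugins by hand on the JIRA host as the nagios user. The paths match the RHEL/CentOS plugin location used in the commands above, and <JIRA Home> is the same placeholder as before:

/usr/lib64/nagios/plugins/check_load -w 5.0,4.0,3.0 -c 10.0,6.0,4.0
/usr/lib64/nagios/plugins/check_disk -w 75% -c 85% -p <JIRA Home>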

Now for the JIRA host configuration:

###############################################################################
# jira.cfg - SAMPLE OBJECT CONFIG FILE FOR MONITORING A JIRA NODE
#
#
# NOTE: This config file is intended to serve as an *extremely* simple
#       example of how you can create configuration entries to monitor
#       a remote JIRA (Linux) machine.
#
###############################################################################



###############################################################################
#
# HOST DEFINITION
#
###############################################################################

# Define a host for the JIRA machine

define host {

    use                     linux-server            ; Name of host template to use
                                                    ; This host definition will inherit all variables that are defined
                                                    ; in (or inherited by) the linux-server host template definition.
    host_name               jira
    alias                   JIRA
    address                 192.168.XXX.XXX
}


###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

# Define a service to "ping" the JIRA machine

define service {

    use                     generic-service           ; Name of service template to use
    host_name               jira
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
}


# Define a service to check SSH on the JIRA machine.

define service {

    use                     generic-service           ; Name of service template to use
    host_name               jira
    service_description     SSH
    check_command           check_ssh
}



# Define a service to check HTTP on the JIRA machine.

define service {

    use                     generic-service           ; Name of service template to use
    host_name               jira
    service_description     HTTP
    check_command           check_http
}

define service {

    use                     generic-service
    host_name               jira
    service_description     HTTPS
    check_command           check_https
}


define service {
    use                     generic-service
    host_name               jira
    service_description     NTP
    check_command           check_ntp!0.5!1
}

define service {
    use                     generic-service
    host_name               jira
    service_description     JIRA Status
    check_command           check_jira_status
}

define service {
    use                     generic-service
    host_name               jira
    service_description     JIRA API Time
    check_command           check_jira_restapi!2!3!HL-1!<Base64 Credentials>
}

define service {
    use                     generic-service
    host_name               jira
    service_description     JIRA System Load
    check_command           check_jira_load!5.0,4.0,3.0!10.0,6.0,4.0
}

define service {
    use                     generic-service
    host_name               jira
    service_description     JIRA Home Directory Free Space
    check_command           check_jira_disk!<JIRA Home>!75%!85%
}

So, first, we define the host. This section holds the information specific to the JIRA box. Then we start defining services for JIRA. Within Nagios, a service is a particular check you want to run.

The next four services are pretty standard. These check ping, the two service ports (HTTP and HTTPS), and the SSH port. The SSH and HTTP/S checks will also verify that those services are responding as expected.

The next check is for NTP. I have this set up to warn me if the clock is a half-second off and to throw a critical error if the clock is off by a full second. These settings might be too strict, but it has yet to alert, so I think I have dialed it in well enough.

The next is the JIRA Status check. This service will check /status, as we mentioned earlier. It either returns the string we are expecting or it doesn't, so no arguments are needed.

After that is my JIRA API check, which I set up to check the HL-1 issue. If the API call takes longer than 2 seconds, Nagios issues a warning, and if it takes longer than 3 seconds, Nagios flags a critical problem. This alert won't tell me exactly what's wrong, but it will tell me if there is a problem anywhere in the system, so I think it's a good check.

The last two services are system checks – checking the system load and the disk holding the JIRA home directory, respectively. The load thresholds I haven't had a chance to dial in yet, so I might have them set too high, but I'm going to leave them for now. As for the disk check, I like to have plenty of warning that I am approaching a full disk, to give me time to resolve it, so these numbers are good.

The last step is to add these files to nagios.cfg so that they get loaded. This is as easy as adding the following lines to the cfg file.

# Definitions for JIRA Monitoring
# Commands:
cfg_file=/usr/local/nagios/etc/objects/atlassian_commands.cfg

# JIRA Nodes:
cfg_file=/usr/local/nagios/etc/objects/jira.cfg

And that’s it! Restart Nagios and you will see your new host and service checks come up!
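
One habit worth keeping: before that restart, have Nagios verify the configuration so a typo doesn't take your monitoring down with it. The paths below assume a source install laid out like the cfg_file lines above:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
systemctl restart nagios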

Nagios in action.

So I’ve had this configuration in place for about a day now, and it appears to be working. The API Time check did go off once, but I did restart the JIRA Server to adjust some specs on the VM, so I expected the delay. So I hope this helps you as you are setting up alerts for your JIRA system!

And that’s it for this week!

We did get a bit of bad news about Summit 2021 last week. Out of an abundance of caution, Atlassian decided to go ahead and make all remaining 2020 in-person events, as well as Summit 2021, virtual. However, they have almost a year to prepare for a virtual Summit – as opposed to the 28 days they had this year. So I am excited to see what ideas Atlassian has to make this a fantastic event!

Don’t forget our poll! I’m going to let it run another week!

Don't forget you can check me out on Twitter! I'll be posting news, events, and thoughts there, and would love to interact with everyone! If you found this article helpful or insightful, please leave a comment and let me know! A comment and like on this post on LinkedIn will also help spread the word and help others discover the blog! Also, if you like this content and would like it delivered directly to your inbox, sign up below!

But until next time, my name is Rodney, asking “Have you updated your JIRA Issues today?”

Monitoring JIRA for Fun and Health

So, dear readers, here’s the deal. Some weeks, when I sit down to write, I know exactly what I’m going to write about, and can get right to it. Other weeks, I’m sitting down, and I don’t have a clue. I can usually figure something out, but it’s very much a struggle. This week is VERY much the latter.

Compound that with the fact that I just lost most of my VMs due to a storage failure this very morning. Part of it was a mistake on my part. I have the home lab so that I can learn things I can't learn on the job, and mistakes are a painful but powerful way to learn. Still….

This brings me back to a conversation I had with a colleague and fellow Atlassian Administrator at a company I used to work for. He had asked me what my thoughts were around implementing monitoring of JIRA. Well, I have touched on the subject before, but if I'm being honest, that wasn't my greatest work. Combine that with the fact that I suddenly need to rebuild EVERYTHING, and, well, why not start with my monitoring stack!

So, we are going to be setting up a number of systems. To gather system stats (that is to say, CPU usage, memory usage, and disk usage), we are going to use Telegraf, which will store that data in an InfluxDB database. Then, for JIRA stats, we are going to use Prometheus. And to query and display all this information, we will be using Grafana.

The Setup

So we are going to be setting up a new system that will live alongside our JIRA instance. We will call it Grafana, as Grafana will be the front end we use to interact with the system.

On the back end it will be running both an InfluxDB server and a Prometheus server. Grafana will use both InfluxDB and Prometheus as data sources, and will use them to generate stats and graphs of all the relevant information.

Our system will be a CentOS 7 system (my current favorite), with the following specs:

  • 2 vCPU
  • 4 GB RAM
  • 16 GB Root HDD for OS
  • 50 GB Secondary HDD for Services

This will give us the ability to scale up the storage available to the services without too much impact on the overall system, as well as to monitor its size.

As per normal, I am going to write all commands out assuming you are root. If you are not, I’m also assuming you know what sudo is and how to use it, so I won’t insult you by holding your hand with that.

InfluxDB

Let's get started with InfluxDB. The first thing we'll need to do is add the yum repo from Influxdata onto the system. This will allow us to use yum to do the heavy lifting in the install of this service.

So let's open /etc/yum.repos.d/influxdb.repo

vim /etc/yum.repos.d/influxdb.repo

And add the following to it:

[influxdb]
name = InfluxDB Repository - RHEL $releasever
baseurl = https://repos.influxdata.com/rhel/$releasever/$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key

Now we can install InfluxDB

yum install influxdb -y

And really, that’s it for the install. Kind of wish Atlassian did this kind of thing.

We'll of course need to allow firewall access so Telegraf can get data into InfluxDB.

firewall-cmd --permanent --zone=public --add-port=8086/tcp
firewall-cmd --reload

And with that we’ll start and enable the service so that we can actually do the service setup.

systemctl start influxdb
systemctl enable influxdb

Now we need to set some credentials. As initially set up, the system isn't really all that secure. So we are going to start securing it by using curl to create ourselves an account.

curl -XPOST "http://localhost:8086/query" --data-urlencode \
"q=CREATE USER username WITH PASSWORD 'strongpassword' WITH ALL PRIVILEGES"

I shouldn’t have to say this, but you should replace username with one you can remember and strongpassword with, well, a strong password.
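
One caveat worth calling out: on a stock InfluxDB 1.x install, creating that user doesn't by itself force logins – HTTP authentication also has to be switched on in /etc/influxdb/influxdb.conf (this assumes the 1.x config layout):

[http]
  auth-enabled = true

Restart InfluxDB (systemctl restart influxdb) after making that change, and the credentials will actually be enforced.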

Now we can use the command “influx” to get into InfluxDB and do any further set up we need.

influx -username 'username' -password 'password'

Now that we are in, we need to setup a database and user for our JIRA data to go into. As a rule of thumb, I like to have one DB per application and/or system I intend to monitor with InfluxDB.

CREATE DATABASE Jira
CREATE USER jira WITH PASSWORD 'strongpassword'
GRANT ALL ON Jira TO jira
CREATE RETENTION POLICY one_year ON Jira DURATION 365d REPLICATION 1 DEFAULT
SHOW RETENTION POLICIES ON Jira

And that’s it, InfluxDB is ready to go!
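
If you want to double-check your handiwork before leaving the influx prompt, a couple of quick queries will confirm the database and user exist:

SHOW DATABASES
SHOW USERS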

Grafana

Now that we have at least one data source, we can get to setting up the front end. Unfortunately, we'll need information from JIRA in order to set up Prometheus (once we've set JIRA up to use the Prometheus Exporter), so that data source will need to wait.

Fortunately, Grafana can also be set up using a yum repo. So let's open up /etc/yum.repos.d/grafana.repo

vim /etc/yum.repos.d/grafana.repo

and add the following:

[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

Afterwards, we just run the yum install command:

sudo yum install grafana -y

Grafana defaults to port 3000, though options to change or proxy this are available. Since we're sticking with the default, we will need to open port 3000 on the firewall.

firewall-cmd --permanent --zone=public --add-port=3000/tcp
firewall-cmd --reload

Then we start and enable it:

sudo systemctl start grafana-server
sudo systemctl enable grafana-server

Go to port 3000 of the system on your web browser and you should see it up and running. We’ll hold off on setting up everything else on Grafana until we finish the system setup, though.

Telegraf

Telegraf is the tool we will use to get data from JIRA's underlying Linux system into InfluxDB. It comes from the same yum repo InfluxDB was installed from, so we'll now add that repo to the JIRA server, the same way we did on the Grafana system.

vim /etc/yum.repos.d/influxdb.repo

And add the following to it:

[influxdb]
name = InfluxDB Repository - RHEL $releasever
baseurl = https://repos.influxdata.com/rhel/$releasever/$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key

And now that it has the YUM repo, we’ll install telegraf onto the JIRA Server.

yum install telegraf -y

Now that we have it installed, we can take a look at its configuration, which you can find in /etc/telegraf/telegraf.conf. I highly suggest you take a backup of this file first. Here is an example of a config file where I've filtered out all the comments and added back in everything essential.

[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  logtarget = "file"
  logfile = "/var/log/telegraf/telegraf.log"
  logfile_rotation_interval = "1d"
  logfile_rotation_max_size = "500MB"
  logfile_rotation_max_archives = 3
  hostname = "<JIRA's Hostname>"
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["http://<grafana's url>:8086"]
  database = "Jira"
  username = "jira"
  password = "<password from InfluxDB JIRA Database setup>"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]

And that should be it for the config. There are of course more metrics we can capture using various plugins, based on whatever we are interested in, but this will get the bare minimum we need.
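
As one example, if you also wanted Telegraf to keep an eye on the JIRA java process itself, the procstat input can do that. A minimal sketch – the pattern here is an assumption, so adjust it to match your own process list:

[[inputs.procstat]]
  # Match the running JIRA process by command-line pattern (adjust as needed)
  pattern = "jira"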

Because telegraf is pushing data to the InfluxDB server, we don’t need to open any firewall ports for this, which means we can start it, then monitor the logs to make sure it is sending the data over without any problems.

systemctl start telegraf
systemctl enable telegraf
tail -f /var/log/telegraf/telegraf.log

And assuming you don't see any errors here, you are good to go! The stats will be waiting for us when we finish the setup of Grafana. But first….

Prometheus Exporter

So telegraf is great for getting the Linux system stats, but that only gives us a partial picture. We could train it to capture JMX info, but that means we'd have to set up JMX – something I'm keen to avoid whenever possible. So what options have we got to capture details like JIRA usage, Java heap performance, etc.?

Ladies and gentlemen, the Prometheus Exporter!

That's right, as of the time of this writing, this is yet another free app! It will set up a special page that Prometheus can go to and "scrape" the data from. This is what will take our monitoring from "okay" to "Woah".

Because it is a free app, we can install it directly from the "Manage Apps" section of the JIRA Administration console.

Once you click install, click “Accept & Install” on the pop up, and it’s done! After a refresh, you should notice a new sidebar item called “Prometheus Exporter Settings”. Click that, then click “Generate” next to the token field.

Next we'll need to open the "here" link from the "Exposed metrics are here" text in a new tab. Take special note of the URL used, as we'll need it to set up Prometheus.
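
Before moving on, it's worth confirming from the Grafana box that the endpoint actually answers. A quick curl with the URL and token you just noted (both placeholders here) should dump a wall of metrics:

curl -s '<exporter URL>?token=<your token>' | head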

Prometheus

Now we’ll go back to our Grafana system to setup Prometheus. To find the download, we’ll go to the Prometheus Download Page, and find the latest Linux 64 bit version.

Try to avoid “Pre-release”

Copy that to your clipboard, then download it to your Grafana system.

wget https://github.com/prometheus/prometheus/releases/download/v2.15.2/prometheus-2.15.2.linux-amd64.tar.gz

Next we'll need to unpack it and move it into its proper place.

tar -xzvf prometheus-2.15.2.linux-amd64.tar.gz
mv prometheus-2.15.2.linux-amd64 /archive/prometheus

Now if we go into the prometheus folder, we will see a normal assortment of files, but the one we are interested in is prometheus.yml. This is our config file, and it's where we'll be working. As always, take a backup of the original file, then open it with:

vim /archive/prometheus/prometheus.yml

Here we will be adding a new "job" to the bottom of the config. You can copy this config and modify it for your purposes. Note we are using the URL we got from the Prometheus Exporter. The first part of the URL (everything up to the first slash – that is, the FQDN) goes under targets where indicated. The rest of the URL (the folder path) goes under metrics_path. And then your token goes where indicated so that you can secure these metrics.

global:
  scrape_interval:     15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
rule_files:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'Jira'
    scheme: https
    metrics_path: '<everything after the slash>'
    params:
      token: ['<token from Prometheus exporter>']
    static_configs:
    - targets:
      - <first part of JIRA URL, everything before the first '/'>
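
To make the mapping concrete, here's a purely hypothetical example: if the exporter link had been https://jira.example.com/plugins/servlet/prometheus/metrics (your path may differ – use whatever the app showed you), the job would read:

  - job_name: 'Jira'
    scheme: https
    metrics_path: '/plugins/servlet/prometheus/metrics'
    params:
      token: ['<token from Prometheus exporter>']
    static_configs:
    - targets:
      - jira.example.com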

Now we'll need to open up the firewall port for Prometheus.

firewall-cmd --permanent --zone=public --add-port=9090/tcp
firewall-cmd --reload

Now we can test Prometheus. From the prometheus folder, run the following command.

./prometheus --config.file=prometheus.yml
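
The release tarball also ships with promtool, which can lint the config before you commit to it:

./promtool check config prometheus.yml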

From here we can open a web browser, and point it to our Grafana server on port 9090. On the Menu, we can go to Status -> Targets and see that both the local monitoring and JIRA are online.

Go ahead and stop Prometheus for now by hitting "Ctrl + C". We'll need to set this up as a service so that we can rely on it coming up on its own should we ever have to restart the Grafana server.

Start by creating a unique user for this service. We'll be using the options "--no-create-home" and "--shell /bin/false" to tell Linux this is an account that shouldn't be allowed to log in to the server.

useradd --no-create-home --shell /bin/false prometheus

Now we'll change the files to be owned by this new prometheus account. Note that the -R makes chown run recursively, meaning it will change ownership of every file underneath where we run it. Stop and make sure you are running it from the correct directory. If you run this command from the root directory, you will have a bad day (trust me)!

chown -R prometheus:prometheus ./

And now we can create its service file.

vim /etc/systemd/system/prometheus.service

Inside the file we’ll place the following:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/archive/prometheus/prometheus \
    --config.file /archive/prometheus/prometheus.yml \
    --storage.tsdb.path /archive/prometheus/ \
    --web.console.templates=/archive/prometheus/consoles \
    --web.console.libraries=/archive/prometheus/console_libraries 

[Install]
WantedBy=multi-user.target

After you save this file, run the following commands to reload systemctl, start the service, make sure it's running, then enable it for launch on boot:

systemctl daemon-reload
systemctl start prometheus
systemctl status prometheus
systemctl enable prometheus

Now just double check that the service is in fact running, and you’re good to go!

Grafana, the Reckoning

Now that we have both our data sources up and gathering information, we need a way to display it all. On your web browser, go back to Grafana on port 3000. You should be greeted with the same login screen as before. To log in the first time, use 'admin' as both the username and password.

You will be prompted immediately to change this password. Do so. No – really.

After you change your password, you should see the screen below. Click "Add data source".

We’ll select InfluxDB from the list as our first Data Source.

For settings, we’ll enter only the following:

  • Name: JIRA
  • URL: http://localhost:8086
  • Database: Jira
  • User: jira
  • Password: Whatever you set the InfluxDB Jira password to be

Click “Save & Test” at the bottom and you should have the first one down. Now click “Back” so we can set up Prometheus.

For Prometheus, all we'll need to do is set the URL to "http://localhost:9090". Enter that, then click "Save & Test". And that's both data sources done! Now we can move on to the dashboard. On the right sidebar, click through to "Home", then click "New Dashboard".

And now you are ready to start visualizing data. I've already covered some dashboard tricks in my previous attempt at this topic. However, if it helps, here's how I used Prometheus to set up a graph of the JVM heap.

Some Notes

Now, there is some cleanup you can do here. You can map out the storage for Grafana and InfluxDB to go to your /archive drive, for example. However, I can’t be giving away *ALL* the secrets ;). I want to challenge you there to see if you can learn to do it yourself.

We do have a few scaling options here too. For one, we can split Influx, Prometheus, and Grafana onto their own systems. However, my experience has been that this isn’t usually necessary, and they can all live comfortably on one system.

And one final note. The Prometheus Exporter, strictly speaking, isn't JIRA Data Center compatible. It will run, however. As best I can tell, it will give you the stats for each node where applicable, and the overall stats where that makes sense. It might be worth setting up Prometheus to bypass the load balancer and scrape each node individually.

But seriously, that’s it?

Indeed it is! This one is probably one of my longer posts, so thank you for making it to the end. It’s been a great week hearing how the blog is helping people out in their work, so keep it up! I’ll do my part here to keep providing you content.

On that note, this post was a reader-requested topic. I’m always happy to take on a challenge from readers, so if you have something you’d like to hear about, let me know!

One thing that I'm working on is trying to make it easier for you to be notified about new blog posts. As such, I've included an email subscription form at the bottom of the blog. If you want to be notified automatically about new blog posts, enter your email and hit subscribe!

And don't forget about the Atlassian Discord chat – thoroughly unofficial. Click here to join: https://discord.gg/mXuRsVu

But until next time, my name is Rodney, asking “Have you updated your JIRA issues today?”

Advanced Monitoring of JIRA

Note from the future: I've actually written a much better article on this, which you can find here: Monitoring JIRA for Fun and Health. You will find much better information there. I'm leaving this up as a testament to the past, but figured I'd save you the work.

So, confession time. I am a sucker for data. I love seeing real time graphs of system stats, performance, events, etc.

And this is just Minecraft….

So when I saw the following blog post being shared on Reddit, it was a no-brainer.

Jira active monitoring 24/7 with Prometheus + Grafana

It was a great read and a great guide. I chose to set up Prometheus in a docker container as all my other monitoring tools are there, but aside from that his guide was spot on, and I encourage everyone to read it.

However….

It didn't quite go far enough, in my mind. I could get a ton of stats that I couldn't get before: licensed user count, logged-in user count, attachment storage size. But without being able to tie it to what the underlying system was doing, it just felt…incomplete. I mean, it's great that you can tell you have 250 users logged in, but is that what's really causing the lag in the system?

Luckily I had another tool in my tool belt. I had previously set up Grafana and InfluxDB/Telegraf for use on other systems. It was merely a matter of setting up telegraf on the JIRA server and starting to pump in the required data that way.

To see how to install InfluxDB and Grafana via docker, I’d first suggest following this guide from a fellow homelabber:

My JIRA system runs on CentOS, so my first step in getting JIRA's server info was to get the InfluxDB repo added to yum. To do so, I executed the following command:

cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL 
baseurl = https://repos.influxdata.com/rhel/7/x86_64/stable/
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF

After this, it was a simple thing to do a yum install, and boom, I had telegraf on the system. As a matter of disclosure, I did look that up here; it's not like I just knew it.

After installing it, I did need to configure it so that it would gather and send the stats I was interested in to InfluxDB. To do so, I took a backup of the default config file (found at /etc/telegraf/telegraf.conf), then replaced the original with my own generic config file for Linux systems:

[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = "/var/log/telegraf/telegraf.log"
  hostname = "<<system host name, optional>>"
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["<<url to your influxdb server>>"]
  username = "<<influxdb user>>"
  password = "<<influxdb password>>"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]

Then it was a matter of starting Telegraf, confirming in the logs that it was sending, and then starting work in Grafana. After working the data a bit, I got the following dashboard:

::wipes up drool::

Now, this is still very much a work in progress. For one thing, the orange line is supposed to be the max Java heap, and unless I stop JIRA and change it, I don't think it's supposed to move. And those gaps aren't me restarting JIRA, so it should have been constant. (proxy problems, amirite?)

The particularly tricky one to do is the CPU graph and gauge. Telegraf will break down CPU usage into io_wait, user, system, etc. To be fair, this is how it's recorded in Linux internally, but to get a unified number, I had to take the CPU's idle time and subtract it from 100 (in the query, that works out to multiplying it by -1 and adding 100).
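
If it helps, here's roughly what that works out to as an InfluxQL query in a Grafana panel, assuming Telegraf's default cpu measurement and field names ($timeFilter and $__interval are Grafana's built-in macros):

SELECT 100 - mean("usage_idle") FROM "cpu" WHERE ("cpu" = 'cpu-total') AND $timeFilter GROUP BY time($__interval) fill(null)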

The second gotcha is under the options tab of the single stat block. It will always default to giving you an average, which at times is what you want, but in most cases it will give you weird readings, like 0.4 users currently logged in instead of one. To fix this, simply change it to read "Current", as shown below.

So, you may be asking yourselves, why?

My first answer would be “Have you seen it! It’s just cool a.f.!”

But my second, more practical answer is incident detection and response. Just like in JIRA, dashboards here give you an idea of what's going on now, as well as what's been going on previously. You can even expand on this to tell if you are using swap, which could severely slow down a system. Or if you are out of memory for the JVM settings you have. Or if you don't have enough CPU power to process all the requests. And you can tell all this at a glance. You can even have this posted on a monitor on the wall so everyone can see it in real time.

My Brain at the moment….

So this post is a bit short…

I just got back home yesterday from a last-minute trip I had to take. I knew I'd want to update the blog, but honestly couldn't think of what to do it on. Luckily, while I was gone, I had worked on this as a distraction, so here we are. But this is a tool I wish I had on some of my previous engagements, and I figured some of you might be able to use it as well. So, until next week, when I can hopefully complete part one of Installing JIRA from scratch, this is Rodney, asking "Have you updated your JIRA issues today?"