Monitoring Health for On-Premises Deployments

Contents

Introduction

Monitoring On-Premises Components

Additional Monitoring

 

Introduction

Overview

This article provides examples and best practices for monitoring the health of an On-Premises OverOps installation. There are many monitoring tools available and this document aims to provide general guidance for monitoring the OverOps components regardless of the tools you are using.

Refer to the OverOps Compatibility Guide for all supported software.

Note: Before beginning, use the information provided below to prepare for the procedures and configuration changes described in the main section of this document.

Use Case: Monitoring the State of the OverOps Implementation for On-Premises Deployment

When OverOps is deployed on-premises, the implementation is not monitored by OverOps; monitoring it is the responsibility of your organization. As a diligent IT Operations or DevOps manager, you want to put processes in place that help you mitigate potential issues as soon as they occur.

This best practice provides details on how to monitor:

  • Processes - ensure the required processes are running
  • Ports - ensure the expected ports are listening
  • Log files - monitor the log files for errors
  • HTTP endpoints - ensure endpoints respond to requests within an acceptable response time

 

Monitoring On-Premises Components

OverOps can be installed on-premises either with Docker, which requires an existing installation of Docker and Docker Compose, or as a standalone non-Docker installation that requires only a JRE. Both installation types are currently supported on Linux only. Understanding the running processes in each installation type is critical for proper health monitoring. The components listed below can be monitored using a typical process watcher.

Dashboard Server Components - in Docker

The Docker installation consists of the Docker service with three running processes.  Ensure the following processes are running when Docker is started either as a service or using the takipi-service.sh startup script:

Process        Description
dockerd        Docker server
containerd     Container runtime used by the Docker daemon
docker-proxy   Docker proxy server
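
A process watcher or a simple scheduled script can verify these processes. The following is a minimal sketch, assuming a standard Linux host with pgrep available:

# Check that the Docker processes required by OverOps are running.
# Prints a warning for any process that is not found.
for proc in dockerd containerd docker-proxy; do
  if ! pgrep -x "$proc" > /dev/null; then
    echo "WARNING: $proc is not running"
  fi
done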

 

The following Docker containers are required and run critical processes:

Container           Entrypoint Command
takipi_dynalite_1   "/bin/sh -c $TAKIPI_DYNALITE_HOME/entrypoint.sh"
takipi_master_1     $TAKIPI_DYNALITE_HOME/entrypoint.sh"
takipi_queue_1      /takipi-entrypoint.sh rabbitmq-server"
takipi_storage_1    "/bin/sh -c /opt/takipi-onprem/takipi-storage/entrypoint.sh"
takipi_mysql_1      "docker-entrypoint.sh mysqld"

The takipi_master_1 container maps to an external-facing port listening on the Docker host, typically port 8080. Internal ports can only be monitored from the Docker host.

Service                      Port                     Direction
Tomcat on takipi_master_1    8080                     External
RabbitMQ on takipi_queue_1   4369, 5671-5672, 25672   Internal
MySQL on takipi_mysql_1      3306                     Internal
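
Port checks can be scripted along these lines. The hostname below is a placeholder, and the internal checks must be run on the Docker host itself:

# External check: the dashboard should accept TCP connections on port 8080.
DASHBOARD_HOST="dashboard.example.com"   # placeholder for your Dashboard Server
nc -z -w 5 "$DASHBOARD_HOST" 8080 || echo "WARNING: port 8080 is not reachable"

# Internal checks (run on the Docker host): RabbitMQ and MySQL ports should be listening.
for port in 4369 5671 5672 25672 3306; do
  ss -ltn | grep -q ":$port " || echo "WARNING: port $port is not listening"
done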

 

Dashboard Server Components - in Non-Docker

The non-Docker installation runs two Java processes.

Process Type   Java Jar File                          Service and Default Ports
java           takipi-server/lib/dynalite-java.jar    OverOps Dynalite service, -port 4567
java           takipi-server/lib/takipi-backend.jar   OverOps embedded Tomcat server, -port 8080
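
A process watcher can match on the jar names. A minimal sketch, assuming the default install layout shown above:

# Verify both non-Docker Dashboard Server processes by matching their jar names.
for jar in dynalite-java.jar takipi-backend.jar; do
  pgrep -f "$jar" > /dev/null || echo "WARNING: java process for $jar is not running"
done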

If you are not connecting to an external database such as MySQL, there may be three additional running processes supporting the embedded H2 database that require monitoring.

Process Type   Java Jar File              Database Service and Default Port
java           takipi-server/lib/h2.jar   stats -tcpPort 5000
java           takipi-server/lib/h2.jar   dynalite -tcpPort 5001
java           takipi-server/lib/h2.jar   qsql-h2 -tcpPort 5002
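
When the embedded H2 database is in use, three h2.jar processes are expected. A sketch of a simple count-based check:

# Expect three H2 processes (stats, dynalite, qsql-h2) when no external database is configured.
count=$(pgrep -fc "h2.jar")
if [ "$count" -lt 3 ]; then
  echo "WARNING: expected 3 h2.jar processes, found $count"
fi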

 

Agent/Collector Monitoring

The OverOps architecture consists of both an Agent and a Collector, but only the Collector service requires monitoring. The process name for the Collector service is takipi-service; a simple process check is shown after the list below.

  • The Collector is a running daemon typically launched as a service at system startup.
  • Agents are started within the JVM and do not appear as a separate running process. Agent counts are reflected as JVM counts and are displayed in the StatsD data published by the Collector. See StatsD Metrics.
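
A minimal sketch of the Collector check (the exact service-management command depends on how the Collector was installed, so a plain process match is shown here):

# Verify that the OverOps Collector daemon is running.
pgrep -f takipi-service > /dev/null || echo "WARNING: OverOps Collector (takipi-service) is not running"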

Additional Monitoring

Endpoint Monitoring

The following endpoints can be monitored for availability and response time:

URL:         http://<Dashboard Server>:8080/login.html
Assertion:   <title>OverOps - Login</title>
Description: Simple assertion to determine whether the application service is running and responding to requests on the Tomcat port.

URL:         http://<Dashboard Server>:8080/api/v1/services/<Service ID>/views
Assertion:   {"views":
Description: REST API endpoint that returns the list of views available from the server. Requires basic authentication.
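
Both endpoints can be probed with curl. The sketch below checks the assertions and uses a 5-second timeout as a crude response-time threshold; the host, credentials, and service ID are placeholders to adapt to your environment:

# Check that the login page returns the expected title within 5 seconds.
DASHBOARD_URL="http://dashboard.example.com:8080"   # placeholder for your Dashboard Server
curl -s -m 5 "$DASHBOARD_URL/login.html" | grep -q "<title>OverOps - Login</title>" \
  || echo "WARNING: login page assertion failed"

# Check the views API endpoint (basic authentication; credentials and service ID are placeholders).
curl -s -m 5 -u "user@example.com:password" "$DASHBOARD_URL/api/v1/services/SXXXXX/views" \
  | grep -q '{"views":' || echo "WARNING: views API assertion failed"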

 

Log files

All relevant log files for the various components can be found in the following locations:

Component                        Location
Dashboard - Docker version       /<install directory>/takipi-server/storage/tomcat/logs
Dashboard - Non-Docker version   /<install directory>/takipi-server/log/<COMPONENT>, e.g. /opt/takipi-server/log/tomcat/tomcat/Catalina.log
Collector                        /<TAKIPI_HOME>/takipi/log/bugtale_service.log
Agent                            /<TAKIPI_HOME>/takipi/log/agents
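
These logs can be scanned periodically for errors and fed into your alerting tool. A minimal sketch for the Collector log; the path is a placeholder and the same pattern applies to the other locations:

# Report recent ERROR or SEVERE entries in the Collector log.
LOG_FILE="/opt/takipi/log/bugtale_service.log"   # placeholder; adjust to your TAKIPI_HOME
grep -E "ERROR|SEVERE" "$LOG_FILE" | tail -n 20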

 

StatsD Metrics

OverOps supports sending metrics to third-party graphing and monitoring tools via StatsD, an open-source protocol to capture, aggregate and send metrics to modern DevOps tools. In addition to monitoring, these metrics can be used for Anomaly Detection, Visualization, Analytics, and Telemetry.

Metric                                         Description
overops_diagnostics_<HOSTNAME>_daemon-pulse    Time series representation of the Collector status: 1 means up; 0 or no value means down.
overops_diagnostics_<HOSTNAME>_s3-calls        Time series representation of the number of calls to the S3 service.
overops_diagnostics_<HOSTNAME>_backend-calls   Time series representation of the number of calls to the backend service.

In the example below, OverOps is monitored using InfluxDB as the StatsD database. This query creates a table of all running Collectors and their current status, based on whether the pulse appeared in the most recent time interval:

SELECT mean("value")
FROM /overops_diagnostics_.*_daemon-pulse/
WHERE $timeFilter GROUP BY time(5m)

Collectors

Time                  Metric                                                     Value
2018-05-25 13:45:00   overops_diagnostics_ip-172-31-43-62_daemon-pulse.mean      -
2018-05-25 13:45:00   overops_diagnostics_ip-10-238-54-161_daemon-pulse.mean     1.00
2018-05-25 13:45:00   overops_diagnostics_ip-10-159-154-78_daemon-pulse.mean     1.00
2018-05-25 13:45:00   overops_diagnostics_ip-10-225-180-121_daemon-pulse.mean    1.00
2018-05-25 13:45:00   overops_diagnostics_ip-10-168-83-132_daemon-pulse.mean     1.00
2018-05-25 13:45:00   overops_diagnostics_ip-10-164-178-61_daemon-pulse.mean     1.00
2018-05-25 13:45:00   overops_diagnostics_ip-10-233-123-230_daemon-pulse.mean    1.00
2018-05-25 13:45:00   overops_diagnostics_ip-10-228-106-172_daemon-pulse.mean    1.00
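
To alert on a single Collector, the same query can be scoped to one host and fill empty intervals with 0, so a silent Collector produces an explicit down value. A sketch using the InfluxDB 1.x CLI; the hostname and database name are placeholders:

# Query the daemon-pulse series for one Collector; missing intervals are reported as 0 (down).
influx -database "statsd" -execute \
  'SELECT mean("value") FROM "overops_diagnostics_my-collector-host_daemon-pulse" WHERE time > now() - 15m GROUP BY time(5m) fill(0)'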

 
