Cluster Monitoring

By default, Grove ships with Prometheus, Grafana, Alert Manager, and OpenSearch for monitoring.

By default, none of these services are exposed via the ingress. Each one can be reached by forwarding the relevant port. The commands below open the UI for each service and must be run from within the control directory.

  • Prometheus: ./kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-operated 8001:9090
  • Alert Manager: ./kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/alertmanager-operated 8001:9093
  • Grafana: ./kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-operator-grafana 8001:3000
  • OpenSearch Dashboard: ./kubectl --namespace monitoring port-forward --address 0.0.0.0 deployments/opensearch-dashboard-opensearch-dashboards 8001:5601

After running any of the commands above, you can view the relevant UI in your browser at http://localhost:8001.
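All four commands above bind the same local port, 8001, so only one of them can run at a time. The sketch below is one possible way to choose a different local port per service. The function name forward_cmd and the short service keys are illustrative and not part of Grove; the targets and remote ports are copied from the list above. It only prints the command, so nothing is forwarded until you run the printed line yourself.

```shell
# Print the port-forward command for one monitoring UI.
# Usage: forward_cmd <service> [local_port]   (local_port defaults to 8001)
# forward_cmd and the service keys are hypothetical helpers, not Grove commands.
forward_cmd() {
  case "$1" in
    prometheus)   fc_target="svc/prometheus-operated"; fc_port=9090 ;;
    alertmanager) fc_target="svc/alertmanager-operated"; fc_port=9093 ;;
    grafana)      fc_target="svc/prometheus-operator-grafana"; fc_port=3000 ;;
    opensearch)   fc_target="deployments/opensearch-dashboard-opensearch-dashboards"; fc_port=5601 ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
  echo "./kubectl --namespace monitoring port-forward --address 0.0.0.0 ${fc_target} ${2:-8001}:${fc_port}"
}
```

For example, running the line printed by forward_cmd grafana 8003 serves Grafana at http://localhost:8003 while another forward keeps using 8001.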

Components

OpenSearch Dashboard

Cluster logs are forwarded to OpenSearch using Fluent Bit. They can be accessed via the OpenSearch dashboard.

To access the dashboard, run the following command from within the control directory:

./kubectl --namespace monitoring port-forward --address 0.0.0.0 deployments/opensearch-dashboard-opensearch-dashboards 8001:5601

The username is admin and the password can be retrieved with:

./tf output -raw opensearch_dashboard_admin_password

On the first run, you will need to create an Index Pattern for fluent-bit. Once that is done, you will be able to view the logs on the Discover page.

Grafana

Both the username and the password for the default user are admin. You will be asked to change the password after logging in for the first time.

The Kubernetes Resource Workload dashboard is loaded by default, and more dashboards are available via the sidebar's Browse item.

Alert Manager

By default, Alert Manager is not configured to send any notifications. The configuration can be changed by setting the TF_VAR_alert_manager_config variable in GitLab, or in your private.yml if working locally.

The provided value must be valid YAML in the format Alert Manager expects.

Shown below is an example of configuring email alerts:

TF_VAR_alert_manager_config: |
  receivers:
  - name: "null"
  - name: email
    email_configs:
    - to: 'receiver_mail_id@example.com'
      from: 'mail_id@example.com'
      smarthost: smtp.example.com:587
      auth_username: 'mail_id@example.com'
      auth_identity: 'mail_id@example.com'
      auth_password: 'password'

Default null route

Note that the "null" receiver is required. Because of the way values are merged in Helm, this receiver must exist; otherwise Alert Manager fails to load its configuration with an undefined receiver error. Example:

level=error ts=2020-10-23T12:08:02.428Z caller=coordinator.go:124 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="undefined receiver \"null\" used in route"

Visit this GitHub issue for more details.
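Because a missing "null" receiver only surfaces when Alert Manager loads its configuration, it can help to check the value before applying it. The following is a rough sketch, assuming the value of TF_VAR_alert_manager_config has been saved to a local file; the function name has_null_receiver is hypothetical. It is a plain text match rather than a YAML parser, so it only recognises the quoted forms "null" and 'null' written on the same line as name:.

```shell
# Succeed (exit 0) if the file lists a receiver named "null", fail otherwise.
# Usage: has_null_receiver <path-to-config-file>
# has_null_receiver is a hypothetical helper, not part of Grove or Alert Manager.
has_null_receiver() {
  # Matches an entry such as:  - name: "null"
  # The quotes matter: an unquoted null is the YAML null value, not the string "null".
  grep -Eq "^[[:space:]]*-[[:space:]]*name:[[:space:]]*[\"']null[\"'][[:space:]]*\$" "$1"
}
```

A file containing the email example above passes this check, while the same file with the "null" entry removed fails it.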

Ingress

Ingress for the monitoring services is disabled by default. It can be enabled by setting the Terraform variable TF_VAR_enable_monitoring_ingress to true in your CI/CD vars/cluster.yml and updating your DNS to point to the cluster.

Let's Encrypt Email

Set the TF_VAR_lets_encrypt_notification_inbox variable to a valid email address to receive Let's Encrypt renewal notifications. Note that certificate generation will not work if this address isn't valid.

DNS

You will need a valid base domain to set up the monitoring services.

Assuming your base domain is monitoring.grove.dev, ingresses will be created for:

  • prometheus.monitoring.grove.dev
  • grafana.monitoring.grove.dev
  • alert-manager.monitoring.grove.dev
  • opensearch-dashboards.monitoring.grove.dev

Access to the above is handled via the Nginx Controller. To set this up:

  • Set the variable TF_VAR_cluster_domain to your desired domain.
  • Obtain your controller's External IP with the command ./kubectl get services --namespace kube-system ingress-nginx-controller.
  • Create an A Record for *.your-monitoring-domain.com pointing to the controller's External IP.

After applying the changes, your services will be available at the hostnames listed above.
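To confirm the wildcard record before relying on the ingresses, you can resolve each expected hostname and compare the answer with the controller's External IP. The helper below is a small sketch; monitoring_hosts is an illustrative name, and the four prefixes are taken from the hostname list in this section.

```shell
# Print the four monitoring hostnames for a given base domain, one per line.
# Usage: monitoring_hosts <base_domain>   e.g. monitoring_hosts monitoring.grove.dev
# monitoring_hosts is a hypothetical helper, not a Grove command.
monitoring_hosts() {
  for mh_name in prometheus grafana alert-manager opensearch-dashboards; do
    echo "${mh_name}.$1"
  done
}
```

Assuming dig is installed, monitoring_hosts monitoring.grove.dev | while read -r host; do dig +short "$host"; done prints the address each name currently resolves to.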

If certificates aren't generated, please check the Cert Manager documentation for troubleshooting steps.

Authentication

All services are protected with Basic Authentication to prevent unrestricted access to your data. The credentials are the same for all services: the username is admin, and the password can be retrieved with ./tf output -raw monitoring_ingress_password.
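Basic Authentication sends the username and password, base64-encoded, in an Authorization header on every request. As a hedged illustration of what a client ends up sending, the function below builds that header by hand; basic_auth_header is a hypothetical helper, and it assumes the base64 utility is available on your machine.

```shell
# Print the HTTP header a client sends for Basic Authentication.
# Usage: basic_auth_header <username> <password>
# basic_auth_header is a hypothetical helper, not part of Grove.
basic_auth_header() {
  # tr strips the line wrapping some base64 implementations add to long output.
  printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}
```

In practice you rarely need to build the header yourself: curl --user "admin:$(./tf output -raw monitoring_ingress_password)" https://grafana.monitoring.grove.dev lets curl do it, where the hostname assumes the example base domain from the DNS section.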