Cluster Monitoring¶
By default, Grove ships with Prometheus, Grafana, Alert Manager, and OpenSearch for monitoring.
By default, none of these services are exposed via the ingress. They can all be accessed
by forwarding the relevant port. Listed below are the commands to view the UI for each of the
services. These need to be invoked from within the control directory.
- Prometheus:
./kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-operated 8001:9090
- Alert Manager:
./kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/alertmanager-operated 8001:9093
- Grafana:
./kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-operator-grafana 8001:3000
- OpenSearch Dashboard:
./kubectl --namespace monitoring port-forward --address 0.0.0.0 deployments/opensearch-dashboard-opensearch-dashboards 8001:5601
After running any of the commands above, you will be able to view the relevant UI in your browser at http://localhost:8001.
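Because every command above binds local port 8001, only one UI can be forwarded at a time. The following is a minimal sketch of a helper that prints the port-forward command for a given service on a local port of your choice. The function name monitoring_forward_cmd and the short service names are illustrative assumptions and are not part of Grove; the targets and remote ports are copied from the list above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grove): print the port-forward command
# for one monitoring UI. Targets and remote ports come from the list above.
monitoring_forward_cmd() {
  name="$1"
  local_port="${2:-8001}"   # defaults to the port used in the commands above
  case "$name" in
    prometheus)   target="svc/prometheus-operated"; remote_port=9090 ;;
    alertmanager) target="svc/alertmanager-operated"; remote_port=9093 ;;
    grafana)      target="svc/prometheus-operator-grafana"; remote_port=3000 ;;
    opensearch)   target="deployments/opensearch-dashboard-opensearch-dashboards"; remote_port=5601 ;;
    *) echo "unknown service: $name" >&2; return 1 ;;
  esac
  echo "./kubectl --namespace monitoring port-forward --address 0.0.0.0 $target $local_port:$remote_port"
}

# Example: print the Grafana command for local port 8002, or run it
# directly from the control directory:
#   monitoring_forward_cmd grafana 8002
#   eval "$(monitoring_forward_cmd grafana 8002)"
```

With a different local port per service, several UIs can be open at once, for example Grafana at http://localhost:8002 while Prometheus stays on http://localhost:8001.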
Components¶
OpenSearch Dashboard¶
Cluster logs are forwarded to OpenSearch using Fluent-bit. They can be accessed via the OpenSearch dashboard.
To access the dashboard, run the following command from within the control directory.
./kubectl --namespace monitoring port-forward --address 0.0.0.0 deployments/opensearch-dashboard-opensearch-dashboards 8001:5601
The username is admin and the password can be retrieved with:
./tf output -raw opensearch_dashboard_admin_password
On the first run, you will need to create an Index Pattern for fluent-bit. Once done, you will be able to view
the logs on the Discover page.
Grafana¶
Both the username and the password for the default user are admin.
You will be asked to change the password after logging in for the first time.
The Kubernetes Resource Workload dashboard is loaded by default with more dashboards available via the sidebar's Browse item.
Alert Manager¶
Alert Manager is not configured by default to send any notifications. The configuration can be changed
by setting the TF_VAR_alert_manager_config variable in GitLab, or in your private.yml if working locally.
The provided value needs to be valid YAML, as expected by Alert Manager.
Shown below is an example of configuring email alerts:
TF_VAR_alert_manager_config: |
  receivers:
    - name: "null"
    - name: email
      email_configs:
        - to: 'receiver_mail_id@example.com'
          from: 'mail_id@example.com'
          smarthost: smtp.example.com:587
          auth_username: 'mail_id@example.com'
          auth_identity: 'mail_id@example.com'
          auth_password: 'password'
Default null route
Note that the "null" receiver is required. Because of the way values are merged in Helm, this receiver must exist; otherwise you will get an undefined receiver error. Example:
level=error ts=2020-10-23T12:08:02.428Z caller=coordinator.go:124 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="undefined receiver \"null\" used in route"
Visit this GitHub issue for more details.
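To catch a missing "null" receiver before applying, a simple pre-flight check can help. The sketch below assumes you keep the value of TF_VAR_alert_manager_config in a local file (the file name in the example is hypothetical). The function has_null_receiver is illustrative and not part of Grove, and it is a line-based text match rather than a real YAML parser.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Grove): succeed only if the
# given file contains a list entry of the form `- name: "null"`.
# This is a plain text match, not a YAML parser, so unusual formatting
# (for example flow-style lists) will not be recognised.
has_null_receiver() {
  # The quotes are required in the match: an unquoted null would be a
  # YAML null value rather than the string "null".
  pattern="^[[:space:]]*-[[:space:]]*name:[[:space:]]*[\"']null[\"'][[:space:]]*\$"
  grep -Eq "$pattern" "$1"
}

# Example (the file name is an assumption):
#   has_null_receiver alert_manager_config.yml \
#     || echo 'missing "null" receiver' >&2
```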
Ingress¶
Ingress for the monitoring services is disabled by default. It can be enabled by setting the Terraform variable TF_VAR_enable_monitoring_ingress to true in your CI/CD variables or cluster.yml and updating your DNS to point to the cluster.
Let's Encrypt Email¶
Set the TF_VAR_lets_encrypt_notification_inbox variable to a valid email address
to receive Let's Encrypt renewal notifications. Note that certificate generation
will not work if this address isn't valid.
DNS¶
You will need a valid base domain to set up the monitoring services.
Assuming your base domain is *.monitoring.grove.dev, ingresses will be created for:
prometheus.monitoring.grove.dev
grafana.monitoring.grove.dev
alert-manager.monitoring.grove.dev
opensearch-dashboards.monitoring.grove.dev
Access to the above is handled via the Nginx Controller. To set this up:
- Set the variable TF_VAR_cluster_domain to your desired domain.
- Obtain your controller's External IP with the command ./kubectl get services -nkube-system ingress-nginx-controller.
- Create an A Record for *.your-monitoring-domain.com pointing to the controller's External IP.
After applying the changes, your services will be available as described above.
If certificates aren't generated, please check the Cert Manager documentation for troubleshooting steps.
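The DNS setup can be sanity-checked from the control directory. The following is a minimal sketch; the helpers monitoring_hosts and check_monitoring_dns are illustrative and not part of Grove. It assumes that dig is installed and that the controller's load balancer reports an IP address. Some providers report a hostname instead, in which case the IP lookup returns nothing and every host shows as a mismatch.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Grove).

# Print the monitoring hostnames listed above for a given base domain.
monitoring_hosts() {
  for name in prometheus grafana alert-manager opensearch-dashboards; do
    echo "$name.$1"
  done
}

# Compare what each hostname resolves to with the controller's external IP.
# Assumes `dig` is available and the load balancer exposes an IP address.
check_monitoring_dns() {
  base_domain="$1"
  external_ip="$(./kubectl get services -nkube-system ingress-nginx-controller \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
  for host in $(monitoring_hosts "$base_domain"); do
    # Take the last line so that a CNAME chain ends on the address.
    resolved="$(dig +short "$host" | tail -n 1)"
    if [ "$resolved" = "$external_ip" ]; then
      echo "ok       $host -> $resolved"
    else
      echo "mismatch $host -> ${resolved:-unresolved} (expected $external_ip)"
    fi
  done
}

# Example, using the base domain from the text above:
#   check_monitoring_dns monitoring.grove.dev
```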
Authentication¶
All services are protected with Basic Authentication to prevent unauthenticated access to your
data. The credentials are the same for all services: the username is admin and
the password can be retrieved with ./tf output -raw monitoring_ingress_password.
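To confirm that the credentials are accepted without opening a browser, something like the following could be used. This is a sketch under assumptions: the helper http_status_with_auth is illustrative and not part of Grove, and the hostname in the example is the sample one from the DNS section.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grove): print the HTTP status code
# returned for a URL when Basic Authentication credentials are supplied.
http_status_with_auth() {
  url="$1"; user="$2"; password="$3"
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --user "$user:$password" "$url"
}

# Example, run from the control directory:
#   password="$(./tf output -raw monitoring_ingress_password)"
#   http_status_with_auth https://prometheus.monitoring.grove.dev admin "$password"
# A 401 status would suggest that the credentials were rejected.
```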