This discovery details the requirements and scope for implementing v3 of the OpenCraft Instance Manager (Ocim) as a new project, in lieu of upgrading the current version to work with Grove.
Why deprecate Ocim v2?
The decision to deprecate Ocim isn't taken lightly. More often than not, a rewrite is not recommended, because the software being replaced usually handles a lot of edge cases that the new one won't consider.
With the advent of Grove and Kubernetes-based deployments, there are parts of Ocim that are now superfluous, like provisioning VMs and deploying with Ansible.
Besides this, Ocim has languished and there is a sizable amount of work required to update its stack. In particular:
- The Django version used is 2.2.24, which is no longer supported as of April 11 2022.
- The Ocim frontend is written in Angular v1, which is no longer supported as of April 2022.
- The assumptions baked into its architecture are for a very different kind of deployment than Grove's.
Considering these, it makes sense to validate whether a rewrite is worth doing.
How will Ocim v3 improve on this situation?
Ocim v3 will depend heavily on Grove as a deployment backend. Because the complexity of deploying and managing instances is encapsulated in Grove, Ocim v3 can be a lot simpler.
- Using Gitlab triggers, it is possible to manage instances via an API. Ocim won't need to do the heavy lifting; it will serve as the user-friendly presentation layer for Grove.
- Infrastructure is set up once per cluster. With Grove's dependency on Terraform, it's easy to keep the underlying machines up to date.
- Deployments are done using Kubernetes rather than Ansible, which will make triaging and fixing deployment issues much easier.
- Ocim v3 will be much smaller both in scope and features than v2.
- Most of the work will happen on Gitlab/Grove rather than Ocim v3. As a result, Ocim v3 will not need to store a lot of state, and it can recover its state by looking at the Gitlab repo and the Kubernetes cluster.
- Ocim v3's frontend will be built on React and TypeScript.
- We will not consider the client Console at all since there are no plans to keep the P&T plans going.
- There's no requirement to implement websockets for the MVP.
- We'll use Listaflow as a template, reusing as much of it as possible including the design to save time and effort.
- Python will be the backend language, using Django with DRF and Celery as the main workhorses.
- TypeScript, React + Redux (and Providence) will be the main technologies on the frontend.
- Deployments will be to a Kubernetes cluster using Helm charts.
For the MVP, we will build these screens:
- Reset Password
- Logged out
- Registration via an invite link
- Main Page
- Sidebar that lists clusters, can be expanded to show instances
- A cluster information page
- An instance information page
- Admin pages (but we'll just use the Django admin)
The authentication-related pages should be similar to the current ones on Listaflow, providing login via email or via Google sign-in.
Password Reset and logged out pages should be implemented as well.
Once a user is logged in they will see the...
The layout of which will be similar to Ocim v2's.
- On the left add a sidebar stretching the full page height. We will list the clusters here. Users will be able to select a cluster/instance to work with.
- To the right of the sidebar, taking up the rest of the page width, will be the main content. In here we will load further detail on the objects the user wants to see.
- The top of the page should contain the navigation header. We will show the user information and a button that takes the user to a page to add a new Gitlab repo.
On the Main page we will need filters for at least these fields:
- Cluster name
- Cluster provider
- Instance name
- Instance/cluster configuration
- Instance status
The sidebar will be similar to Ocim v2's and will list the clusters for each repository that has been configured. The items will be clickable and will load their information into the main content area.
Once a cluster is selected, we will show a cluster information page containing the relevant details and actions, including but not limited to the following.
We envision three action buttons:
- Deploy a new instance. This button will take the user to a new page, where they can add any config overrides before deploying a new instance.
- Remove/Delete. This button will remove the cluster from Ocim only.
- Update. A page to update the cluster's information.
As well as being able to do the above, we will need to show some information on the cluster like:
- The cluster name
- The Gitlab project ID.
- A link to the Gitlab project.
- The cluster status (whether it's online or not).
- The result of the latest pipeline.
- The list of instances on the cluster and their current status.
The instances should be links that take the user to the Instance Detail page for that instance.
Instance Detail Page
When the user clicks on an instance, we need to show them information on the particular instance.
- The instance name.
- The instance's status.
- The status of the instance's pods.
- Any replication stats
- The current
- The current
- Results of the latest pipeline for this instance.
There will be links/buttons that allow the user to manage an instance. For the MVP add buttons for:
- Update an instance
- Archive an instance
- Delete an instance
- Refresh the display data (if it's outdated somehow)
- Reprovision (gated feature)
Refreshing the data should not be required, but for the first few releases it's advised to allow users to force refresh anything that's been cached.
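A forced refresh of cached data could look roughly like the sketch below. It is only an illustration of the idea: the `RefreshableCache` class, its `ttl_seconds` parameter and the `loader` callback are hypothetical names, not part of Ocim or Grove.

```python
import time
from typing import Any, Callable, Dict, Tuple


class RefreshableCache:
    """Tiny in-memory TTL cache whose entries can be force-refreshed.

    Hypothetical helper for illustration; a real deployment would more
    likely sit on top of Django's cache framework.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (time the value was stored, value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any],
            force: bool = False) -> Any:
        """Return the cached value, reloading it when stale or forced."""
        now = self._clock()
        cached = self._entries.get(key)
        if not force and cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        value = loader()
        self._entries[key] = (now, value)
        return value
```

The "Refresh" button would then map to a call with `force=True`, while normal page loads keep using whatever is cached.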
Modeling the data
- Deploy keys
- Gitlab API keys to fetch project info
- OpenFAAS function credentials per cluster
We can use OpenFAAS functions as a "backdoor" into the K8s environments (to fetch info) rather than communicating with the cluster directly. For example, we can set up functions for:
- Fetching the currently running instances
- Fetching instances with any kind of error state (e.g. CrashLoopBackOff containers for Forum or ES?)
- Replication statistics per instance
- Retrieving a usage report per instance (already implemented).
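As a rough sketch of how Ocim v3 might call such a function: OpenFaaS exposes deployed functions through its gateway at `/function/<name>`. The helper below only builds the HTTP request. The function name `list-instances`, the use of HTTP basic auth for the per-cluster credentials, and the JSON payload are all assumptions for illustration and may not match the real setup.

```python
import base64
import json
import urllib.request
from typing import Any, Mapping


def build_function_request(gateway_url: str, function_name: str,
                           username: str, password: str,
                           payload: Mapping[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request to an OpenFaaS function.

    Assumes the gateway is protected with HTTP basic auth; adjust this if
    the cluster uses a different credential scheme.
    """
    url = f"{gateway_url.rstrip('/')}/function/{function_name}"
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` (or swapping in the HTTP client the project already uses) and decoding the function's response.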
Since Grove works with Gitlab, we'll need a database table to store the Gitlab repositories. It'll be up to the operator to provision the Gitlab repository first and set the necessary permissions so that Ocim v3 can access it.
We will mirror these details in the database:
This data must be kept in sync with the Gitlab repository. We propose implementing two methods:
- A periodic task that syncs the repository data.
- A webhook that Gitlab hits whenever something on the repository changes.
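The webhook side of this could be sketched as below. Gitlab sends the secret configured for a webhook in the `X-Gitlab-Token` header and sets `object_kind` in the JSON body (for example `push` or `pipeline`). The action names returned here (`sync_repository`, `sync_pipeline`) are hypothetical placeholders for whatever Celery tasks Ocim v3 ends up defining.

```python
import hmac
from typing import Any, Mapping, Optional


def is_trusted_webhook(headers: Mapping[str, str], expected_token: str) -> bool:
    """Check the shared secret Gitlab sends with every webhook call.

    `headers` is treated as a plain mapping here; Django's
    `request.headers` is case-insensitive and can be passed directly.
    """
    received = headers.get("X-Gitlab-Token", "")
    # Constant-time comparison avoids leaking the token via timing.
    return hmac.compare_digest(received.encode(), expected_token.encode())


def sync_action_for(payload: Mapping[str, Any]) -> Optional[str]:
    """Map a webhook payload to the (hypothetical) sync task to run."""
    kind = payload.get("object_kind")
    if kind == "push":
        return "sync_repository"
    if kind == "pipeline":
        return "sync_pipeline"
    return None  # Ignore events Ocim v3 does not care about.
```

The periodic task would call the same sync logic on a schedule, so a missed webhook only delays an update instead of losing it.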
Pipeline state will be synced to Ocim v3 via webhooks (or tasks that check periodically).
Fields required will be:
- last message from the pipeline if retrievable
- link to the pipeline
- foreign keys to cluster/instance
Real-time values are required to determine a pipeline's status. It won't be necessary to store the individual stages in the pipeline; the start and end results should be enough.
For a Kubernetes cluster, we'll store or make available the following information and update it from the repository as required.
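A minimal sketch of the stored pipeline record, built from a Gitlab pipeline webhook payload, might look like this. The `PipelineRecord` class and its field names are hypothetical; in the real project this would presumably be a Django model with foreign keys. The payload keys used (`object_attributes.id`, `status`, `created_at`, `finished_at` and `project.web_url`) are the ones Gitlab includes in pipeline events.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class PipelineRecord:
    """Start/end view of a pipeline; individual stages are not stored."""
    gitlab_pipeline_id: int
    status: str
    link: str
    cluster_id: int                      # stand-in for a cluster foreign key
    instance_id: Optional[int] = None    # stand-in for an instance foreign key
    created_at: Optional[str] = None
    finished_at: Optional[str] = None
    last_message: Optional[str] = None   # filled in only if retrievable


def record_from_webhook(payload: Mapping[str, Any], cluster_id: int,
                        instance_id: Optional[int] = None) -> PipelineRecord:
    """Translate a Gitlab pipeline webhook payload into a record."""
    attrs = payload["object_attributes"]
    project_url = payload["project"]["web_url"].rstrip("/")
    return PipelineRecord(
        gitlab_pipeline_id=attrs["id"],
        status=attrs["status"],
        link=f"{project_url}/-/pipelines/{attrs['id']}",
        cluster_id=cluster_id,
        instance_id=instance_id,
        created_at=attrs.get("created_at"),
        finished_at=attrs.get("finished_at"),
    )
```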
- cluster name
- provider (AWS or Digital Ocean for now)
- cluster config (read from cluster.yml)
- gitlab project
- max node count
- node size/instance class
- mysql database size
- mongodb database size
- status (pending, active, archived, deleted)
- links that will be needed
  - monitoring page for Prometheus
  - monitoring page for Grafana
  - monitoring page for Alert Manager
  - link to Gitlab project
  - link to AWS or Digital Ocean k8s admin page
Instances will need at least the below fields:
- cluster foreign key
- config.yml as retrieved from the repo
- grove.yml as retrieved from the repo
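To make the cluster and instance fields above more concrete, here is a rough sketch using plain dataclasses. In Ocim v3 these would most likely be Django models; the class and attribute names below are assumptions chosen to mirror the lists above, not a settled schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class Provider(str, Enum):
    AWS = "aws"
    DIGITAL_OCEAN = "digitalocean"


class ClusterStatus(str, Enum):
    PENDING = "pending"
    ACTIVE = "active"
    ARCHIVED = "archived"
    DELETED = "deleted"


@dataclass
class Cluster:
    name: str
    provider: Provider
    gitlab_project_id: int
    config: Dict[str, Any] = field(default_factory=dict)  # parsed cluster.yml
    max_node_count: Optional[int] = None
    node_size: Optional[str] = None           # node size/instance class
    mysql_database_size: Optional[str] = None
    mongodb_database_size: Optional[str] = None
    status: ClusterStatus = ClusterStatus.PENDING
    # e.g. {"prometheus": "...", "grafana": "...", "gitlab": "..."}
    links: Dict[str, str] = field(default_factory=dict)


@dataclass
class Instance:
    name: str
    cluster: Cluster         # stand-in for the cluster foreign key
    config_yml: str = ""     # config.yml as retrieved from the repo
    grove_yml: str = ""      # grove.yml as retrieved from the repo
```

Keeping the YAML files as raw text, as sketched here, is one option; storing them parsed would make filtering easier at the cost of having to re-serialise them.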
- Create instance
- Update instance (with new config)
- Archive instance
- Unarchive an instance
- Delete Instance
- Build images for instances
All of the above will be initiated using the Gitlab pipeline API and can then be tracked via the pipelines.
DNS will need to be set up outside Grove once per cluster.
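As an illustration, starting one of these operations through Gitlab's pipeline trigger endpoint (`POST /api/v4/projects/:id/trigger/pipeline`) could be sketched as follows. The endpoint and the `token`, `ref` and `variables[...]` form fields are part of Gitlab's API; the variable names in the usage example (`OCIM_OPERATION`, `INSTANCE_NAME`) are hypothetical and would need to match whatever Grove's pipelines actually expect.

```python
import urllib.parse
import urllib.request
from typing import Mapping


def build_trigger_request(gitlab_url: str, project_id: int,
                          trigger_token: str, ref: str,
                          variables: Mapping[str, str]) -> urllib.request.Request:
    """Build (but do not send) a Gitlab pipeline trigger request."""
    url = (f"{gitlab_url.rstrip('/')}/api/v4/projects/"
           f"{project_id}/trigger/pipeline")
    form = {"token": trigger_token, "ref": ref}
    # Pipeline variables are passed as variables[KEY]=value form fields.
    for key, value in variables.items():
        form[f"variables[{key}]"] = value
    return urllib.request.Request(
        url,
        data=urllib.parse.urlencode(form).encode(),
        method="POST",
    )


# Example: request an (assumed) "create instance" operation.
request = build_trigger_request(
    "https://gitlab.example.com", 42, "trigger-token", "main",
    {"OCIM_OPERATION": "create", "INSTANCE_NAME": "demo"},
)
```

The JSON response to a successful trigger includes the new pipeline's `id`, which is what Ocim v3 would store in order to track the operation afterwards.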