Conversation
---
I have two issues with that change:
---
Let me give you some more context on my rationale behind the implementation. I understand the hypervisor CRD as an API that controls and surfaces a hypervisor's desired and actual state. This addition would make the desired overcommit and the actual state (effective capacity) transparent to consumers of the API. By default, the overcommit will look as follows:

```yaml
spec:
  overcommit: {}
```

which means no overcommit is specified, i.e. we have a ratio of 1. If any consumer of the API, such as cortex, wants to set this overcommit ratio to something else, it would be defined like this:

```yaml
spec:
  overcommit:
    cpu: 3
```

highlighting that the desired state is for the hypervisor to expose 3 times as many CPUs as it actually has. The actual state is then reflected in the effectiveCapacity status field. I'm not sure about the float vs. uint aspect, we can reconsider. We can also use a less limited type.
---
I think we should be clear about who sets a value and who reads a value.
---
The spec is the desired state. In those terms, cortex desires the hypervisor to have overcommit y for resource x. Then the controller reconciling the relevant parts of the resource, in this case the kvm node agent, would accept this desired state and reflect it in the status by calculating the effective capacity.
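That calculation can be sketched in Go. This is a minimal sketch, not the operator's actual code: the names `OvercommitSpec` and `effectiveCapacity`, and the use of `float64` ratios, are assumptions (the float vs. uint question is still open in this thread).

```go
package main

import "fmt"

// OvercommitSpec is a hypothetical, simplified stand-in for the CRD's
// overcommit spec. An empty map (spec: overcommit: {}) means a ratio
// of 1 for every resource.
type OvercommitSpec struct {
	Ratios map[string]float64 // per-resource ratio, e.g. "cpu" -> 3
}

// effectiveCapacity is the "basic math" the kvm node agent would write
// into the status: the libvirt-discovered capacity scaled by the desired
// ratio, defaulting to 1 when no overcommit is set for a resource.
func effectiveCapacity(discovered map[string]float64, spec OvercommitSpec) map[string]float64 {
	out := make(map[string]float64, len(discovered))
	for res, c := range discovered {
		ratio := 1.0
		if r, ok := spec.Ratios[res]; ok {
			ratio = r
		}
		out[res] = c * ratio
	}
	return out
}

func main() {
	discovered := map[string]float64{"cpu": 32, "memoryGiB": 256}
	spec := OvercommitSpec{Ratios: map[string]float64{"cpu": 3}}
	// cpu: 32 * 3 = 96; memoryGiB keeps 256 (ratio defaults to 1)
	fmt.Println(effectiveCapacity(discovered, spec))
}
```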
I agree with @fwiesel that this should be strongly typed, but we need to control it. EDIT: Basically copy&paste this definition
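For illustration, a strongly typed and deliberately closed spec could look roughly like the sketch below. The field set, the uint choice, and the kubebuilder markers are assumptions, not the definition referenced above.

```go
package main

import "fmt"

// Overcommit is a hypothetical strongly typed spec. Listing the resources
// explicitly (instead of an open map) keeps the API controlled: adding a
// new overcommittable resource requires an API change.
type Overcommit struct {
	// CPU overcommit ratio; a nil pointer means the default ratio of 1.
	// +kubebuilder:validation:Minimum=1
	// +optional
	CPU *uint64 `json:"cpu,omitempty"`

	// Memory overcommit ratio; a nil pointer means the default ratio of 1.
	// +kubebuilder:validation:Minimum=1
	// +optional
	Memory *uint64 `json:"memory,omitempty"`
}

// cpuRatio resolves the effective CPU ratio, applying the default of 1.
func cpuRatio(o Overcommit) uint64 {
	if o.CPU == nil {
		return 1
	}
	return *o.CPU
}

func main() {
	three := uint64(3)
	fmt.Println(cpuRatio(Overcommit{CPU: &three})) // explicit ratio
	fmt.Println(cpuRatio(Overcommit{}))            // defaulted ratio
}
```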
Merging this branch will not change overall coverage. Changed files have no unit tests.
I really do not think we want to build up duplicate infrastructure and secondary communication channels here. From the compute perspective, the nova agent reports the resources to placement. You own placement, and can do with that data what you want. Also store it in this CRD, if you insist. But at least I do not see a point in involving the kvm-node-agent here.
---
The kvm node agent is our source for hypervisor allocation and capacity. It extracts this information directly from libvirt. So to me it makes sense that the node agent confirms the desired capacity has changed after the spec was adjusted. In this way, only one controller is tasked with owning this part of the status, and cortex is not involved apart from watching the state. |
---
The initial purpose and intent of the kvm-node-agent, according to the discussion I remember having with @auhlig, @dhoeller, @bashar-alkhateeb and @notandy, was that "duplication" is okay if it presents us with another view, from the libvirt perspective. The equivalent of the vCenter. I do not object to exposing more information from libvirt, but here we are talking about a different concern.
Yes, and that starts to worry me. libvirt (and therefore the kvm-node-agent) is only guaranteed to know about running VMs. We need that view to be able to see, like in the vCenter, what is actually running on the host. If you want to also use it for scheduling purposes, feel free to do so. You are still responsible for the correctness of the scheduling decisions. I push back here on the implicit request of having higher guarantees from the kvm-node-agent.

Another problem is that only the resources you have right now are modeled in libvirt. It doesn't cover any custom resources, and network interfaces, (GPUs?), IOPS, etc. won't be represented in libvirt, and therefore not synced by the kvm-node-agent, and in my opinion never should be. I see any kind of capacity management or workload management functionality, aside from the basic reporting of readily available information in libvirt (or nova), as outside the scope of both the kvm-node-agent and the openstack-hypervisor-operator.
How can the node-agent make any informed decision there besides the most basic maths? The question here is who maintains and is responsible for what. We need clear responsibilities and separation of concerns. This feels to me like you are moving responsibilities out of Cortex which clearly belong there. @notandy Do you see it differently?
---
Thanks for the elaborate feedback. Regarding the point that allocation will be reflected only after the VM spins up: this will be covered by cortex's Reservation CRD, which takes space before the VM is ordered to be scheduled. Apart from the delayed nature, the autodiscovered capacity and allocation plus overcommit are assumed to be accurate. We autodiscover cpu and memory for now. Additions may come later, if needed, but the CRD proposal is already open enough to support them. To your second point, I see a clear separation of concerns. Cortex, or any consumer, is able to express its need for more space, or can give back some, via the desired state (the spec). Yes, the logic backing
I must say I don't see this as a blocking argument. I am fine with the node-agent updating the status based on the overcommit value and the existing VMs reported by libvirt. If it is, and stays, "basic math" (something we would've done in modern SQL if etcd were SQL), I am fine letting the kvm-node-agent do it, since it's the easiest place to do it, instead of introducing another reconciler that goes over all CROs just to calculate 1+1. I respect that you like to see responsibilities separated as best we can, but the architecture (as well as how openstack is doing it with placement and nova, etc.) shows it's not always easy/possible. Also, due to missing alternative approaches, I am inclined to approve this PR.
We want to reflect the overcommit ratio in the hypervisor CRD. This is useful for cortex, so it can calculate how much space is left on a hypervisor. By default, it is assumed that the overcommit for each resource is set to 1. If anything else is specified, it will be reflected in the new effectiveCapacity status field by the kvm node agent. This will allow cortex to take control of the hypervisor overcommit ratio, for now statically, and in the future dynamically.
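Putting the pieces together, a hypervisor resource with a CPU overcommit of 3 might look roughly like this. The field names follow the discussion above; the surrounding schema and the concrete numbers are illustrative assumptions:

```yaml
spec:
  overcommit:
    cpu: 3             # desired: expose 3x the physical CPUs
status:
  effectiveCapacity:   # written by the kvm node agent after reconciling
    cpu: 96            # e.g. 32 physical CPUs * ratio 3
    memory: 256        # unchanged: no memory overcommit, ratio defaults to 1
```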