Virtualization


Context switches in Guest OS

  • 1.  Context switches in Guest OS

    Posted Oct 29, 2008 01:36 AM

    Hi all,

    I had questions related to context switches on a guest OS in a virtualized environment. Here is my understanding:

    When the guest OS wishes to do a context switch, the interrupt (probably the timer interrupt) is intercepted by the VMM, and the VMM creates the exception frame before forwarding the interrupt to the guest OS. Is my understanding correct?

    Now, when the guest OS needs to schedule a new process (thread), is this also intercepted by the VMM, which in turn restores the context of the next thread to be scheduled on the processor?

    Finally, I would be grateful if someone could point me to a detailed description of context switches on guest OSes that highlights how the VMM is involved.

    Thank you all for your time

    -SC



  • 2.  RE: Context switches in Guest OS

    Posted Oct 29, 2008 02:43 AM

    Unless you are talking about a paravirtualized system, you need to consider actions at a finer granularity.

    Yes, if a guest process/thread exhausts its quantum, there will probably be a virtual timer interrupt to initiate the context switch. However, there isn't necessarily a correlation between physical timer interrupts and virtual timer interrupts. The VMM must arrange to somehow regain control from the guest when the virtual timer interrupt is delivered, but there are a few options for accomplishing that, and only one of those options is to program a physical timer. Ultimately, the virtual timer device is responsible for notifying the VMM that it has an event to deliver.

    When the virtual timer interrupt is delivered, the VMM must arrange for the guest to vector through its appropriate IDT entry. In a binary translation system, the VMM does construct the exception frame. However, with VT or AMD-V, the VMM simply injects the virtual interrupt into the guest and lets the hardware construct the exception frame. From there, the guest OS continues execution in the timer interrupt handler, doing whatever guest OSes do.

    A context switch is a heavyweight operation, and the VMM may become involved at multiple points. In a binary translation system, the VMM is involved in making and dispatching translations of the ring 0 code that constitutes the guest OS context switch routine. With VT or AMD-V, the guest OS may be allowed to perform most, if not all, of the context switch code natively.

    VT requires that the VMM construct shadow page tables, so the VMM must intercept any guest OS modifications of CR3, and load the guest's CR3 with an appropriate shadow value instead. However, with AMD-V and RVI, the VMM maintains a nested page table to map guest physical addresses to host physical addresses in much the same way that the regular page table maps guest linear addresses to guest physical addresses. The composition of these page tables is used directly by hardware to map guest linear addresses to host physical addresses. There is no need for shadow page tables, so guest modifications of CR3 don't have to be intercepted.

    In short, the guest OS actually restores the context of the next process/thread to be scheduled on the virtual processor. The VMM may have to become involved at various points to properly virtualize this new context, but the VMM has no need to understand what constitutes a guest process/thread or how to perform a context switch.
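
    To make the shadow versus nested paging distinction above concrete, here is a minimal Python sketch. It assumes single-level page tables modelled as dictionaries, which is a deliberate simplification (real x86 tables are multi-level), and every name and mapping in it is hypothetical.

```python
PAGE = 4096  # assumed page size for this toy model


def translate(table, addr):
    """Map addr through a single-level table {page number: frame number}."""
    frame = table[addr // PAGE]  # a KeyError here models a page fault
    return frame * PAGE + addr % PAGE


def nested_translate(guest_pt, nested_pt, gva):
    """Nested paging: both tables are composed on every translation."""
    gpa = translate(guest_pt, gva)     # guest linear -> guest physical
    return translate(nested_pt, gpa)   # guest physical -> host physical


def build_shadow(guest_pt, nested_pt):
    """Shadow paging: the VMM precomputes the composition in software."""
    return {vpn: nested_pt[gpn] for vpn, gpn in guest_pt.items() if gpn in nested_pt}


# Hypothetical mappings, for illustration only.
guest_pt_a = {0: 5, 1: 7}            # process A: guest virtual page -> guest physical page
guest_pt_b = {0: 6}                  # process B
nested_pt = {5: 40, 6: 41, 7: 42}    # guest physical page -> host physical page

# With shadow paging, a guest CR3 write (A -> B) must be intercepted so the
# VMM can switch to a different shadow table. With nested paging, the guest
# simply points CR3 at guest_pt_b and nested_pt stays unchanged.
shadow_a = build_shadow(guest_pt_a, nested_pt)
assert translate(shadow_a, 0x1010) == nested_translate(guest_pt_a, nested_pt, 0x1010)
```

    In this model the shadow table is tied to one guest page table, which is why a change of guest CR3 needs VMM involvement, while the nested table is independent of which guest process is running.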



  • 3.  RE: Context switches in Guest OS

    Posted Oct 29, 2008 02:51 AM

    Thanks for the detailed description. I understand that the guest OS runs the scheduling algorithm and decides which context to restore next; in short, it is responsible for performing the actual context switch. However, I want to modify the exception frame, or the context generated on a context switch, and store some additional information along with the machine state of the thread.

    Secondly, when the OS schedules a new context to run on the machine, I want to extract some information from that context before the thread is allowed to run.

    In short, I want the VMM to obtain some information from the machine context of the thread that the guest OS is about to schedule on the processor. Is that possible?

    Thanks once again

    -SC



  • 4.  RE: Context switches in Guest OS

    Posted Oct 29, 2008 03:06 AM

    If you're writing your own VMM, you can certainly do all of this. With VMware Workstation 6.5, you can use vprobes to query guest state, but I don't think you can make any modifications to guest state. You also have to have an intimate knowledge of your guest OS.



  • 5.  RE: Context switches in Guest OS

    Posted Oct 29, 2008 03:24 AM

    Thanks for your quick replies. Actually, I was reading a paper that uses hypervisors for security, and in it the authors modify the exception frame to store additional information. Here is the text from the paper; the SID and the encrypted registers are the additional information stored with the exception frame. It did not make complete sense to me, so I thought I would verify it on the forum. Thanks once again; any additional comments would be great.

    "We can realize the transition of current domain by intercepting every interrupt and exception generated by hardware. Hypervisors are, by definition, capable of intercepting all interrupts and exceptions. When the hypervisor forwards an interrupt to a guest operating system, the hypervisor can change the current domain by setting current sid variable to 0.

    The secure domain context, which is to contain register contexts and SID of the outgoing domain, is realized by extending the exception frame structure. As briefed in Section 2.2, the processor generates an exception frame into the kernel mode stack upon an interrupt, and the hypervisor already simulates this behavior to virtualize interrupts. We extend this exception frame to contain a secure domain context. Thus, this extended exception frame has a new field for general-purpose registers (GPRs) and SID value of the outgoing domain. These fields are encrypted and hashed. When the hypervisor forwards an interrupt to a guest operating system, it generates this extended exception frame instead of the original one. The GPRs are cleared when the hypervisor raises a virtual interrupt by generating a secure exception frame. Upon receipt of this interrupt, the guest operating system will find the GPRs to be zeroed out. This is to prevent information leakage upon domain switch, because the operating system is untrusted.

    After handling the virtual interrupt, the guest operating system requests the hypervisor to perform a ‘return-from-interrupt’ operation using the extended exception frame that have been saved from a previous interrupt. Upon receipt of this request, the hypervisor processes the extended exception frame to restore GPRs and SID value."



  • 6.  RE: Context switches in Guest OS

    Posted Oct 29, 2008 12:17 PM

    Let me first point out that this excerpt appears to be concerned with transitions between ring 3 application code and ring 0 kernel code rather than context switches between guest OS processes/threads.

    The second paragraph appears to assume that there is no hardware virtualization support for delivering interrupts or exceptions to the guest. Modern hypervisors on modern hardware do not build an exception frame in software. Having said that, even with hardware virtualization support, the hypervisor is free to push whatever it likes on the guest stack prior to IDT vectoring, particularly since the third paragraph indicates that returns from the exception/interrupt handler are performed with a hypercall rather than a simple iret instruction.

    Note that the use of a hypercall for 'return-from-interrupt' as described in the third paragraph implies a paravirtualized guest.

    This may be nitpicking, but it would be unwise to clear all GPRs. In particular, you would want the stack pointer to contain the address of the bottom of the exception frame rather than zero. Furthermore, it seems that a truly secure implementation would stash encrypted FPRs and clear them as well.
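
    As a rough illustration of the extended exception frame idea and of the stack pointer caveat, here is a hedged Python sketch. It is not the paper's implementation: the key, the function names, and the frame layout are all hypothetical, and only the integrity hash is modelled. The encryption step the paper describes is omitted because the Python standard library ships no cipher.

```python
import hashlib
import hmac
import json

HV_KEY = b"hypothetical-hypervisor-secret"  # assumption: known only to the hypervisor


def make_secure_frame(regs, sid):
    """Return (sealed context, registers the guest sees) for a virtual interrupt."""
    blob = json.dumps({"gprs": regs, "sid": sid}, sort_keys=True).encode()
    tag = hmac.new(HV_KEY, blob, hashlib.sha256).hexdigest()
    # Clear every GPR except the stack pointer; a real hypervisor would also
    # adjust rsp so that it addresses the bottom of the newly pushed frame.
    visible = {name: (value if name == "rsp" else 0) for name, value in regs.items()}
    return {"blob": blob, "tag": tag}, visible


def return_from_interrupt(sealed):
    """Hypercall handler: verify the sealed context, then restore GPRs and SID."""
    expected = hmac.new(HV_KEY, sealed["blob"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sealed["tag"]):
        raise ValueError("extended exception frame was tampered with")
    ctx = json.loads(sealed["blob"])
    return ctx["gprs"], ctx["sid"]
```

    The guest only ever handles the sealed blob and the zeroed registers; the hypercall path is what makes it possible to check the blob before the saved state is restored.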



  • 7.  RE: Context switches in Guest OS

    Posted Nov 02, 2008 05:32 AM

    Thanks once again for your detailed answers. I believe I have one final question: assuming no hardware virtualization support, will a context switch result in a trap to the VMM? More explicitly, will the VMM get a trap when a process is descheduled on a guest OS and, more importantly, when a new process is being scheduled to run on the processor next?

    Thanks for your time

    -SC



  • 8.  RE: Context switches in Guest OS

    Posted Nov 02, 2008 12:17 PM

    If you mean to ask "does the hypervisor have an opportunity to intervene at such events," the answer is yes. However, I have to quibble with the word "trap." In an x86 context, a trap is a synchronous exception delivered at instruction retirement. In more general terms, a trap often refers to a non-programmatic transfer of control resulting from an interrupt or an exception. Neither use of this term strictly applies to the scheduling of a new process.

    Guest user processes execute in ring 3, and the guest supervisor or kernel executes in ring 0. A context-switch always begins with a transfer of control from the guest user process running in ring 3 to the guest supervisor or kernel at ring 0. Ultimately, this operation completes with a transfer of control from the guest OS supervisor or kernel at ring 0 back to the new user process at ring 3.

    Because x86 processors are not classically virtualizable by a trap-and-emulate approach, a hypervisor that does not use hardware virtualization must maintain absolute control of all guest ring 0 code to properly virtualize any privileged instructions that the guest may execute. Whether this is done through emulation or through binary translation, the guest supervisor or kernel cannot do anything at all without the hypervisor having an opportunity to intervene.

    Though hardware virtualization changes the story, it is still possible to construct a hypervisor that has this behavior. However, one ends up losing some of the benefits of hardware virtualization as a result.

    The only stumbling block here is recognizing when a new process has been scheduled by the guest OS. X86 hardware does not define what constitutes a process, and each guest OS may have a different idea of what constitutes a process. Without a priori knowledge of the guest OS that is running or some elements of paravirtualization, it could be difficult for the hypervisor to recognize a context-switch. One could argue that different processes must reside in different virtual address spaces, and the hypervisor can easily detect a change of virtual address space. However, there are operating systems that run in real-address mode, without any virtual-address spaces at all. Moreover, even in a more typical OS, multiple threads within a process are likely to share the same virtual address space, so it could still be difficult to recognize a thread-switch. Certainly, one could use heuristics, such as a change in segment registers, but one could then devise an adversarial guest OS that would defeat such heuristics, by introducing false positives, at the very least.
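
    The address-space heuristic described above can be sketched in a few lines of Python. This assumes a hypervisor that intercepts guest CR3 writes and calls a handler on each one; the class and method names are hypothetical.

```python
class Cr3Watcher:
    """Counts guest address-space switches seen at intercepted CR3 writes."""

    def __init__(self, initial_cr3):
        self.current = initial_cr3
        self.switches = 0

    def on_cr3_write(self, new_cr3):
        # A guest may reload CR3 with the same value just to flush the TLB,
        # so only a changed value is treated as an address-space switch.
        changed = new_cr3 != self.current
        if changed:
            self.switches += 1
            self.current = new_cr3
        return changed


w = Cr3Watcher(0x1000)
w.on_cr3_write(0x2000)  # process A -> process B: detected
w.on_cr3_write(0x2000)  # same address space reloaded: not counted
assert w.switches == 1
```

    A switch between two threads that share an address space may not write CR3 at all, so the handler is never called and the switch goes unnoticed, and a guest without paging never triggers it either.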



  • 9.  RE: Context switches in Guest OS

    Posted Nov 03, 2008 03:45 PM

    Thanks once again for the detailed reply. I believe that in a fully virtualized environment (VMware ESX), writes to CR3 will result in a trap to the VMM, because the VMM needs to make CR3 point to the appropriate shadow page table rather than the page table used by the guest OS. So I was wondering: on a context switch, the guest OS will try to change CR3 to point to the page table of the new process, and this will trap to the VMM, giving the VMM enough information to know that the guest OS is doing a context switch. Does that make sense?

    Thanks for your time

    -SC



  • 10.  RE: Context switches in Guest OS
    Best Answer

    Posted Nov 03, 2008 03:59 PM

    That's what I meant when I said "One could argue that different processes must reside in different virtual address spaces, and the hypervisor can easily detect a change of virtual address space."

    However, this doesn't work for all operating systems. Take MS-DOS, for example.



  • 11.  RE: Context switches in Guest OS

    Posted Nov 03, 2008 08:29 PM

    Thank you so much !!



  • 12.  RE: Context switches in Guest OS

    Posted Nov 03, 2008 09:05 PM

    I am sorry for bringing this topic up again and again, but I was reading through our discussion again, and I guess there can be another way for the VMM to know of a context switch. Let's say the guest OS wants to switch to another process: the hypervisor will intercept the interrupt, create the exception frame, and push it on the kernel's (guest OS) stack. The guest OS then runs its scheduler to decide on the next process to schedule and executes an IRET instruction with the context of the process to be executed next passed as an argument. The IRET will again be intercepted by the VMM, which can then examine the contents of the exception frame of the process that will be scheduled next before passing control back to the guest.

    I might be going around in circles, but I am trying to get this concept crystal clear in my head.

    Thanks for your help

    -SC



  • 13.  RE: Context switches in Guest OS

    Posted Nov 03, 2008 10:11 PM

    You've come full circle back to intercepting the transition from ring 0 to ring 3. IRET is one possible way of making such a transition. LRET is another, since this is a return to a numerically higher (less privileged) privilege level. SYSRET and SYSEXIT offer other avenues, depending on which CPU vendor we're talking about and whether or not the guest supervisor or kernel runs in 64-bit mode.

    It is relatively easy for the hypervisor to intercept a transition from ring 0 to ring 3, particularly if we ignore hardware assisted virtualization. The tricky part is determining when such a transition is to a different guest process than the process that was active at the most recent ring 3 to ring 0 transition.

    You also should not be so focused on the expiration of the quantum. Quite often, scheduling decisions are made because of blocking on synchronous I/O, rather than on the expiration of the quantum.
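
    The ring-transition bookkeeping discussed above could look roughly like this Python sketch. It assumes the hypervisor already gets a callback on every ring 3 to ring 0 entry and every ring 0 to ring 3 return, and that the caller supplies some guest-specific `context_id` (for example, the CR3 value). Both the callbacks and the identifier are hypothetical.

```python
RETURN_PATHS = {"iret", "lret", "sysret", "sysexit"}


class RingTransitionTracker:
    """Flags ring 0 -> ring 3 returns that land in a different guest context."""

    def __init__(self):
        self.last_user_context = None
        self.suspected_switches = 0

    def on_kernel_entry(self, context_id, reason):
        """Ring 3 -> ring 0: remember which context was running."""
        # reason is informational only: a timer tick and a system call that
        # blocks on I/O are handled identically.
        self.last_user_context = context_id

    def on_user_return(self, context_id, via):
        """Ring 0 -> ring 3: compare against the context seen at entry."""
        if via not in RETURN_PATHS:
            raise ValueError(f"unknown return path: {via}")
        switched = (self.last_user_context is not None
                    and context_id != self.last_user_context)
        if switched:
            self.suspected_switches += 1
        return switched
```

    The hard part stays hidden in `context_id`: whatever the hypervisor chooses as an identifier is only a heuristic unless it has knowledge of the specific guest OS.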



  • 14.  RE: Context switches in Guest OS

    Posted Oct 29, 2008 02:51 AM

    Well, thanks, that cleared it up!

    Tom Howarth

    VMware Communities User Moderator