On Thursday, September 15, 2022 11:43 PM, Liang, Kan wrote: > On 2022-09-15 10:39 a.m., Wang, Wei W wrote: > > On Thursday, September 15, 2022 9:55 PM Liang, Kan wrote: > >> On 2022-09-14 10:46 p.m., Wang, Wei W wrote: > >>> On Thursday, September 15, 2022 4:26 AM, Liang, Kan wrote: > >>>> The perf_event_disable() eventually invokes the intel_pt_stop(). > >>>> We already expose the intel_pt_stop()/cpu_emergency_stop_pt() to > >>>> other modules. I don't think we have to use the perf_event_disable(). > >>>> Also, the > >>>> perf_event_disable() requires extra codes. > >>>> > >>>> I went through the discussions. I agree with Sean's suggestion. > >>>> We should only put the logic in the KVM but all the MSR access > >>>> details into the PT driver. > >>> > >>> Even the driver itself doesn’t drive the save/restore of the MSRs, > >>> it is drived > >> by perf. > >> > >> It through perf_event, not driven by perf_event. The perf_event > >> generic code never knows when should invokes each driver to > >> save/restore information. It should be driven by the other subsystem e.g., > scheduler. > > > > Yes. The cpu scheduler does this via the perf subsystem, though. > > > >> > >> For this case, KVM should drive the save/restore, and the PT driver > >> eventually does all the MSR access details. > >> > >>> 1. If we make KVM a user of perf, we should do this via > >> perf_event_disable/enable_*. > >>> 2. If we make KVM an alternative to perf (i.e. have direct control > >>> over PMU HW), we can do this via driver interfaces like perf. > >>> Per my experience, we should go for 1. Probably need Peter's > >>> opinions on > >> this. > >>> > >> > >> For 1, the perf_event_disable/enable_* are not enough. They don't > >> save/restore MSRs. > > > > perf_event_disable will go through perf to call pt_event_stop which saves > the related MSRs, right? > > I don't think so. The pt_event_stop() doesn't save all the > MSR_IA32_RTIT_* MSRs. Not all the MSRs are required to be saved. In general, pt_event_stop should have saved all the MSRs required for an event switching. Otherwise the general usages of PT have been broken. To be more precise, the following MSRs are not saved by pt_event_stop, but I don’t see they are required to be saved: - MSR_IA32_RTIT_CR3_MATCH: I don’t see it is used by perf. Seems like KVM saved an MSR that's not even used by the host. - Address range MSRs (MSR_IA32_RTIT_ADDR0_A etc.): Those are provided by s/w and not updated by h/w. So they're just set to MSRs when event gets scheduled in. There is no need to save. > > > (if so, what large changes did you mean?) > > > >> If we go to this way, we have to introduce a new generic interface to > >> ask each driver to save/restore their MSRs when the guest is > >> entering/exiting. We'd better combine the new interface with the > >> existing > >> perf_guest_get_msrs() of the core driver. > >> I think that's an ideal solution, but requires big changes in the code. > >> > >> 2 is the current KVM implementation. See pt_save_msr()/pt_load_msr(). > >> I don't think it's a right way. We'd better fix it. > >> > >> The suggestion should be 3. The KVM notify the PT driver via the > >> interface provided by PT. The PT driver save/restore all the registers. > >> I think it's an acceptable solution with small code changes. > > > > This looks like we just relocate the save/restore functions to the PT driver > and KVM still directly call them - still not going through perf's management. > Imagine every user operates on the pmu h/w directly like this, things would be > a mess. > > > > > The pt_event_stop() and the proposed interface still manipulate the PT event > pt->handle.event. The event status is updated as well. It's still under control of > the perf_event. Did you mean to handle the PT event in the proposed driver API? Event status is just one of the things. There are other things if we want to make it complete for this, e.g. event->oncpu = -1, and eventually seems we will re-implement perf_event_disable_*. Btw, Xiaoyao has made it work with perf_event_disable_local, and don’t have that many changes. If necessary, we can post the 2nd version out to double check. Thanks, Wei
On 2022-09-15 10:30 p.m., Wang, Wei W wrote: > On Thursday, September 15, 2022 11:43 PM, Liang, Kan wrote: >> On 2022-09-15 10:39 a.m., Wang, Wei W wrote: >>> On Thursday, September 15, 2022 9:55 PM Liang, Kan wrote: >>>> On 2022-09-14 10:46 p.m., Wang, Wei W wrote: >>>>> On Thursday, September 15, 2022 4:26 AM, Liang, Kan wrote: >>>>>> The perf_event_disable() eventually invokes the intel_pt_stop(). >>>>>> We already expose the intel_pt_stop()/cpu_emergency_stop_pt() to >>>>>> other modules. I don't think we have to use the perf_event_disable(). >>>>>> Also, the >>>>>> perf_event_disable() requires extra codes. >>>>>> >>>>>> I went through the discussions. I agree with Sean's suggestion. >>>>>> We should only put the logic in the KVM but all the MSR access >>>>>> details into the PT driver. >>>>> >>>>> Even the driver itself doesn’t drive the save/restore of the MSRs, >>>>> it is drived >>>> by perf. >>>> >>>> It through perf_event, not driven by perf_event. The perf_event >>>> generic code never knows when should invokes each driver to >>>> save/restore information. It should be driven by the other subsystem e.g., >> scheduler. >>> >>> Yes. The cpu scheduler does this via the perf subsystem, though. >>> >>>> >>>> For this case, KVM should drive the save/restore, and the PT driver >>>> eventually does all the MSR access details. >>>> >>>>> 1. If we make KVM a user of perf, we should do this via >>>> perf_event_disable/enable_*. >>>>> 2. If we make KVM an alternative to perf (i.e. have direct control >>>>> over PMU HW), we can do this via driver interfaces like perf. >>>>> Per my experience, we should go for 1. Probably need Peter's >>>>> opinions on >>>> this. >>>>> >>>> >>>> For 1, the perf_event_disable/enable_* are not enough. They don't >>>> save/restore MSRs. >>> >>> perf_event_disable will go through perf to call pt_event_stop which saves >> the related MSRs, right? >> >> I don't think so. The pt_event_stop() doesn't save all the >> MSR_IA32_RTIT_* MSRs. > > Not all the MSRs are required to be saved. In general, pt_event_stop should have > saved all the MSRs required for an event switching. Otherwise the general usages > of PT have been broken. To be more precise, the following MSRs are not saved by > pt_event_stop, but I don’t see they are required to be saved: > > - MSR_IA32_RTIT_CR3_MATCH: I don’t see it is used by perf. > Seems like KVM saved an MSR that's not even used by the host. > > - Address range MSRs (MSR_IA32_RTIT_ADDR0_A etc.): Those are provided by s/w and not updated by h/w. > So they're just set to MSRs when event gets scheduled in. There is no need to save. > OK. I think you need a clean-up patch to fix them first. >> >>> (if so, what large changes did you mean?) >>> >>>> If we go to this way, we have to introduce a new generic interface to >>>> ask each driver to save/restore their MSRs when the guest is >>>> entering/exiting. We'd better combine the new interface with the >>>> existing >>>> perf_guest_get_msrs() of the core driver. >>>> I think that's an ideal solution, but requires big changes in the code. >>>> >>>> 2 is the current KVM implementation. See pt_save_msr()/pt_load_msr(). >>>> I don't think it's a right way. We'd better fix it. >>>> >>>> The suggestion should be 3. The KVM notify the PT driver via the >>>> interface provided by PT. The PT driver save/restore all the registers. >>>> I think it's an acceptable solution with small code changes. >>> >>> This looks like we just relocate the save/restore functions to the PT driver >> and KVM still directly call them - still not going through perf's management. >> Imagine every user operates on the pmu h/w directly like this, things would be >> a mess. >>> >> >> >> The pt_event_stop() and the proposed interface still manipulate the PT event >> pt->handle.event. The event status is updated as well. It's still under control of >> the perf_event. > > Did you mean to handle the PT event in the proposed driver API? Event status is just > one of the things. There are other things if we want to make it complete for this, > e.g. event->oncpu = -1, and eventually seems we will re-implement perf_event_disable_*. > As my understand, perf always check the status first. If it's a stopped or inactivated event, I don't think event->oncpu will be touched. That's why I think the proposed driver API should be acceptable. > Btw, Xiaoyao has made it work with perf_event_disable_local, and don’t have that many changes. > If necessary, we can post the 2nd version out to double check. > I'm not worry about which ways (either perf_event_disable_local() or the proposed PT driver API) are chosen to stop the PT. If the existing perf_event interfaces can meet your requirement, that's perfect. My real concern is the pt_save_msr()/pt_load_msr(). I don't think it's a job for KVM. See atomic_switch_perf_msrs(). It is the perf core driver rather than KVM that tells which MSRs should be saved/restored in VMCS. We should do the same thing for PT. (Actually, I think we already encounter issues with the current KVM-dominated method. KVM saves/restores unnecessary MSRs. Right?) To do so, I think there may be two ways. - Since MSRs have to be switched for both PT and core drivers, it sounds reasonable to provide a new generic interface in the perf_event. The new interface is to tell KVM which MSRs should be saved/restored. Then KVM can decide to save/restore via VMCS or direct MSR access. I suspect this way requires big change, but it will benefit all the drivers which have similar requirements. - The proposed driver API. The MSRs are saved/restored in the PT driver. Thanks, Kan
On Friday, September 16, 2022 9:27 PM, Liang, Kan wrote: > > Did you mean to handle the PT event in the proposed driver API? Event > > status is just one of the things. There are other things if we want to > > make it complete for this, e.g. event->oncpu = -1, and eventually seems we will > re-implement perf_event_disable_*. > > > > As my understand, perf always check the status first. If it's a stopped or > inactivated event, I don't think event->oncpu will be touched. That's why I think > the proposed driver API should be acceptable. That's the implementation thing. We need to make it architecturally clean though. > > > Btw, Xiaoyao has made it work with perf_event_disable_local, and don’t have > that many changes. > > If necessary, we can post the 2nd version out to double check. > > > > I'm not worry about which ways (either perf_event_disable_local() or the > proposed PT driver API) are chosen to stop the PT. If the existing perf_event > interfaces can meet your requirement, that's perfect. > > My real concern is the pt_save_msr()/pt_load_msr(). I don't think it's a job for > KVM. See atomic_switch_perf_msrs(). It is the perf core driver rather than KVM > that tells which MSRs should be saved/restored in VMCS. > We should do the same thing for PT. (Actually, I think we already encounter > issues with the current KVM-dominated method. KVM saves/restores > unnecessary MSRs. Right?) > Right. It's on my plan to improve the current PT virtualization, and planed to be the next step after this fix. The general rule is the same: make KVM a user of perf, that is, we leave those save/restore work to be completely done by the perf (driver) side, so we will eventually remove the KVM side pt_save/load_msr. To be more precise, it will work as below: - we will create a guest event, like what we did for lbr virtualization - on VMEnter: -- perf_disable_event_local(host_event); -- perf_enable_event_local(guest_event); - on VMExit: -- perf_disable_event_local(guest_event); -- perf_enable_event_local(host_event); > To do so, I think there may be two ways. > - Since MSRs have to be switched for both PT and core drivers, it sounds > reasonable to provide a new generic interface in the perf_event. The new > interface is to tell KVM which MSRs should be saved/restored. Then KVM can > decide to save/restore via VMCS or direct MSR access. I suspect this way > requires big change, but it will benefit all the drivers which have similar > requirements. > - The proposed driver API. The MSRs are saved/restored in the PT driver. As shown above, no need for those. We can completely reuse the perf side save/restore. Thanks, Wei
On 2022-09-19 9:46 a.m., Wang, Wei W wrote: > On Friday, September 16, 2022 9:27 PM, Liang, Kan wrote: >>> Did you mean to handle the PT event in the proposed driver API? Event >>> status is just one of the things. There are other things if we want to >>> make it complete for this, e.g. event->oncpu = -1, and eventually seems we will >> re-implement perf_event_disable_*. >>> >> >> As my understand, perf always check the status first. If it's a stopped or >> inactivated event, I don't think event->oncpu will be touched. That's why I think >> the proposed driver API should be acceptable. > > That's the implementation thing. We need to make it architecturally clean though. > >> >>> Btw, Xiaoyao has made it work with perf_event_disable_local, and don’t have >> that many changes. >>> If necessary, we can post the 2nd version out to double check. >>> >> >> I'm not worry about which ways (either perf_event_disable_local() or the >> proposed PT driver API) are chosen to stop the PT. If the existing perf_event >> interfaces can meet your requirement, that's perfect. >> >> My real concern is the pt_save_msr()/pt_load_msr(). I don't think it's a job for >> KVM. See atomic_switch_perf_msrs(). It is the perf core driver rather than KVM >> that tells which MSRs should be saved/restored in VMCS. >> We should do the same thing for PT. (Actually, I think we already encounter >> issues with the current KVM-dominated method. KVM saves/restores >> unnecessary MSRs. Right?) >> > > Right. It's on my plan to improve the current PT virtualization, and > planed to be the next step after this fix. The general rule is the same: make KVM a user > of perf, that is, we leave those save/restore work to be completely done by the > perf (driver) side, so we will eventually remove the KVM side pt_save/load_msr. > To be more precise, it will work as below: > - we will create a guest event, like what we did for lbr virtualization Another fake event? We have to specially handle it in the perf code. I don't think it's a clean way for perf. > - on VMEnter: > -- perf_disable_event_local(host_event); > -- perf_enable_event_local(guest_event); > - on VMExit: > -- perf_disable_event_local(guest_event); > -- perf_enable_event_local(host_event); Why we cannot use the same way as the perf core driver to switch the MSRs in the VMCS? You just need one generic function, perf_guest_get_msrs(), for both PT and core driver. If you have to disable PT explicitly before VMCS, I think you can do it in the PT specific perf_guest_get_msrs(). Anyway, that's an improvement for the current code. I don't have a problem, if you prefer to separate the fix patch and improvement patch. Thanks, Kan > >> To do so, I think there may be two ways. >> - Since MSRs have to be switched for both PT and core drivers, it sounds >> reasonable to provide a new generic interface in the perf_event. The new >> interface is to tell KVM which MSRs should be saved/restored. Then KVM can >> decide to save/restore via VMCS or direct MSR access. I suspect this way >> requires big change, but it will benefit all the drivers which have similar >> requirements. >> - The proposed driver API. The MSRs are saved/restored in the PT driver. > > As shown above, no need for those. We can completely reuse the > perf side save/restore. > > Thanks, > Wei
On Monday, September 19, 2022 10:41 PM, Liang, Kan wrote: > Another fake event? We have to specially handle it in the perf code. I don't > think it's a clean way for perf. We can check the patch later. I think it should be clean, like the LBR side. > > > - on VMEnter: > > -- perf_disable_event_local(host_event); > > -- perf_enable_event_local(guest_event); > > - on VMExit: > > -- perf_disable_event_local(guest_event); > > -- perf_enable_event_local(host_event); > > Why we cannot use the same way as the perf core driver to switch the MSRs in > the VMCS? The current MSR switching list from VMCS isn’t fast, should be the last resort when really necessary. > > You just need one generic function, perf_guest_get_msrs(), for both PT and > core driver. If you have to disable PT explicitly before VMCS, I think you can do > it in the PT specific perf_guest_get_msrs(). The disable is done via " Clear IA32_RTIT_CTL" VMExit control. It can ensure PT disabled in time on VMExit, so no need to go through perf_guest_get_msrs. Thanks, Wei
On 2022-09-19 11:22 a.m., Wang, Wei W wrote: > On Monday, September 19, 2022 10:41 PM, Liang, Kan wrote: >> Another fake event? We have to specially handle it in the perf code. I don't >> think it's a clean way for perf. > > We can check the patch later. I think it should be clean, like the LBR side. I doubt. Perf already specially handles the fake LBR event in several places from the core code to the LBR code. It also occupy a reserved bit. If there is another choice, I don't think we want to go that way. > >> >>> - on VMEnter: >>> -- perf_disable_event_local(host_event); >>> -- perf_enable_event_local(guest_event); >>> - on VMExit: >>> -- perf_disable_event_local(guest_event); >>> -- perf_enable_event_local(host_event); >> >> Why we cannot use the same way as the perf core driver to switch the MSRs in >> the VMCS? > > The current MSR switching list from VMCS isn’t fast, > should be the last resort when really necessary. > It's a documented way in the SDM. I believe there must be some reason Intel introduces it. Since it's an documented (or recommended) way, I think we'd better use it if possible. Since both the PT and the core driver needs to switch MSRs during VMCS, it's better to use the same way (function) to handle them. Thanks, Kan >> >> You just need one generic function, perf_guest_get_msrs(), for both PT and >> core driver. If you have to disable PT explicitly before VMCS, I think you can do >> it in the PT specific perf_guest_get_msrs(). > > The disable is done via " Clear IA32_RTIT_CTL" VMExit control. > It can ensure PT disabled in time on VMExit, so no need to go through perf_guest_get_msrs. >
© 2016 - 2026 Red Hat, Inc.