On Fri, Aug 17, 2018 at 10:04 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> It's not necessary to send RDMA_CONTROL_ERROR when clean up rdma resource.
>> If rdma->error_state is ture, the message may not send successfully.
>> and the cm event can also notify the peer qemu.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> How does this keep 'cancel' working; I added 32bce196344 last year to
> make that code also send the RDMA_CONTROL_ERROR in 'cancelling'.
I guess send the RDMA_CONTROL_ERROR is to notify peer qemu close rdma
connection,
and to make sure the receive rdma_disconnect event.
But the two sides qemu should cleanup rdma independently. maybe the
destination qemu is hang.
1.the current qemu version already not wait for
RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect,
2.for peer qemu, it's already poll rdma->channel->fd, compare to send
RDMA_CONTROL_ERROR, maybe use cm event
to notify peer qemu to quit is better. maybe the rdma is already in
error_state, and RDMA_CONTROL_ERROR
cannot send successfully. when cancel migraiton, the destination will
receive RDMA_CM_EVENT_DISCONNECTED.
3. I prefer use RDMA_CONTROL_ERROR to notify some code logic error,
not rdma connection error.
>
> Dave
>
>> ---
>> migration/rdma.c | 11 -----------
>> 1 file changed, 11 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index ae07515..e1498f2 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2305,17 +2305,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>> int idx;
>>
>> if (rdma->cm_id && rdma->connected) {
>> - if ((rdma->error_state ||
>> - migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
>> - !rdma->received_error) {
>> - RDMAControlHeader head = { .len = 0,
>> - .type = RDMA_CONTROL_ERROR,
>> - .repeat = 1,
>> - };
>> - error_report("Early error. Sending error.");
>> - qemu_rdma_post_send_control(rdma, NULL, &head);
>> - }
>> -
>> rdma_disconnect(rdma->cm_id);
>> trace_qemu_rdma_cleanup_disconnect();
>> rdma->connected = false;
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK