[HostOS] [4.17] Migration fails with "error: unable to connect to server at 'ltc-fire4:49152': Connection timed out" #17

Closed
balamuruhans opened this issue Jun 30, 2018 · 7 comments

balamuruhans commented Jun 30, 2018

<cde:info> Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=169385 </cde:info>

Migration fails: libvirt fails to resolve the hostname even though an IP address is used on the migration command line. DNS is configured appropriately in resolv.conf, yet libvirt still fails to resolve the name.

Migration fails with "error: unable to connect to server at 'ltc-fire4:49152': Connection timed out"

09:13:22 INFO | Command /bin/virsh -c 'qemu+ssh://9.47.70.203/system' migrate --live --persistent --postcopy --postcopy-after-precopy --timeout 3600 --domain vms1 --desturi qemu:///system running on a thread
09:15:34 DEBUG| [stderr] error: unable to connect to server at 'ltc-fire4:49152': Connection timed out
09:15:34 DEBUG| [stdout]
09:15:34 INFO | Command '/bin/virsh -c 'qemu+ssh://9.47.70.203/system' migrate --live --persistent --postcopy --postcopy-after-precopy --timeout 3600 --domain vms1 --desturi qemu:///system' finished with 1 after 132.27442193s
09:15:34 ERROR| Migration to qemu:///system failed

/var/log/messages on the target host records the failure:

Jun 18 23:17:53 pkvmhab008 systemd-logind: Removed session 5652.
Jun 18 23:19:48 pkvmhab008 libvirtd: 2018-06-19 03:19:48.060+0000: 60320: error : virNetSocketNewConnectTCP:591 : unable to connect to server at 'ltc-fire4:49152': Connection timed out
Jun 18 23:19:48 pkvmhab008 systemd-logind: Removed session 5650.
Jun 18 23:19:49 pkvmhab008 systemd-logind: New session 5653 of user root.
Jun 18 23:19:49 pkvmhab008 systemd: Started Session 5653 of user root.
Jun 18 23:19:49 pkvmhab008 systemd: Starting Session 5653 of user root.
Jun 18 23:19:49 pkvmhab008 systemd-logind: New session 5655 of user root.
Jun 18 23:19:49 pkvmhab008 systemd: Started Session 5655 of user root.
Jun 18 23:19:49 pkvmhab008 systemd: Starting Session 5655 of user root.
Jun 18 23:19:49 pkvmhab008 systemd-logind: New session 5654 of user root.
Jun 18 23:19:49 pkvmhab008 systemd: Started Session 5654 of user root.
Jun 18 23:19:49 pkvmhab008 systemd: Starting Session 5654 of user root.
Jun 18 23:19:50 pkvmhab008 libvirtd: 2018-06-19 03:19:50.279+0000: 60320: error : virCPUCheckFeature:695 : this function is not supported by the connection driver: cannot check guest CPU feature for ppc64le architecture
Jun 18 23:19:50 pkvmhab008 NetworkManager[4636]: <info>  [1529378390.3375] manager: (vnet2): new Tun device (/org/freedesktop/NetworkManager/Devices/20328)
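
To pull only the libvirtd errors out of a /var/log/messages excerpt like the one above, a small filter can help. This is a minimal sketch: the regular expression is inferred from the two libvirtd lines shown here and is an assumption, not a documented libvirt log format, and `libvirtd_errors` is a hypothetical helper name.

```python
import re

# Pattern inferred from the libvirtd lines in this report (an assumption):
# "libvirtd: <UTC date> <UTC time>: <thread id>: error : <function>:<line> : <message>"
LIBVIRTD_ERROR = re.compile(
    r"libvirtd: (?P<utc>\S+ \S+): (?P<tid>\d+): error : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<msg>.*)$"
)


def libvirtd_errors(lines):
    """Return (utc_timestamp, function, message) for each libvirtd error line."""
    found = []
    for line in lines:
        match = LIBVIRTD_ERROR.search(line)
        if match:
            found.append((match.group("utc"), match.group("func"), match.group("msg")))
    return found
```

Feeding it the lines of the excerpt should leave only the `virNetSocketNewConnectTCP` and `virCPUCheckFeature` entries, which makes it easier to line up the UTC timestamps with the test log.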

System configuration:

Host Kernel: 4.17.0-1.dev.git5ce3eac.el7.ppc64le
Guest Kernel: 4.17.0-1.dev.git5ce3eac.el7.ppc64le
Qemu: qemu-2.12.0-2.dev.gitd36f3ee.el7.ppc64le
SLOF: SLOF-20171214-2.dev.gitc2a331f.el7.centos.noarch
Libvirt: libvirt-4.3.0-1.dev.git3096ff1.el7.ppc64le
@balamuruhans
Author

Target /var/log/messages during the failure:
messages.log

@balamuruhans
Author

Hostname resolution succeeded only after the hostname and its IP address were mapped in /etc/hosts; virsh could not resolve the name automatically through DNS. Before that entry was added, the error was:

error: Unable to resolve address 'ltc-fire4' service '49152': Name or service not known
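
"Name or service not known" is the message the system resolver's getaddrinfo() returns when a name cannot be resolved, so the same lookup can be reproduced outside virsh. The following is a minimal sketch, assuming Python 3 is available on the host; `can_resolve` is a hypothetical helper, and it exercises the host's resolver configuration (/etc/hosts, /etc/resolv.conf), not libvirt itself.

```python
import socket


def can_resolve(host, port):
    """Return True if the system resolver can resolve host/port for TCP."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        # Raised for resolver failures such as "Name or service not known".
        return False
```

Calling `can_resolve("ltc-fire4", 49152)` on both hosts before the migration would show whether the short name resolves there at all.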

However, the connection timeout occurred after the hostname-to-IP entry had been added to /etc/hosts, while multi-VM migration under stress was triggered in parallel threads.

Test setup:

  1. Install and boot 3 VMs (vms1, vms2, vms3) in the environment (the XML for all 3 VMs is attached).
  2. In every VM, install the stress tool: download http://people.seas.harvard.edu/~apw/stress/stress-1.0.4.tar.gz, then run cd /home/stress-1.0.4/; ./configure && make install
  3. Run CPU stress inside the VMs:
    # nohup stress --cpu 4 --quiet --timeout 3600 &
  4. Perform cross migration, from source to target and from target to source, at the same time:

2018-06-19 09:13:22,659 process L0633 INFO | Command /bin/virsh -c 'qemu+ssh://9.47.70.203/system' migrate --live --persistent --postcopy --postcopy-after-precopy --timeout 3600 --domain vms1 --desturi qemu:///system running on a thread
2018-06-19 09:15:34,810 process L0385 DEBUG| [stderr] error: unable to connect to server at 'ltc-fire4:49152': Connection timed out

2018-06-19 09:15:55,558 process L0642 INFO | Command '/bin/virsh -c 'qemu:///system' migrate --live --persistent --postcopy --postcopy-after-precopy --timeout 3600 --domain vms2 --desturi qemu+ssh://9.47.70.203/system' finished with 0 after 20.5368299484s
2018-06-19 09:17:48,091 virsh L0704 DEBUG| status: 0

2018-06-19 09:13:44,724 process L0642 INFO | Command '/bin/virsh -c 'qemu:///system' migrate --live --persistent --postcopy --postcopy-after-precopy --timeout 3600 --domain vms3 --desturi qemu+ssh://9.47.70.203/system' finished with 0 after 22.0678019524s
2018-06-19 09:13:44,726 virsh L0704 DEBUG| status: 0
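
The cross-migration step can be approximated outside the test framework with a few lines of Python. This is a sketch under stated assumptions, not the harness's actual code: the virsh flags are copied from the log lines above, while `build_migrate_cmd`, `run_cmd`, and `migrate_in_parallel` are hypothetical helper names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Flags copied from the virsh command lines in the log above.
MIGRATE_FLAGS = ["--live", "--persistent", "--postcopy",
                 "--postcopy-after-precopy", "--timeout", "3600"]


def build_migrate_cmd(connect_uri, domain, dest_uri):
    """Build the virsh migrate argument list for one domain."""
    return ["/bin/virsh", "-c", connect_uri, "migrate", *MIGRATE_FLAGS,
            "--domain", domain, "--desturi", dest_uri]


def run_cmd(cmd):
    """Run one command and return (exit status, stripped stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr.strip()


def migrate_in_parallel(cmds, runner=run_cmd):
    """Run every command on its own thread; results keep the input order."""
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        return list(pool.map(runner, cmds))
```

For the scenario in this report, the command list would hold one `qemu+ssh://9.47.70.203/system` to `qemu:///system` migration for vms1 and two `qemu:///system` to `qemu+ssh://9.47.70.203/system` migrations for vms2 and vms3.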

@balamuruhans
Author

libvirtd debug log:

libvirtd.tar.gz

@cdeadmin

cdeadmin commented Jul 3, 2018

------- Comment From [email protected] 2018-07-03 06:41:00 EDT-------
(In reply to comment #6)

> here I have two issues,
> 1. IP was not resolved by virsh using DNS.(got it working after adding the
> hostname - IP mapped in /etc/hosts)
>
> error: Unable to resolve address 'ltc-fire4' service '49152': Name or
> service not known`
>

The source and target /etc/resolv.conf files don't have the 'aus.stglabs.ibm.com'
search domain, so they fail to resolve the hostname.
Add the line below to /etc/resolv.conf for DNS to work.

search aus.stglabs.ibm.com

------- Comment From [email protected] 2018-07-03 06:42:11 EDT-------
(In reply to comment #7)
> (In reply to comment #6)
>
> > here I have two issues,
> > 1. IP was not resolved by virsh using DNS.(got it working after adding the
> > hostname - IP mapped in /etc/hosts)
> >
> > error: Unable to resolve address 'ltc-fire4' service '49152': Name or
> > service not known`
> >
>
> source and target /etc/resollv.conf doesn't have 'aus.stglabs.ibm.com'
> search domain added and thus they fail to resolve hostname.
> Add below line in /etc/resolv.conf for DNS to work.
>
>
> search aus.stglabs.ibm.com

[root@pkvmhab008 ~]# cat /etc/resolv.conf

# Generated by NetworkManager

search pok.stglabs.ibm.com
search aus.stglabs.ibm.com

nameserver 9.3.1.200
nameserver 9.0.130.50

[root@pkvmhab008 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
9.47.70.203 pkvmhab008.pok.stglabs.ibm.com

[root@pkvmhab008 ~]# ping -c 2 ltc-fire4
PING ltc-fire4.aus.stglabs.ibm.com (9.40.193.21) 56(84) bytes of data.
64 bytes from ltc-fire4.aus.stglabs.ibm.com (9.40.193.21): icmp_seq=1 ttl=51 time=52.8 ms
64 bytes from ltc-fire4.aus.stglabs.ibm.com (9.40.193.21): icmp_seq=2 ttl=51 time=89.6 ms

--- ltc-fire4.aus.stglabs.ibm.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 52.810/71.229/89.649/18.421 ms
[root@pkvmhab008 ~]#
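
The resolv.conf shown above carries two separate search lines. As I read resolv.conf(5), when several search directives are present only the last one is used, which would match the ping output resolving ltc-fire4 under aus.stglabs.ibm.com. A minimal sketch for checking which search list is effective, assuming that glibc behaviour; `effective_search_domains` is a hypothetical helper:

```python
def effective_search_domains(resolv_conf_text):
    """Return the search list from the last 'search' line (assumed glibc behaviour)."""
    domains = []
    for line in resolv_conf_text.splitlines():
        # Lines with '#' or ';' in the first column are comments.
        if line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields and fields[0] == "search":
            # A later search line replaces any earlier one.
            domains = fields[1:]
    return domains
```

If both domains are meant to be searched, putting them on a single line (`search pok.stglabs.ibm.com aus.stglabs.ibm.com`) avoids depending on directive order.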

------- Comment From [email protected] 2018-07-03 06:45:01 EDT-------
>>2. core issue is libvirt connection timeout when a VM from source -> target and another VM from target -> source is migrated at a time.

>> error: unable to connect to server at 'ltc-fire4:49152': Connection timed out

Let us know if you still see issue 2 with /etc/resolv.conf fixed and
the ping test working before migration.
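
A ping only exercises ICMP, whereas the failing step is a TCP connect to port 49152 (as far as I know, the first port in libvirt's default migration port range). A TCP-level check may therefore be a closer pre-migration test. This is a minimal sketch; `tcp_port_open` is a hypothetical helper, and a positive result is only meaningful while something is listening on the port, for example during an incoming migration.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and resolution failures.
        return False
```

A connect attempt that hangs until the timeout, rather than being refused at once, can indicate packets being dropped on the path, which is worth ruling out before looking at libvirt itself.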

@cdeadmin

cdeadmin commented Jul 3, 2018

------- Comment From [email protected] 2018-07-03 09:25:07 EDT-------
(In reply to comment #9)
> >>2. core issue is libvirt connection timeout when a VM from source -> target and another VM from target -> source is migrated at a time.
>
> >> error: unable to connect to server at 'ltc-fire4:49152': Connection timed out
>
> Let know if you are still seeing issue 2 with /etc/resolv.conf fixed and
> ping test working fine before migration.

I observed core issue 2 and filed the Bugzilla only after confirming that virsh could resolve the IP and that the ping test worked before migration. The test runs multiple migration scenarios in the same environment, and those went fine.

I reported issue 1 to let the developers know about it and to get input on whether this is how libvirt is designed or whether it should be treated as a separate issue.

@cdeadmin

cdeadmin commented Jul 4, 2018

------- Comment From [email protected] 2018-07-04 02:24:01 EDT-------
(In reply to comment #10)
> (In reply to comment #9)
> > >>2. core issue is libvirt connection timeout when a VM from source -> target and another VM from target -> source is migrated at a time.
> >
> > >> error: unable to connect to server at 'ltc-fire4:49152': Connection timed out
> >
> > Let know if you are still seeing issue 2 with /etc/resolv.conf fixed and
> > ping test working fine before migration.
>
> I observed core issue 2 and submitted the Bugzilla only after ensuring virsh
> could able to resolve the IP and ping test is working fine before migration.
> Because the test will run multiple migration scenarios with same environment
> and it went fine.
>
> I have reported the issue 1 to let know developers about it and get inputs
> whether it is how libvirt is designed or we have to consider it as a
> separate issue.

As commented before, /etc/resolv.conf wasn't configured correctly, so the
hostname didn't get resolved automatically. As a result, the test had to resort to
adding the IP to /etc/hosts. So issue 1 is due to improper setup.

Hi Shiva,

Can you please look into it?
/etc/resolv.conf wasn't set up properly, and that kept the hostname from being
resolved by DNS. As a workaround, the test added the IP directly to /etc/hosts
and is now getting the "Connection timed out" error.

@cdeadmin

------- Comment From [email protected] 2018-09-14 12:48:27 EDT-------
No longer making plans for future hostos-specific bugs.
