Power8 : sosreport hangs #1

Closed
bssrikanth opened this issue Oct 12, 2017 · 8 comments

bssrikanth commented Oct 12, 2017

<cde:info> Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=160017 </cde:info>

[root@ltc-test-ci2 srikanth]# sosreport

sosreport (version 3.4)

This command will collect diagnostic and configuration information from
this CentOS Linux system and installed applications.

An archive containing the collected information will be generated in
/var/tmp/sos.7PVwZi and may be provided to a CentOS support
representative.

Any information provided to CentOS will be treated in accordance with
the published support policies at:

https://wiki.centos.org/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Please enter your first initial and last name [ltc-test-ci2]:
Please enter the case id that you are generating this report for []:

Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...

Running 16/86: docker... [plugin:docker] command 'docker stats --no-stream' timed out after 300s
Running 86/86: yum...

^C
^C
^CTraceback (most recent call last):
  File "/usr/sbin/sosreport", line 25, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1632, in main
    sos.execute()
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1618, in execute
    self.archive.cleanup()
  File "/usr/lib/python2.7/site-packages/sos/archive.py", line 265, in cleanup
    shutil.rmtree(self._archive_root)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 250, in rmtree
    os.remove(fullname)
KeyboardInterrupt
[root@ltc-test-ci2 srikanth]#
[root@ltc-test-ci2 srikanth]# sosreport -n yum,docker

sosreport (version 3.4)

This command will collect diagnostic and configuration information from
this CentOS Linux system and installed applications.

An archive containing the collected information will be generated in
/var/tmp/sos.FaQPrj and may be provided to a CentOS support
representative.

Any information provided to CentOS will be treated in accordance with
the published support policies at:

https://wiki.centos.org/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Please enter your first initial and last name [ltc-test-ci2]:
Please enter the case id that you are generating this report for []:

Setting up archive ...
Setting up plugins ...

Running plugins. Please wait ...

Running 1/84: acpid...
Running 2/84: anaconda...
Running 3/84: anacron...
Running 4/84: auditd...
Running 5/84: block...
Running 6/84: boot...
Running 7/84: ceph...
Running 8/84: cgroups...
Running 9/84: chrony...
Running 10/84: cron...
Running 11/84: crypto...
Running 12/84: dbus...
Running 13/84: devicemapper...
Running 14/84: devices...
Running 15/84: dmraid...
Running 16/84: dracut...
Running 17/84: etcd...
Running 18/84: filesys...
Running 19/84: firewalld...
Running 20/84: gdm...
Running 21/84: general...
Running 22/84: grub2...
Running 23/84: hardware...
Running 24/84: hts...
Running 25/84: i18n...
Running 26/84: iprconfig...
Running 27/84: iscsi...
Running 28/84: jars...
Running 29/84: java...
Running 30/84: kdump...
Running 31/84: kernel...
Running 32/84: keyutils...
Running 33/84: krb5...
Running 34/84: kubernetes...
Running 35/84: kvm...
Running 36/84: last...
Running 37/84: ldap...
Running 38/84: libraries...
Running 39/84: libvirt...
Running 40/84: logrotate...
Running 41/84: logs...
Running 42/84: lsbrelease...
Running 43/84: lvm2...
Running 44/84: md...
Running 45/84: megacli...
Running 46/84: memory...
Running 47/84: mrggrid...
Running 48/84: mrgmessg...
Running 49/84: multipath...
Running 50/84: networking...
Running 51/84: nfs...
Running 52/84: nis...
Running 53/84: ntb...
Running 54/84: numa...
Running 55/84: openhpi...
Running 56/84: openshift...
Running 57/84: openssl...
Running 58/84: pam...
Running 59/84: pci...
Running 60/84: postfix...
Running 61/84: powerpc...
Running 62/84: process...
Running 63/84: processor...
Running 64/84: puppet...
Running 65/84: python...
Running 66/84: rpm...
Running 67/84: scsi...
Running 68/84: selinux...
Running 69/84: services...
Running 70/84: soundcard...
Running 71/84: ssh...
Running 72/84: system...
Running 73/84: systemd...
Running 74/84: systemtap...
Running 75/84: sysvipc...
Running 76/84: teamd...
Running 77/84: tuned...
Running 78/84: udev...
Running 79/84: usb...
Running 80/84: vhostmd...
Running 81/84: virsh...
Running 82/84: x11...
Running 83/84: xen...
Running 84/84: xfs...

^C

^C
^CTraceback (most recent call last):
  File "/usr/sbin/sosreport", line 25, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1632, in main
    sos.execute()
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1618, in execute
    self.archive.cleanup()
  File "/usr/lib/python2.7/site-packages/sos/archive.py", line 265, in cleanup
    shutil.rmtree(self._archive_root)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 237, in rmtree
    names = os.listdir(path)
KeyboardInterrupt
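
When comparing runs like the two above, it can help to pull out the last plugin whose progress line was printed before the interrupt. The following is a minimal sketch, assuming the `Running N/M: name...` line format seen in this transcript; `last_started_plugin` is a hypothetical helper and not part of sos. It reports only which progress line came last, so the hang itself may be in that plugin or in a later stage.

```python
import re

# Matches progress lines such as "Running 84/84: xfs..." as printed in this
# transcript. The format is inferred from the output above, not from sos docs.
PROGRESS_RE = re.compile(r"Running\s+(\d+)/(\d+):\s+(\w+)\.\.\.")


def last_started_plugin(output):
    """Return (index, total, name) for the last progress line, or None."""
    matches = PROGRESS_RE.findall(output)
    if not matches:
        return None
    index, total, name = matches[-1]
    return int(index), int(total), name


sample = """Running 82/84: x11...
Running 83/84: xen...
Running 84/84: xfs...
^C
"""

print(last_started_plugin(sample))
```

Feeding it a saved copy of the terminal output is enough; nothing here needs to run on the affected host.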

@bssrikanth changed the title from "sosreport hangs" to "Power8 : sosreport hangs" on Oct 12, 2017
@bssrikanth
Author

sosreport on P9 seems to be working fine; the issue is seen on the P8 server.

@bssrikanth
Author

Yesterday I ran sosreport on a P9 Boston system and it completed with no issues, but today running sosreport on Power9 reboots the host every time. We have been able to recreate it three times now.

[root@ltc-boston128 ~]# sosreport -n docker

sosreport (version 3.4)

This command will collect diagnostic and configuration information from
this CentOS Linux system and installed applications.

An archive containing the collected information will be generated in
/var/tmp/sos.kU9xx9 and may be provided to a CentOS support
representative.

Any information provided to CentOS will be treated in accordance with
the published support policies at:

https://wiki.centos.org/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Please enter your first initial and last name [ltc-boston128.aus.stglabs.ibm.com]:
Please enter the case id that you are generating this report for []:

Setting up archive ...
Setting up plugins ...

Running plugins. Please wait ...

Running 1/89: acpid...
Running 2/89: anaconda...
Running 3/89: anacron...
Running 4/89: auditd...
Running 5/89: block...
Running 6/89: boot...
Running 7/89: ceph...
Running 8/89: cgroups...
Running 9/89: chrony...
Running 10/89: cron...
Running 11/89: crypto...
Running 12/89: dbus...
Running 13/89: devicemapper...
Running 14/89: devices...
Running 15/89: dmraid...
Running 16/89: dracut...
Running 17/89: etcd...
Running 18/89: filesys...
Running 19/89: firewalld...
Running 20/89: gdm...
Running 21/89: general...
Running 22/89: grub2...
Running 23/89: hardware...
Running 24/89: hts...
Running 25/89: i18n...
Running 26/89: ipmitool...
Running 27/89: iprconfig...
Running 28/89: iscsi...
Running 29/89: jars...
Running 30/89: java...
Running 31/89: kdump...
Running 32/89: kernel...
Running 33/89: keyutils...
Running 34/89: krb5...
Running 35/89: kubernetes...
Running 36/89: kvm...
Running 37/89: last...
Running 38/89: ldap...
Running 39/89: libraries...
Running 40/89: libvirt...
Running 41/89: logrotate...
Running 42/89: logs...
Running 43/89: lsbrelease...
Running 44/89: lvm2...
Running 45/89: md...
Running 46/89: megacli...
Running 47/89: memory...
Running 48/89: mrggrid...
Running 49/89: mrgmessg...
Running 50/89: multipath...
Running 51/89: networking...
Running 52/89: nfs...
Running 53/89: nis...
Running 54/89: ntb...
Running 55/89: numa...
Running 56/89: openhpi...
Running 57/89: openshift...
Running 58/89: openssl...
Running 59/89: openswan...
Running 60/89: pam...
Running 61/89: pci...
Running 62/89: postfix...
Running 63/89: powerpc...

.... > hangs here. Below are the log messages from when the host reboots:

Oct 13 13:42:54 ltc-boston128 dracut: dracut-
Oct 13 13:42:54 ltc-boston128 dracut: Disabling early microcode for ppc64le
Oct 13 13:42:54 ltc-boston128 dracut: Executing: /usr/sbin/dracut --list-modules
Oct 13 13:42:58 ltc-boston128 systemd: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 9619 (df)
Oct 13 13:42:58 ltc-boston128 systemd: Mounting Arbitrary Executable File Formats File System...
Oct 13 13:42:58 ltc-boston128 systemd: Mounted Arbitrary Executable File Formats File System.
[ 8098.169742] nr_pdflush_threads exported in /proc is scheduled for removal
Oct 13 13:43:16 ltc-boston128 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
[ 8100.364466] DCCP: Activated CCID 2 (TCP-like)
[ 8100.364591] DCCP: Activated CCID 3 (TCP-Friendly Rate Control)
Oct 13 13:43:19 ltc-boston128 kernel: DCCP: Activated CCID 2 (TCP-like)
Oct 13 13:43:19 ltc-boston128 kernel: DCCP: Activated CCID 3 (TCP-Friendly Rate Control)
[ 8100.439599] sctp: Hash tables configured (bind 8192/8192)
Oct 13 13:43:19 ltc-boston128 kernel: sctp: Hash tables configured (bind 8192/8192)
[ 8104.155600] mpt3sas 0001:03:00.0: invalid short VPD tag 00 at offset 1
[ 8104.166702] mpt3sas 0002:01:00.0: invalid short VPD tag 00 at offset 1
Oct 13 13:43:22 ltc-boston128 kernel: mpt3sas 0001:03:00.0: invalid short VPD tag 00 at offset 1
Oct 13 13:43:22 ltc-boston128 kernel: mpt3sas 0002:01:00.0: invalid short VPD tag 00 at offset 1
[ 8104.176810] mpt3sas 0031:01:00.0: invalid short VPD tag 00 at offset 1
[ 8104.180763] mpt3sas 0032:03:00.0: invalid short VPD tag 00 at offset 1
Oct 13 13:43:22 ltc-boston128 kernel: mpt3sas 0031:01:00.0: invalid short VPD tag 00 at offset 1
Oct 13 13:43:22 ltc-boston128 kernel: mpt3sas 0032:03:00.0: invalid short VPD tag 00 at offset 1

--== Welcome to Hostboot ==--

2.66486|secure|SecureROM valid - enabling functionality
14.83985|Ignoring boot flags, incorrect version 0x0
14.98273|ISTEP 6. 5 - host_init_fsi
15.03230|ISTEP 6. 6 - host_set_ipl_parms
15.17070|ISTEP 6. 7 - host_discover_targets
15.42736|HWAS|PRESENT> DIMM[03]=FFFF000000000000
15.42737|HWAS|PRESENT> Proc[05]=C000000000000000
15.44399|ISTEP 6. 8 - host_update_master_tpm
15.45060|SECURE|Security Access Bit> 0x0000000000000000
15.45061|SECURE|Secure Mode Disable (via Jumper)> 0xC000000000000000
15.45071|ISTEP 6. 9 - host_gard
16.14555|================================================
16.14830|Error reported by prdf (0xE500) PLID 0x9000000B
16.14831| PRD Signature : 0x70007 0xDD3F0034
16.14944| UserData1 : 0x0007000700000101
16.14944| UserData2 : 0xdd3f003400000000
16.14945|------------------------------------------------
16.14945| Callout type : Hardware Callout
16.14946| CPU id : 27
16.14947| Target : Physical:/Sys0/Node0/Proc0/EQ1/EX1/Core1
16.14947| Deconfig State : DELAYED_DECONFIG
16.14948| GARD Error Type : GARD_Fatal
16.14948| Priority : SRCI_PRIORITY_MED
16.14949|------------------------------------------------
16.14949|
16.14950|------------------------------------------------
16.14950| System checkstop occurred during runtime on previous boot
16.14951|------------------------------------------------
16.14951|
16.14951|------------------------------------------------
16.14952| Hostboot Build ID:
16.14952|================================================
16.36976|================================================
16.36977|Error reported by hwas (0x0C00) PLID 0x9000000C
16.37092|System Shutting DownTo Perform Reconfiguration After Deconfig
16.37167|IPMI: Initiate power cycle
16.37205| Attempt to create a GARD Record for a target that is not GARDable (not DECONFIG_GARDABLE or not present)
16.37206| ModuleId 0x81 HWAS::MOD_PLAT_DECONFIG_GARD
16.37206| ReasonCode 0x0c81 HWAS::RC_TARGET_NOT_GARDABLE
16.37207| UserData1 HUID of input target // GARD errlog EID : 0x000700079000000b
16.37208| UserData2 ATTR_DECONFIG_GARDABLE // ATTR_HWAS_STATE.present : 0x0000000000000001
16.37209|------------------------------------------------
16.37209| Callout type : Procedure Callout
16.37210| Procedure : EPUB_PRC_HB_CODE
16.37210| Priority : SRCI_PRIORITY_HIGH
16.37211|------------------------------------------------
16.37211| Hostboot Build ID:
16.37212|================================================
16.38785|Stopping istep dispatcher
31.96819|IPMI: shutdown complete
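
The Hostboot console output above contains two error records, each opened by an `Error reported by <component> (<id>) PLID <plid>` header. A minimal sketch for pulling those headers out of a saved console log follows; the line format is inferred from this log alone, and `hostboot_errors` is a hypothetical helper rather than any Hostboot or OpenPOWER tool.

```python
import re

# Matches headers such as
# "16.14830|Error reported by prdf (0xE500) PLID 0x9000000B".
# The pattern is an assumption based only on the lines in this log.
ERROR_RE = re.compile(
    r"^\s*(\d+\.\d+)\|Error reported by (\w+) "
    r"\((0x[0-9A-Fa-f]+)\) PLID (0x[0-9A-Fa-f]+)"
)


def hostboot_errors(lines):
    """Collect one dict per 'Error reported by' header found in lines."""
    errors = []
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            timestamp, component, component_id, plid = match.groups()
            errors.append({
                "time": float(timestamp),
                "component": component,
                "component_id": component_id,
                "plid": plid,
            })
    return errors


log = [
    "16.14555|================================================",
    "16.14830|Error reported by prdf (0xE500) PLID 0x9000000B",
    "16.14950| System checkstop occurred during runtime on previous boot",
    "16.36977|Error reported by hwas (0x0C00) PLID 0x9000000C",
    "16.37167|IPMI: Initiate power cycle",
]

for error in hostboot_errors(log):
    print(error["time"], error["component"], error["plid"])
```

The sketch only indexes the headers; the callout details that follow each header still need to be read in the full log.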

@bssrikanth
Author

Update: Vasant helped narrow the issue down to running sosreport with only the powerpc plugin.
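
A run limited to a single plugin can be scripted so that each suspect plugin is tried in isolation. The sketch below only builds the command line; it assumes that `-o` selects the only plugin to run and that `--batch` skips the interactive prompts, which should be checked against `sosreport --help` on the installed version. `only_plugin_command` is a hypothetical helper name.

```python
import shlex


def only_plugin_command(plugin, batch=True):
    """Build an argv list that runs sosreport with a single plugin enabled."""
    argv = ["sosreport"]
    if batch:
        # Assumption: --batch suppresses the ENTER/name/case-id prompts.
        argv.append("--batch")
    # Assumption: -o restricts the run to the named plugin.
    argv += ["-o", plugin]
    return argv


# Print the command rather than executing it.
print(" ".join(shlex.quote(arg) for arg in only_plugin_command("powerpc")))
```

Given the reboots reported earlier in this thread, a command like this is best run only on a test system.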

@cdeadmin

------- Comment From [email protected] 2017-10-27 15:27:59 EDT-------
(In reply to comment #6)

> Update: Vasant helped to narrow down issue to sosreport onlyplugin powerpc

Srikanth, anything more on this? Were there next steps from Vasant?

@cdeadmin

------- Comment From [email protected] 2017-10-30 04:40:10 EDT-------
(In reply to comment #7)
> (In reply to comment #6)
>
> > Update: Vasant helped to narrow down issue to sosreport onlyplugin powerpc
>
> Srikanth, anything more on this? Were there next steps from Vasant?

Nothing more as of now. I have asked Vasant to look into this issue.

@cdeadmin

------- Comment From [email protected] 2017-10-31 06:41:13 EDT-------
Reducing severity since we have a workaround.

@cdeadmin

cdeadmin commented Dec 5, 2017

------- Comment From [email protected] 2017-12-05 06:01:34 EDT-------
On the latest hostos build I am not able to recreate this issue.

@cdeadmin closed this as completed on Dec 5, 2017
@cdeadmin

cdeadmin commented Dec 5, 2017

------- Comment From [email protected] 2017-12-05 07:24:43 EDT-------
(In reply to comment #10)
> On the latest hostos build I am not able to recreate this issue.

Tested it at the levels below:

[root@bos212lpar ~]# rpm -qa | grep sos
sos-3.4-6.el7.centos.noarch
[root@bos212lpar ~]# uname -r
4.14.0-1.rel.git68b4afb.el7.centos.ppc64le
