Profile collected in TF 2.12 not visible in TensorBoard #6338

Closed
whatdhack opened this issue Apr 20, 2023 · 13 comments

Comments

@whatdhack

whatdhack commented Apr 20, 2023

Environment information (required)

Please run diagnose_tensorboard.py (link below) in the same
environment from which you normally run TensorFlow/TensorBoard, and
paste the output here:

https://raw.githubusercontent.com/tensorflow/tensorboard/master/tensorboard/tools/diagnose_tensorboard.py

Diagnostics

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version 516a2f9433ba4f9c3a4fdb0f89735870eda054a1

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=9, micro=16, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='xxxx6', release='5.14.21-150400.24.46-default', version='#1 SMP PREEMPT_DYNAMIC Thu Feb 9 08:38:18 UTC 2023 (2d95137)', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==2.12.2
INFO: installed: tensorflow==2.12.0
INFO: installed: tensorflow-estimator==2.12.0
INFO: installed: tensorboard-data-server==0.7.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.12.2'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.12.0'
INFO: tensorflow.__git_version__: 'v2.12.0-rc1-12-g0db597d0d75'

--- check: tensorboard_data_server_version
INFO: data server binary: '//-fs0/users/ac./env/cuda5/lib/python3.9/site-packages/tensorboard_data_server/bin/server'
INFO: failed to check binary version: Command '['//-fs0/users/ac./env/cuda5/lib/python3.9/site-packages/tensorboard_data_server/bin/server', '--version']' returned non-zero exit status 1.

--- check: tensorboard_binary_path
INFO: which tensorboard: b'//-fs0/users/ac./env/cuda5/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): ''

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=28180481, st_dev=2053, st_nlink=2, st_uid=10029, st_gid=4001, st_size=4096, st_atime=1681960298, st_mtime=1681960468, st_ctime=1681960468)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['//-fs0/users/ac./env/cuda5/lib/python3.9/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==1.4.0
astunparse==1.6.3
cachetools==5.3.0
certifi==2022.12.7
charset-normalizer==3.1.0
flatbuffers==23.3.3
gast==0.4.0
google-auth==2.17.3
google-auth-oauthlib==1.0.0
google-pasta==0.2.0
grpcio==1.54.0
gviz-api==1.10.0
h5py==3.8.0
idna==3.4
importlib-metadata==6.5.0
jax==0.4.8
keras==2.12.0
libclang==16.0.0
Markdown==3.4.3
MarkupSafe==2.1.2
ml-dtypes==0.1.0
numpy==1.23.5
nvidia-cublas-cu11==2022.4.8
nvidia-cublas-cu117==11.10.1.25
nvidia-cudnn-cu11==8.6.0.163
oauthlib==3.2.2
opt-einsum==3.3.0
packaging==23.1
pip==23.1
protobuf==3.20.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.28.2
requests-oauthlib==1.3.1
rsa==4.9
scipy==1.10.1
setuptools==57.5.0
six==1.16.0
tensorboard==2.12.2
tensorboard-data-server==0.7.0
tensorboard-plugin-profile==2.11.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.12.0
tensorflow-estimator==2.12.0
tensorflow-io-gcs-filesystem==0.32.0
termcolor==2.2.0
typing_extensions==4.5.0
urllib3==1.26.15
Werkzeug==2.2.3
wheel==0.40.0
wrapt==1.14.1
zipp==3.15.0

Next steps

No action items identified. Please copy ALL of the above output,
including the lines containing only backticks, into your GitHub issue
or comment. Be sure to redact any sensitive information.

Issue description

Profile collected in TF 2.12 GPU is not visible in TensorBoard.
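
A quick way to rule out an empty log directory is to list what the profiler actually wrote. The sketch below is only an illustration: the helper name `profile_runs` is made up here, and the `plugins/profile/<run>/` layout is an assumption about where TensorFlow's profiler places its output, not something confirmed in this report.

```python
from pathlib import Path


def profile_runs(logdir):
    """Map each profile run directory under logdir to the files it contains.

    Assumption: profiles live in <logdir>/**/plugins/profile/<run>/.
    """
    logdir = Path(logdir)
    if not logdir.is_dir():
        return {}
    runs = {}
    for root in sorted(logdir.glob("**/plugins/profile")):
        if not root.is_dir():
            continue
        for run in sorted(p for p in root.iterdir() if p.is_dir()):
            files = sorted(f.name for f in run.iterdir() if f.is_file())
            runs[run.relative_to(logdir).as_posix()] = files
    return runs


# "./profiles" is the --logdir value used later in this thread.
for run, files in profile_runs("./profiles").items():
    print(run, files)
```

An empty result would suggest that nothing was captured at all, which is a different problem from the plugin failing to load existing data.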

@whatdhack whatdhack changed the title from "Profiler collected in TF 2.12 not visible in TensorBoard" to "Profile collected in TF 2.12 not visible in TensorBoard" Apr 20, 2023
@rileyajones
Contributor

Can you post your TensorBoard logs? Do they contain anything like this?

Failed to load plugin ProfilePluginLoader.load; ignoring it.
[...]
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
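
For anyone checking their own logs for this signature, a small scan like the following may help. It is a sketch that uses only the two message fragments quoted above; the function name `summarize_log` is hypothetical and not part of TensorBoard.

```python
import re

# Both patterns are copied from the log excerpt quoted in this thread.
PLUGIN_FAILURE = re.compile(r"Failed to load plugin (\S+); ignoring it\.")
PROTOBUF_ERROR = "Descriptors cannot not be created directly"


def summarize_log(log_text):
    """Return (names of plugins that failed to load, protobuf error seen?)."""
    failed = PLUGIN_FAILURE.findall(log_text)
    return failed, PROTOBUF_ERROR in log_text


sample = (
    "E0428 23:24:26.223354 140086757312320 application.py:125] "
    "Failed to load plugin ProfilePluginLoader.load; ignoring it.\n"
    "TypeError: Descriptors cannot not be created directly.\n"
)
failed, protobuf_mismatch = summarize_log(sample)
print(failed, protobuf_mismatch)
```

Because TensorBoard ignores a plugin that fails to load and keeps serving, this error can easily scroll past unnoticed in a long startup log.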

@whatdhack
Author

@rileyajones, that error was there with protobuf==4.20.3. After downgrading protobuf to 3.20.3, the error was gone. However, only the following warnings were displayed, no profile was visible in the browser, and the browser showed the message "No profile data was found."

tensorboard --bind_all --port 6006 --logdir ./profiles
2023-04-20 21:02:11.892091: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-20 21:02:14.289558: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
tensorboard_data_server/bin/server: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by tensorboard_data_server/bin/server)
tensorboard_data_server/bin/server: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by tensorboard_data_server/bin/server)
tensorboard_data_server/bin/server: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by tensorboard_data_server/bin/server)
TensorBoard 2.12.2 at http://xxx:6006/ (Press CTRL+C to quit)
W0420 21:02:33.002275 139692568106752 security_validator.py:60] In 3.0, this warning will become an error:
Illegal Content-Security-Policy for script-src: 'unsafe-inline'
Illegal Content-Security-Policy for script-src-elem: 'unsafe-inline'
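
The GLIBC lines in this log come from the bundled data server binary, and TensorBoard still started afterwards. As a rough sketch, the versions the binary asks for can be pulled out of such a log and compared with the local C library. `required_glibc` and `system_glibc` are hypothetical helper names; `platform.libc_ver()` is the standard-library call used for the local version.

```python
import platform
import re

# Matches the loader message with or without the leading backtick.
GLIBC_NEEDED = re.compile(r"version\s+`?GLIBC_(\d+(?:\.\d+)*)'\s+not found")


def required_glibc(log_text):
    """Return the sorted, de-duplicated GLIBC versions reported as missing."""
    found = {
        tuple(int(part) for part in version.split("."))
        for version in GLIBC_NEEDED.findall(log_text)
    }
    return sorted(found)


def system_glibc():
    """Return the local glibc version as a tuple, or None if it is unknown."""
    name, version = platform.libc_ver()
    if name != "glibc" or not version:
        return None
    return tuple(int(part) for part in re.findall(r"\d+", version))


sample = (
    "server: /lib64/libc.so.6: version GLIBC_2.33' not found (required by server) "
    "server: /lib64/libc.so.6: version GLIBC_2.34' not found (required by server)\n"
    "server: /lib64/libc.so.6: version GLIBC_2.32' not found (required by server)"
)
needed = required_glibc(sample)
local = system_glibc()
if local is not None and needed:
    print("newest required:", needed[-1], "local:", local)
```

This is consistent with the diagnostics output above, where the data server binary's `--version` check returned a non-zero exit status.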

@rileyajones
Contributor

I believe your issue is actually with the profiler plugin https://github.com/tensorflow/profiler. Could you try installing their nightly version? https://pypi.org/project/tbp-nightly/

@whatdhack
Author

whatdhack commented Apr 20, 2023

@rileyajones Did the following and hit another failure.

>pip uninstall tensorboard-plugin-profile
>pip install tbp-nightly
>pip list|grep -ie tbp -ie plug -ie tensor
tbp-nightly                  2.12.2a20230420
tensorboard                  2.12.2
tensorboard-data-server      0.7.0
tensorflow                   2.12.0
tensorflow-estimator         2.12.0
tensorflow-io-gcs-filesystem 0.32.0
> tensorboard --bind_all --port 6006 --logdir ./profiles
2023-04-20 21:53:29.886673: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-20 21:53:32.391731: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "bin/tensorboard", line 8, in <module>
    sys.exit(run_main())
  File "/lib/python3.9/site-packages/tensorboard/main.py", line 42, in run_main
    plugins=default.get_plugins(),
  File "lib/python3.9/site-packages/tensorboard/default.py", line 105, in get_plugins
    return get_static_plugins() + get_dynamic_plugins()
  File "lib/python3.9/site-packages/tensorboard/default.py", line 140, in get_dynamic_plugins
    return [
  File "lib/python3.9/site-packages/tensorboard/default.py", line 141, in <listcomp>
    entry_point.resolve()
  File "lib/python3.9/site-packages/pkg_resources/__init__.py", line 2456, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
ModuleNotFoundError: No module named 'tensorboard_plugin_profile.profile_plugin_loader'
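
The traceback shows TensorBoard resolving a dynamic plugin entry point whose module can no longer be imported. A minimal diagnostic sketch, assuming the plugins are registered under the `tensorboard_plugins` entry point group, lists each registered plugin and whether its module can be located. The helper names are hypothetical.

```python
from importlib import metadata, util


def plugin_entry_points(group="tensorboard_plugins"):
    """Return the entry points registered in the given group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and newer
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8 / 3.9 return a dict


def module_name(entry_point_value):
    """Extract the module part of a 'package.module:attribute' string."""
    return entry_point_value.split(":", 1)[0].strip()


def is_importable(module):
    """Report whether the module can be located.

    Note: for dotted names, find_spec imports the parent package, so any
    error raised there is treated as "not importable" in this diagnostic.
    """
    try:
        return util.find_spec(module) is not None
    except Exception:
        return False


for ep in plugin_entry_points():
    module = module_name(ep.value)
    status = "ok" if is_importable(module) else "MISSING MODULE"
    print(f"{ep.name}: {module} -> {status}")
```

An entry point reported as missing here would match the `ModuleNotFoundError` above: package metadata still advertises a loader module that is not present on disk.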


@rileyajones
Contributor

It seems like the process of installing their nightly is a bit more convoluted than just installing another pip package. Check their quick start section https://github.com/tensorflow/profiler#quick-start

@rdbis

rdbis commented Apr 28, 2023

On my setup, installing the nightly does not resolve the issue; the same problem persists:
E0428 23:24:26.223354 140086757312320 application.py:125] Failed to load plugin ProfilePluginLoader.load; ignoring it.
Traceback (most recent call last):
  File "/home/jozef/profiler/profile_env/lib/python3.11/site-packages/tensorboard/backend/application.py", line 123, in TensorBoardWSGIApp
    plugin = loader.load(context)
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/jozef/profiler/profile_env/lib/python3.11/site-packages/tensorboard_plugin_profile/profile_plugin_loader.py", line 75, in load
    from tensorboard_plugin_profile import profile_plugin
  File "/home/jozef/profiler/profile_env/lib/python3.11/site-packages/tensorboard_plugin_profile/profile_plugin.py", line 36, in <module>
    from tensorboard_plugin_profile.convert import raw_to_tool_data as convert
  File "/home/jozef/profiler/profile_env/lib/python3.11/site-packages/tensorboard_plugin_profile/convert/raw_to_tool_data.py", line 29, in <module>
    from tensorboard_plugin_profile.convert import input_pipeline_proto_to_gviz
  File "/home/jozef/profiler/profile_env/lib/python3.11/site-packages/tensorboard_plugin_profile/convert/input_pipeline_proto_to_gviz.py", line 28, in <module>
    from tensorboard_plugin_profile.protobuf import input_pipeline_pb2
  File "/home/jozef/profiler/profile_env/lib/python3.11/site-packages/tensorboard_plugin_profile/protobuf/input_pipeline_pb2.py", line 17, in <module>
    from tensorboard_plugin_profile.protobuf import diagnostics_pb2 as plugin_dot_tensorboard__plugin__profile_dot_protobuf_dot_diagnostics__pb2
  File "/home/jozef/profiler/profile_env/lib/python3.11/site-packages/tensorboard_plugin_profile/protobuf/diagnostics_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/home/jozef/profiler/profile_env/lib/python3.11/site-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

Setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python or downgrading protobuf from 4.22.3 to 3.20.3 does not help either. In this case, the following warning appears on the console:
W0428 20:31:10.595216 139654082852416 security_validator.py:60] In 3.0, this warning will become an error:
Illegal Content-Security-Policy for script-src: 'unsafe-inline'
Illegal Content-Security-Policy for script-src-elem: 'unsafe-inline'

The TensorBoard profiler reports in the browser: "No profile data was found."
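
For reference, the second workaround from the error message only takes effect if the variable is set before protobuf is first imported, so one way to apply it is to set it in the environment of the TensorBoard process itself. The following is a sketch with hypothetical helper names; the command-line flags are copied from the invocation used earlier in this thread.

```python
import os
import subprocess


def tensorboard_env(base=None):
    """Copy an environment and force protobuf's pure-Python implementation."""
    env = dict(os.environ if base is None else base)
    env["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
    return env


def tensorboard_command(logdir, port=6006):
    """Build the same command line that was used earlier in this thread."""
    return ["tensorboard", "--bind_all", "--port", str(port), "--logdir", logdir]


def launch(logdir):
    """Start TensorBoard with the workaround applied (blocks until it exits)."""
    return subprocess.run(tensorboard_command(logdir), env=tensorboard_env())
```

Calling `launch("./profiles")` would start TensorBoard with the variable set. As reported above, this removed the protobuf error on this setup but did not make the profile data appear.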

@rileyajones
Contributor

The issue here is essentially that the profiler plugin does not support the proto version being used. Given that the profiler plugin is part of another repo, I think it is time to open an issue there https://github.com/tensorflow/profiler/issues

@rdbis

rdbis commented May 2, 2023

There is already one: tensorflow/profiler#609, "ProfilerPluginLoader fails due to protobuf versions".

@arcra
Member

arcra commented Jun 3, 2023

FYI, Profiler recently produced a new release of the plugin, which should fix these compatibility issues. I'll close this for now.

@arcra arcra closed this as completed Jun 3, 2023
@rdbis

rdbis commented Jun 3, 2023

Hi, it still does not work. Checked with:
tf-nightly 2.14.0.dev20230603
tb-nightly 2.14.0a20230603
tbp-nightly 2.14.0a20230603

in short, it looks like:

  • tf-nightly uses 'protobuf>=3.20.3,<5.0.0dev,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5',
  • tb-nightly uses protobuf >= 3.19.6
  • tbp-nightly uses 'protobuf >= 3.19.0',

According to https://protobuf.dev/news/2022-05-06/#python-updates, there are some breaking changes in protobuf starting from protobuf >= 3.20.1 (the version used in tf-nightly). Both tb-nightly and tbp-nightly declare older minimum protobuf versions.
Theoretically, "Python upb requires generated code that has been generated from protoc 3.19.0 or newer.", so it should work, but for some reason it does not.
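
To make the comparison concrete, the three requirement strings listed above can be checked against a candidate protobuf version. This is a simplified sketch with hypothetical helper names: it handles plain numeric releases only, and it treats `<5.0.0dev` as `<5.0.0`, which is equivalent for such releases.

```python
def parse(version):
    """Turn '4.23.2' into (4, 23, 2); only plain numeric releases are handled."""
    return tuple(int(part) for part in version.split("."))


# tf-nightly excludes 4.21.0 through 4.21.5.
TF_NIGHTLY_EXCLUDED = {parse(f"4.21.{n}") for n in range(6)}


def allowed_by(version):
    """Report which of the three declared protobuf ranges accept the version."""
    v = parse(version)
    return {
        "tf-nightly": parse("3.20.3") <= v < parse("5.0.0")
        and v not in TF_NIGHTLY_EXCLUDED,
        "tb-nightly": v >= parse("3.19.6"),
        "tbp-nightly": v >= parse("3.19.0"),
    }


print(allowed_by("4.23.2"))
```

protobuf 4.23.2, the version this setup reports as installed, satisfies all three declared ranges. The declared requirements therefore do not flag a conflict, even though the plugin fails at import time.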

pip list shows that protobuf 4.23.2 is installed, and here is the error log:
E0604 01:11:27.531507 140069775619008 application.py:125] Failed to load plugin ProfilePluginLoader.load; ignoring it.
Traceback (most recent call last):
  File "/home/jozef/tbp/lib/python3.9/site-packages/tensorboard/backend/application.py", line 123, in TensorBoardWSGIApp
    plugin = loader.load(context)
  File "/home/jozef/tbp/lib/python3.9/site-packages/tensorboard_plugin_profile/profile_plugin_loader.py", line 75, in load
    from tensorboard_plugin_profile import profile_plugin
  File "/home/jozef/tbp/lib/python3.9/site-packages/tensorboard_plugin_profile/profile_plugin.py", line 36, in <module>
    from tensorboard_plugin_profile.convert import raw_to_tool_data as convert
  File "/home/jozef/tbp/lib/python3.9/site-packages/tensorboard_plugin_profile/convert/raw_to_tool_data.py", line 29, in <module>
    from tensorboard_plugin_profile.convert import input_pipeline_proto_to_gviz
  File "/home/jozef/tbp/lib/python3.9/site-packages/tensorboard_plugin_profile/convert/input_pipeline_proto_to_gviz.py", line 28, in <module>
    from tensorboard_plugin_profile.protobuf import input_pipeline_pb2
  File "/home/jozef/tbp/lib/python3.9/site-packages/tensorboard_plugin_profile/protobuf/input_pipeline_pb2.py", line 17, in <module>
    from tensorboard_plugin_profile.protobuf import diagnostics_pb2 as plugin_dot_tensorboard__plugin__profile_dot_protobuf_dot_diagnostics__pb2
  File "/home/jozef/tbp/lib/python3.9/site-packages/tensorboard_plugin_profile/protobuf/diagnostics_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/home/jozef/tbp/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.14.0a20230603 at http://localhost:6006/ (Press CTRL+C to quit)

@arcra
Member

arcra commented Jun 5, 2023

Hmmm... that seems to be the case. In any case, this should be an issue on the tensorflow/profiler repo.

The error message clearly shows that the issue occurs while loading the tensorboard_plugin_profile/protobuf/diagnostics_pb2.py file. It looks like, despite what their requirements say for the protobuf library, their pip package was still generated with an older version. Please open an issue on their repo.
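
One way to check an installed package for this condition is to look at how its generated `_pb2.py` files build their descriptors. The sketch below is a heuristic, not an official protobuf check: it assumes that older generated code constructs descriptor objects directly at import time (the `_descriptor.FieldDescriptor(` call seen in the traceback), and the function name is hypothetical.

```python
import re

# Direct constructor calls such as _descriptor.Descriptor(...) or
# _descriptor.FieldDescriptor(...); the latter is the call that fails
# in the traceback quoted in this thread.
DIRECT_DESCRIPTOR_CALL = re.compile(
    r"\b_descriptor\.(?:File|Field|Enum|EnumValue)?Descriptor\("
)


def constructs_descriptors_directly(source):
    """Heuristically report whether generated code builds descriptors itself."""
    return DIRECT_DESCRIPTOR_CALL.search(source) is not None


old_style = "_X = _descriptor.Descriptor(\n  fields=[\n    _descriptor.FieldDescriptor(\n"
new_style = "DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'')\n"
print(constructs_descriptors_directly(old_style))
print(constructs_descriptors_directly(new_style))
```

To apply it to an installed package, read the file's text (for example with `pathlib.Path(...).read_text()`) from `tensorboard_plugin_profile/protobuf/diagnostics_pb2.py` inside the environment's site-packages directory and pass it to the function.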

@rdbis

rdbis commented Jun 5, 2023

Agreed, it's an error in the profiler package. More info here: tensorflow/profiler#609 (comment)

@Mingrg

Mingrg commented Jun 15, 2023

@rileyajones Hi, my model was trained with TensorFlow 1.14. Does that mean I cannot see the PROFILE through the browser? Do I have to upgrade TensorFlow to >= 2.2.0?
