Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logger: Create only one Context per process for mobile #38149

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

abeyad
Copy link
Contributor

@abeyad abeyad commented Jan 23, 2025

The Logger::Context operates as a stack. When a new one is created, it becomes the active Context. When it gets destroyed, the destructor pops the previous Context as the currently active one.

This can lead to lifetime bugs in Envoy Mobile, if multiple engines are created and the destruction of those engines lead to indeterminate destruction of the Logger::Context objects.

In this commit, the behavior is changed so that the Logger::Context initialization is done by the external caller, not inside StrippedMainBase as previously done. This allows the caller, in this case, Envoy Mobile, to ensure only one Logger::Context is ever created, circumventing any potential issues on Context destruction with popping the previously saved context.

The Logger::Context operates as a stack. When a new one is created, it
becomes the active Context. When it gets destroyed, the destructor pops
the previous Context as the currently active one.

This can lead to lifetime bugs in Envoy Mobile, if multiple engines are
created and the destruction of those engines lead to indeterminate
destruction of the Logger::Context objects.

In this commit, the behavior is changed so that the Logger::Context
initialization is done by the external caller, not inside
StrippedMainBase as previously done. This allows the caller, in this
case, Envoy Mobile, to ensure only one Logger::Context is ever created,
circumventing any potential issues on Context destruction with popping
the previously saved context.

Signed-off-by: Ali Beyad <[email protected]>
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #38149 was opened by abeyad.

see: more, trace.

Signed-off-by: Ali Beyad <[email protected]>
Signed-off-by: Ali Beyad <[email protected]>
Signed-off-by: Ali Beyad <[email protected]>
@abeyad
Copy link
Contributor Author

abeyad commented Jan 25, 2025

/retest

@abeyad abeyad marked this pull request as ready for review January 25, 2025 01:14
@abeyad
Copy link
Contributor Author

abeyad commented Jan 25, 2025

/assign @jmarantz

@abeyad
Copy link
Contributor Author

abeyad commented Jan 25, 2025

@jmarantz I also drafted #38188, which passes the tests too. I feel the code in this PR is cleaner, but it does have a bit more of a singleton paradigm in that StrippedMainBase still creates a Logger::Context but only if there isn't a currently active Context.

*options_, real_time_system_, default_listener_hooks_, prod_component_factory_,
std::make_unique<PlatformImpl>(), std::make_unique<Random::RandomGeneratorImpl>(), nullptr,
create_instance, /*set_new_handler=*/false);
std::unique_ptr<Random::RandomGeneratorImpl> random_generator =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is a good use-case for auto as the type is obvious from context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// This helps avoid issues on Logger::Context destruction when the previous saved context
// could be activated in a thread-unsafe manner.
void initOnceLoggerContext(const OptionsImplBase& options) {
static Thread::MutexBasicLockable* log_lock = new Thread::MutexBasicLockable();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: MUTABLE_CONSTRUCT_ON_FIRST_USE if we can make that work, just for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't know about this, thanks!

@@ -19,6 +21,17 @@ constexpr absl::Duration ENGINE_RUNNING_TIMEOUT = absl::Seconds(30);
// Google DNS address used for IPv6 probes.
constexpr absl::string_view IPV6_PROBE_ADDRESS = "2001:4860:4860::8888";
constexpr uint32_t IPV6_PROBE_PORT = 53;

// There is only one shared static Logger::Context instance for all Envoy Mobile engines.
// This helps avoid issues on Logger::Context destruction when the previous saved context
Copy link
Contributor

@jmarantz jmarantz Jan 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the essence of this problem that Envoy-Mobile does not have access to make changes in main()?

How does it handle things like ProcessWIde?

Note that ProcessWide was invented after this whole thing I did long ago with MainCommon and tbe logger context. Maybe we can lean on that more and clean some of this stuff up, so there's just one way to initalize all the things that need to be done once per process, rather than having Logger::Context be special.

What I'd like to see is most of this stuff is normally owned in main() or similar, without singletons for the most part. But if E-M is special cause it doesn't get any access to main, then there can be one ProcessWide singleton specific to E-M.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the essence of this problem that Envoy-Mobile does not have access to make changes in main()?

That's correct, EM does not use Envoy's main() and doesn't have access to it.

What I'd like to see is most of this stuff is normally owned in main() or similar, without singletons for the most part. But if E-M is special cause it doesn't get any access to main, then there can be one ProcessWide singleton specific to E-M.

ProcessWide seems to actually just be StrippedMainBase-wide, since the destructor for ProcessWide decrements the count and if the count is 0, the next time StrippedMainBase is created in the same process, ProcessWide will trigger LibEvent and nghttp2 initialization again.

Is that the intended behavior? I thought LibEvent would only need to be initialized once per-process.

The fact that this setup works with E-M (at least in the YouTube app) seems to me to be unintentional. What happens is, we create one Envoy engine, but if there is a config update, we create a new Envoy engine to replace the existing one. However, the new engine always gets created before the old one gets cleaned up, because the old one could still be in use by existing requests/open connections, hence InitData.count never falls to 0 and creating new Envoy engines doesn't result in re-initializing LibEvent and nghttp2.

Also, I just tested on my local device where I removed the count == 0 check in ProcessWide so that on every Envoy creation, LibEvent::initalize() and nghttp2 initialization are called again, and it caused no crashes or problems, so presumably that's another reason why the current ProcessWide would not be an issue even with multiple Envoy server instances.

We could potentially do the following:

  1. Create a subclass of ProcessWide for EM that, in addition to what ProcessWide does, also creates a Logger::Context.
  2. Instantiate the singleton EMProcessWide in EM before any of the Envoy server instances are created. That should make the ProcessWide in StrippedMainBase a no-op.
  3. Move the Logger::Context initialization from StrippedMainBase to MainCommonBase since EM doesn't use MainCommonBase (it uses StrippedMainBase directly).

WDYT? Is that what you had in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole thing is a kind of a mess due to incremental changes made by different people, not fully conscious of one another's efforts, as needs have evolved.

It might have been made more complex because of the way I implemented sink-delegates, which are useful for tests but I'm a bit skeptical now they can be safely used in production. Need to review the locking semantics for all of this. I feel like this needs to be all re-done.

But since you are trying to solve an active issue, I like the plan you outlined here in this comment, basically delegating to your ProcessWide subclass the creation of the one and only Logger::Context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmarantz Unfortunately, what I described with MobileProcessWide subclassing ProcessWide won't work cleanly either for various reasons. I could go into details over GVC, but basically a lot of code uses ProcessWide for initialization so I can't change the API easily, and right now all the initialization with InitData happens in the constructor itself. If I try to add a helper function like init() to be called in the constructor inside if (init_data.count++ == 0), during construction of the base class, it will call the base class' init(), not the subclass' init(). I could add another helper class with an "init" function supplier, and have a default supplier and a mobile one, but that causes other issues when dealing with destruction because we never want to destroy the Logger::Context in EM (a true static) whereas ProcessWide does destroy when the InitData.count is 0. It also just seems to make the code even more complicated than it already is.

Instead, I just created a MobileProcessWide class that doesn't inherit from ProcessWide. It just does whatever global initialization EM specifically requires that ProcessWide doesn't provide. If EM needs any other things globally initialized, we can put it in MobileProcessWide.

WDYT?

Signed-off-by: Ali Beyad <[email protected]>
@jmarantz
Copy link
Contributor

/wait

Signed-off-by: Ali Beyad <[email protected]>
Signed-off-by: Ali Beyad <[email protected]>
Signed-off-by: Ali Beyad <[email protected]>
@abeyad
Copy link
Contributor Author

abeyad commented Jan 29, 2025

/retest

namespace Envoy {

MobileProcessWide::MobileProcessWide(const OptionsImplBase& options) {
logging_context_ = std::make_unique<Logger::Context>(options.logLevel(), options.logFormat(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have multiple Envoy Mobile engines, does that mean setting the log level will apply globally to all Envoy Mobile engines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the way Context works now (with a stack of Contexts), this was the only way to initialize logging and make it safe from data races in EM.

We probably need to rework how we do initialization of globals in general and Logger::Context in specific, but that's outside the scope of this PR.

@abeyad
Copy link
Contributor Author

abeyad commented Jan 29, 2025

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants