Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][client] Add schema cache to improve performance #23808

Closed
wants to merge 6 commits into from

Conversation

yunmaoQu
Copy link

@yunmaoQu yunmaoQu commented Jan 3, 2025

close #23707

Motivation

Schema creation (e.g., Schema.AVRO(SomeClass.class)) is fairly CPU intensive. It would be useful it there would be a weak reference cache for caching the schema instance.

Modifications

Add SchemaCache implementation using WeakHashMap for schema instance caching

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

N/A

Copy link

github-actions bot commented Jan 3, 2025

@yunmaoQu Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Jan 3, 2025
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several inconsistencies in this PR. For example, the class names and the class file names don't match. Please test this PR in your own fork first to ensure that it passes tests.
It seems that this PR contains a lot of features related to the schema caching. Instead of adding a lot of features, it would be better to keep the implementation to the minimum.
I'm surprised by competing implementations for implementing the schema cache. There's currently already an open PR #23777.

@yunmaoQu
Copy link
Author

yunmaoQu commented Jan 3, 2025

ok,i test all and could you review it and give me some suggestions

@lhotari
Copy link
Member

lhotari commented Jan 3, 2025

ok,i test all and could you review it and give me some suggestions

Instead of adding more code to test everything, please reduce to a minimal implementation. This means to remove features to track cache metrics. That's not something that is needed. For the cache implementation, I'd suggest using a ConcurrentMap created with Guava's MapMaker. Instead of adding yet another abstraction, I'd suggest modifying the PulsarClientImplementationBinding interface and adding a new interface method <T extends com.google.protobuf.GeneratedMessageV3> Schema<T> newProtobufSchema(Class<T> clazz). Then we could keep the cache as an implementation level detail.

example of minimal implementation for newProtobufSchema using Guava's MapMaker with weak keys:

    private static final ConcurrentMap<Class<?>, Schema<?>> PROTOBUF_CACHE = new MapMaker().weakKeys().makeMap();

    public <T extends com.google.protobuf.GeneratedMessageV3> Schema<T> newProtobufSchema(Class<T> clazz) {
        return (Schema<T>) PROTOBUF_CACHE.computeIfAbsent(clazz,
                k -> ProtobufSchema.of(SchemaDefinition.builder().withPojo(clazz).build())).clone();
    }

There shouldn't be a need to ever clear the cache since it's bounded by the number of classes with strong references. It won't consume a significant amount of memory in the first place.

@yunmaoQu
Copy link
Author

yunmaoQu commented Jan 3, 2025

OK.Should i implement it based on the pre commit or what?

@lhotari
Copy link
Member

lhotari commented Jan 3, 2025

OK.Should i implement it based on the pre commit or what?

That's something you can decide. Please read my previous message and draw your conclusions.

@walkinggo
Copy link

walkinggo commented Jan 4, 2025

It looks like we're working on similar tasks. I've already created a pull request #23777 to complete this task. Should we work together to finish it, or what do you suggest? @yunmaoQu

@yunmaoQu
Copy link
Author

yunmaoQu commented Jan 4, 2025

Yes. We can work it together.@walkinggo

@yunmaoQu
Copy link
Author

yunmaoQu commented Jan 5, 2025

@lhotari

OK.Should i implement it based on the pre commit or what?

That's something you can decide. Please read my previous message and draw your conclusions.

I implement a minimal version. Could you review it and give me some suggestion. Thanks for your previous guide.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very close to the minimal implementation. I added a few minor comments.

@lhotari lhotari added this to the 4.1.0 milestone Jan 6, 2025
@lhotari
Copy link
Member

lhotari commented Jan 6, 2025

Please also update the PR description to match the minimal implementation.

@yunmaoQu
Copy link
Author

yunmaoQu commented Jan 7, 2025

Please also update the PR description to match the minimal implementation.

@lhotari I have done this.

@lhotari
Copy link
Member

lhotari commented Jan 7, 2025

Please also update the PR description to match the minimal implementation.

@lhotari I have done this.

@yunmaoQu I don't see that the PR description has been updated to match the minimal implementation. For example, the "modifications" part hasn't been updated.

  • Add SchemaCache implementation using WeakHashMap for schema instance caching
  • Add cache configuration and metrics for monitoring
  • Add cleanup strategy for expired cache entries
  • Modify Schema creation methods (AVRO/JSON/PROTOBUF) to use cache
  • Add cloning mechanism to maintain schema immutability

@lhotari
Copy link
Member

lhotari commented Jan 7, 2025

checkstyle error:

[INFO] There is 1 error reported by Checkstyle 10.14.2 with /home/runner/work/pulsar/pulsar/buildtools/src/main/resources/pulsar/checkstyle.xml ruleset.
Error: src/main/java/org/apache/pulsar/client/internal/PulsarClientImplementationBinding.java:[130] (regexp) RegexpSingleline: Trailing whitespace

@lhotari
Copy link
Member

lhotari commented Jan 7, 2025

@yunmaoQu I'd recommend to run CI builds in your fork so that you get CI feedback while working on the changes. Some of that is explained in the contribution guide, https://pulsar.apache.org/contribute/personal-ci/ . You will also need to enable GitHub Actions in your apache/pulsar fork repository in GitHub UI.

@yunmaoQu
Copy link
Author

yunmaoQu commented Jan 7, 2025

@yunmaoQu I'd recommend to run CI builds in your fork so that you get CI feedback while working on the changes. Some of that is explained in the contribution guide, https://pulsar.apache.org/contribute/personal-ci/ . You will also need to enable GitHub Actions in your apache/pulsar fork repository in GitHub UI.

Thanks a lot.

- add a weak reference cache for caching a scheme instance for Schema.AVRO, Schema.JSON, Schema.PROTOBUF.
@yunmaoQu yunmaoQu force-pushed the feature/schema-cache branch from 941beb6 to fd24bd1 Compare January 7, 2025 11:06
@yunmaoQu
Copy link
Author

yunmaoQu commented Jan 7, 2025

@lhotari I've modified the error according to the CI's prompt. But when i run personal CI ,it still reports an error.

@lhotari
Copy link
Member

lhotari commented Jan 7, 2025

@lhotari I've modified the error according to the CI's prompt, but it still reports an error.

CI isn't the only choice. You can also reproduce the errors locally.

For CI feedback, I'd recommend creating a PR in your own fork so that the PR appears at https://github.com/yunmaoQu/pulsar/pulls . When you push changes to the branch, the CI will trigger and you won't have to depend on CI feedback from apache/pulsar CI runs which will only run after someone approves the workflow run.

@lhotari
Copy link
Member

lhotari commented Jan 7, 2025

For locally running a sanity check, you can use this command:

mvn -Pcore-modules,-main -T 1C clean install -DskipTests -Dspotbugs.skip=true -DnarPluginPhase=none

@lhotari
Copy link
Member

lhotari commented Jan 7, 2025

There are multiple checkstyle errors:

[ERROR] src/main/java/org/apache/pulsar/client/impl/PulsarClientImplementationBindingImpl.java:[33,1] (imports) ImportOrder: Import java.util.concurrent.ConcurrentMap appears after other imports that it should precede
[ERROR] src/main/java/org/apache/pulsar/client/impl/PulsarClientImplementationBindingImpl.java:[88,1] (imports) ImportOrder: Import com.google.common.collect.MapMaker appears after other imports that it should precede
[ERROR] src/main/java/org/apache/pulsar/client/impl/PulsarClientImplementationBindingImpl.java:[222] (regexp) RegexpSingleline: Trailing whitespace
[ERROR] src/main/java/org/apache/pulsar/client/impl/PulsarClientImplementationBindingImpl.java:[242] (regexp) RegexpSingleline: Trailing whitespace
[ERROR] src/main/java/org/apache/pulsar/client/impl/PulsarClientImplementationBindingImpl.java:[251] (regexp) RegexpSingleline: Trailing whitespace

I'd recommend following the contribution guide to properly configure IntelliJ/IDEA for Pulsar development.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unnecessary cleanup strategy stuff is back. We don't need those type of features.

@yunmaoQu
Copy link
Author

yunmaoQu commented Jan 7, 2025

The unnecessary cleanup strategy stuff is back. We don't need those type of features.

Very Sorry. A little git operation error.

@yunmaoQu yunmaoQu force-pushed the feature/schema-cache branch from 9bfbe8a to 1cf7b61 Compare January 7, 2025 17:19
@yunmaoQu
Copy link
Author

yunmaoQu commented Jan 8, 2025

@lhotari I test CI in my personal repo,the part i change is ok, you can see https://github.com/yunmaoQu/pulsar/pulls

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes are now fine for the minimal implementation. I did some manual checks and noticed that adding this cache won't resolve the performance issue. For Avro schema, the problem is that the .clone() method doesn't by-pass the already parsed Avro schema instance to the new cloned instance. Since it might be hard to detect such issues, it would be necessary to have a micro benchmark which could be used to detect the issues. In Pulsar, we have the module microbench where such a benchmark could be added. The module has JMH configured. In JMH, it's also possible to enable profiling using AsyncProfiler or Java Flight Recorder to find the performance hotspots. I won't be able to guide through all steps required to handle this. It would be necessary to add a micro benchmark and then address the performance issues that show up.

@yunmaoQu
Copy link
Author

yunmaoQu commented Jan 8, 2025

@lhotari This pr seems make no sense .Should i close it and focus on other issue?

@lhotari
Copy link
Member

lhotari commented Jan 8, 2025

@lhotari This pr seems make no sense .Should i close it and focus on other issue?

@yunmaoQu You can choose to do that. I'm not your boss. :) In many cases, finding a solution isn't a direct path. That's also the case here.

@yunmaoQu yunmaoQu closed this Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs ready-to-test release/4.0.3
Projects
None yet
3 participants