[Bug] Solr sink indexing is empty after PostgreSQL CDC changes #23763

Open
2 of 3 tasks
rajasekar-d opened this issue Dec 20, 2024 · 0 comments
Labels
type/bug The PR fixed a bug or issue reported a bug

rajasekar-d commented Dec 20, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

OS: Windows
Docker Image: apachepulsar/pulsar:4.0.1

Minimal reproduce step

Solr sink indexing is empty after PostgreSQL CDC changes

PostgreSQL setup:

-- Changing wal_level takes effect only after a PostgreSQL restart.
ALTER SYSTEM SET wal_level = 'logical';
CREATE TABLE IF NOT EXISTS shortlists (
  id SERIAL,
  user_id INTEGER,
  profile_id INTEGER,
  CONSTRAINT unique_shortlist UNIQUE(user_id, profile_id)
);
CREATE PUBLICATION pulsar_pub FOR TABLE shortlists;
ALTER TABLE public.shortlists REPLICA IDENTITY FULL;

Solr Schema

    <!-- Special Fields -->
    <field name="id" type="string" required="true" />
    <field name="_version_" type="plong" indexed="false" stored="false" />
    <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
  
    <!-- Column Details -->
    <field name="user_id" type="string"/>
    <field name="profile_id" type="string"/>

    <uniqueKey>id</uniqueKey>

Docker Image

FROM apachepulsar/pulsar:4.0.1
USER root
RUN mkdir connectors
RUN mkdir functions
COPY --chown=pulsar:pulsar ./connectors ./connectors
COPY --chown=pulsar:pulsar ./config ./conf
USER pulsar
EXPOSE 8080
CMD ["bin/pulsar", "standalone"]

PostgreSQL CDC Source:

tenant: "public"
namespace: "default"
name: "postgresql-cdc"
topicName: "postgresql-cdc-topic"
archive: "connectors/pulsar-io-debezium-postgres-4.0.1.nar"
parallelism: 1
configs:
  database.hostname: "host.docker.internal"
  database.port: "5432"
  database.user: "postgres"
  database.password: "Pass123"
  database.dbname: "example"
  database.server.name: "dbserver"
  plugin.name: "pgoutput"
  schema.whitelist: "public"
  table.whitelist: "public.shortlists"
  slot.name: "pulsar_slot"
  publication.name: "pulsar_pub"

Register Source

bin/pulsar-admin source create --source-config-file $PWD/conf/postgresql-cdc.yaml

Solr Sink:

tenant: "public"
namespace: "default"
name: "solr-sink-wishlists"
archive: "connectors/pulsar-io-solr-4.0.1.nar"
inputs: 
  - "persistent://public/default/dbserver.public.shortlists"
configs:
  solrUrl: "http://host.docker.internal:3000/solr"
  solrMode: "Standalone"
  solrCollection: "shortlists"
  solrCommitWithinMs: 100
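
The sink reads from dbserver.public.shortlists rather than the topicName set on the source. A minimal sketch of how that name appears to be derived, assuming Debezium's default <server name>.<schema>.<table> naming and that the connector publishes into the source's own tenant and namespace (the helper debezium_topic is hypothetical, not part of Pulsar):

```python
def debezium_topic(tenant: str, namespace: str, server_name: str,
                   schema: str, table: str) -> str:
    """Build the per-table topic name, assuming Debezium's default naming."""
    return f"persistent://{tenant}/{namespace}/{server_name}.{schema}.{table}"


# Values taken from the source and sink configs in this report.
topic = debezium_topic("public", "default", "dbserver", "public", "shortlists")
print(topic)
```

The result matches the value under inputs in the sink config, so the sink is at least subscribed to the topic the change events arrive on.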

Register Sink

bin/pulsar-admin sinks create --sink-config-file $PWD/conf/solr-sink-shortlists.yaml

Verify CDC
bin/pulsar-client consume -s "sub-shortlists" public/default/dbserver.public.shortlists -n 0

Output

publishTime:[1734679294113], eventTime:[0], key:[eyJpZCI6NjF9], properties:[], content:{"before":{"id":1,"user_id":4,"profile_id":6},"after":{"id":1,"user_id":4,"profile_id":6},"source":{"version":"1.9.7.Final","connector":"postgresql","name":"dbserver","ts_ms":1734679292608,"snapshot":"false","db":"example","sequence":"[\"207140192\",\"207140304\"]","schema":"public","table":"shortlists","txId":13346,"lsn":207140304,"xmin":null},"op":"u","ts_ms":1734679292632,"transaction":null}
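
The column values sit inside the before/after fields of the Debezium envelope, not at the top level of the message. A minimal sketch for inspecting such a message, using only the standard library; the helper names decode_key and extract_row are hypothetical, and this is not a description of what the Solr sink does internally:

```python
import base64
import json


def decode_key(key_b64: str) -> dict:
    """Decode the base64 message key printed by pulsar-client into a dict."""
    return json.loads(base64.b64decode(key_b64))


def extract_row(event_json: str):
    """Return the row state carried by a Debezium change event.

    For deletes (op == "d") the `after` image is null, so None is returned;
    otherwise the `after` image is returned as a dict.
    """
    event = json.loads(event_json)
    if event.get("op") == "d":
        return None
    return event.get("after")


# Trimmed copy of the event above (the `source` block is omitted for brevity).
sample_event = json.dumps({
    "before": {"id": 1, "user_id": 4, "profile_id": 6},
    "after": {"id": 1, "user_id": 4, "profile_id": 6},
    "op": "u",
})

row = extract_row(sample_event)
key = decode_key("eyJpZCI6NjF9")
print(row)
print(key)
```

The fields expected in Solr (id, user_id, profile_id) are one level down, under after. A sink that maps only top-level record fields to document fields would not see them, which may be related to the empty documents reported here.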

What did you expect to see?

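Actual result: querying Solr returns documents that contain only a generated id and _version_. The user_id and profile_id values from the CDC event are missing.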

Request:

http://localhost:8983/solr/shortlists/query
{
    "query": "*:*"
}

Response:

{
    "responseHeader": {
        "status": 0,
        "QTime": 16,
        "params": {
            "json": "{\r\n    \"query\": \"*:*\"\r\n}"
        }
    },
    "response": {
        "numFound": 2,
        "start": 0,
        "numFoundExact": true,
        "docs": [
            {
                "id": "33f9c371-3f8a-4b0e-8baf-b00b88bed688",
                "_version_": 1818941629431545856
            },
            {
                "id": "ba275867-52cb-4939-b7c9-3a034826d8b7",
                "_version_": 1818941670188646400
            }
        ]
    }
}
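
A small sketch for checking a Solr response for this symptom programmatically. It assumes the response has already been parsed into a dict; the helper docs_missing_fields is hypothetical:

```python
def docs_missing_fields(solr_response: dict, required: list) -> dict:
    """Map each document id to the required fields that are absent from it."""
    missing = {}
    for doc in solr_response["response"]["docs"]:
        absent = [field for field in required if field not in doc]
        if absent:
            missing[doc["id"]] = absent
    return missing


# The docs from the response above, reduced to the relevant part.
actual = {
    "response": {
        "numFound": 2,
        "docs": [
            {"id": "33f9c371-3f8a-4b0e-8baf-b00b88bed688",
             "_version_": 1818941629431545856},
            {"id": "ba275867-52cb-4939-b7c9-3a034826d8b7",
             "_version_": 1818941670188646400},
        ],
    }
}

report = docs_missing_fields(actual, ["user_id", "profile_id"])
print(report)
```

Both documents are reported as missing user_id and profile_id, and their ids are UUIDs rather than the table's id values.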

What did you see instead?

Expected result:

{
    "responseHeader": {
        "status": 0,
        "QTime": 16,
        "params": {
            "json": "{\r\n    \"query\": \"*:*\"\r\n}"
        }
    },
    "response": {
        "numFound": 2,
        "start": 0,
        "numFoundExact": true,
        "docs": [
            {
                "id": 1,
                "user_id": 4,
                "profile_id": 6,
                "_version_": 1818941629431545856
            },
            {
                "id": 3,
                "user_id": 8,
                "profile_id": 9,
                "_version_": 1818941670188646400
            }
        ]
    }
}
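
One caveat on the expected output: the schema above declares id, user_id and profile_id with type="string", so Solr would most likely return them as JSON strings rather than numbers. A sketch of the document shape that matches that schema, with a hypothetical helper to_solr_doc:

```python
def to_solr_doc(row: dict) -> dict:
    """Stringify every non-null column value to match string-typed Solr fields."""
    return {name: str(value) for name, value in row.items() if value is not None}


# Row values taken from the `after` image of the CDC event in this report.
doc = to_solr_doc({"id": 1, "user_id": 4, "profile_id": 6})
print(doc)
```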

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@rajasekar-d rajasekar-d added the type/bug The PR fixed a bug or issue reported a bug label Dec 20, 2024