
feat: unique connection counting #9074

Open
kwasniew wants to merge 2 commits into main

kwasniew (Contributor) commented Jan 9, 2025

About the changes

End-to-end proof of concept (POC) for unique connection counting using a windowed HyperLogLog. We want to know how many unique SDK connections we had in the previous hour and in the current hour.


Algorithm:

  • on each request, we emit a connection event when the connection id header is detected
  • a unique connection counting service listens for these events and updates an in-memory HyperLogLog (a fixed-memory data structure and algorithm for approximate unique counting)
  • we sync the in-memory HyperLogLog to Postgres (BYTEA type) so that different pods can cross-check uniqueness (merging the in-memory state with the DB)
  • in the DB we store the current window's HyperLogLog as one byte array and the previous window's HyperLogLog as another. We swap the buckets every hour, accounting for corner cases with multiple pods
  • we start counting a new current window at every full hour (a sliding-window HyperLogLog is much harder to implement than a full-hour window)
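
To make the "fixed memory" and "merging" points above concrete, here is a minimal HyperLogLog sketch. It is not the code from this PR and does not reproduce the API of `hyperloglog-lite`; the names `SimpleHll` and `hash32`, the register count, and the hash function are all assumptions chosen for illustration. It also skips the large-range correction, so it is only meant for modest cardinalities.

```typescript
// Illustrative HyperLogLog, assuming 2^12 one-byte registers (4 KiB of fixed memory).
const P = 12;          // number of hash bits used to pick a register
const M = 1 << P;      // number of registers

// Assumed hash: 32-bit FNV-1a followed by a murmur3-style finalizer.
function hash32(value: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    h ^= value.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

class SimpleHll {
  readonly registers = new Uint8Array(M);

  add(value: string): void {
    const h = hash32(value);
    const index = h >>> (32 - P);   // top P bits select the register
    const rest = (h << P) >>> 0;    // remaining bits, left-aligned
    // Rank = 1-based position of the first 1-bit, capped when all bits are 0.
    const rank = Math.min(Math.clz32(rest) + 1, 32 - P + 1);
    if (rank > this.registers[index]) this.registers[index] = rank;
  }

  // Merging two sketches is an element-wise max, so pods can combine results
  // without double-counting ids that several of them have seen.
  merge(other: SimpleHll): void {
    for (let i = 0; i < M; i++) {
      if (other.registers[i] > this.registers[i]) {
        this.registers[i] = other.registers[i];
      }
    }
  }

  count(): number {
    const alpha = 0.7213 / (1 + 1.079 / M);
    let sum = 0;
    let zeros = 0;
    for (let i = 0; i < M; i++) {
      sum += Math.pow(2, -this.registers[i]);
      if (this.registers[i] === 0) zeros++;
    }
    const raw = (alpha * M * M) / sum;
    // Small-range correction: fall back to linear counting.
    if (raw <= 2.5 * M && zeros > 0) return Math.round(M * Math.log(M / zeros));
    return Math.round(raw);
  }
}

// Two pods see overlapping connection ids; the merged estimate counts each id once.
const podA = new SimpleHll();
const podB = new SimpleHll();
for (let i = 0; i < 1000; i++) podA.add(`conn-${i}`);
for (let i = 500; i < 1500; i++) podB.add(`conn-${i}`);
podA.merge(podB);
console.log('approximate unique connections:', podA.count());
```

The estimate is approximate by design: with 4096 registers the typical relative error is in the low single-digit percent range, in exchange for memory use that does not grow with the number of connections.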

Follow-up:

  • split this PR into the DB migration and the rest
  • add extensive tests for the unique counting service
  • add a Grafana metric that reads the previous bucket every hour


vercel bot commented Jan 9, 2025

The latest updates on your projects.

Name | Status | Preview | Comments | Updated (UTC)
unleash-monorepo-frontend | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jan 9, 2025 3:54pm

1 Skipped Deployment

Name | Status | Preview | Comments | Updated (UTC)
unleash-docs | ⬜️ Ignored (Inspect) | Visit Preview | | Jan 9, 2025 3:54pm

github-actions bot (Contributor) commented Jan 9, 2025

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ✅ 0 package(s) with unknown licenses.
  • ⚠️ 1 packages with OpenSSF Scorecard issues.
See the Details below.

OpenSSF Scorecard

Package | Version | Score | Details
npm/hyperloglog-lite | ^1.0.2 | 🟢 3 |
Details
Check | Score | Reason
Token-Permissions | ⚠️ -1 | No tokens found
Dangerous-Workflow | ⚠️ -1 | no workflows found
Maintained | ⚠️ 0 | 0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Code-Review | ⚠️ 0 | Found 0/14 approved changesets -- score normalized to 0
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Packaging | ⚠️ -1 | packaging workflow not detected
SAST | ⚠️ 0 | no SAST tool detected
Pinned-Dependencies | ⚠️ -1 | no dependencies found
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Security-Policy | ⚠️ 0 | security policy file not detected
Fuzzing | ⚠️ 0 | project is not fuzzed
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches
npm/hyperloglog-lite | 1.0.2 | 🟢 3 |
Details
Check | Score | Reason
Token-Permissions | ⚠️ -1 | No tokens found
Dangerous-Workflow | ⚠️ -1 | no workflows found
Maintained | ⚠️ 0 | 0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Code-Review | ⚠️ 0 | Found 0/14 approved changesets -- score normalized to 0
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Packaging | ⚠️ -1 | packaging workflow not detected
SAST | ⚠️ 0 | no SAST tool detected
Pinned-Dependencies | ⚠️ -1 | no dependencies found
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Security-Policy | ⚠️ 0 | security policy file not detected
Fuzzing | ⚠️ 0 | project is not fuzzed
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches
npm/murmurhash32-node | 1.0.1 | ⚠️ 2.6 |
Details
Check | Score | Reason
Code-Review | ⚠️ 0 | Found 0/15 approved changesets -- score normalized to 0
Token-Permissions | ⚠️ -1 | No tokens found
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Maintained | ⚠️ 0 | 0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Packaging | ⚠️ -1 | packaging workflow not detected
Dangerous-Workflow | ⚠️ -1 | no workflows found
Pinned-Dependencies | ⚠️ -1 | no dependencies found
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Security-Policy | ⚠️ 0 | security policy file not detected
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
License | ⚠️ 0 | license file not detected
Fuzzing | ⚠️ 0 | project is not fuzzed
Signed-Releases | ⚠️ -1 | no releases found
Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0

Scanned Files

  • package.json
  • yarn.lock

@@ -66,6 +66,11 @@ export function responseTimeMetrics(
req.query.appName;
}

const connectionId = req.headers['x-unleash-connection-id'];

kwasniew (Contributor Author) commented:

This is where it starts. Each time a connection id is detected on a request, we notify the interested parties.

this.activeHour = new Date().getHours();
}

listen() {

kwasniew (Contributor Author) commented:

This service listens for newly received connection ids.
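
The emit-and-listen flow described in the two comments above could look roughly like the following. This is a sketch under stated assumptions, not the PR's code: `ConnectionEventBus`, `reportConnection`, and `ConnectionCounter` are hypothetical names, the bus is a dependency-free stand-in for an event emitter, and a `Set` stands in for the HyperLogLog so that the result is exact and easy to check. Only the header name `x-unleash-connection-id` comes from the diff.

```typescript
type ConnectionListener = (connectionId: string) => void;

// Hypothetical minimal event bus (stand-in for an EventEmitter).
class ConnectionEventBus {
  private listeners: ConnectionListener[] = [];

  on(listener: ConnectionListener): void {
    this.listeners.push(listener);
  }

  emit(connectionId: string): void {
    this.listeners.forEach((listener) => listener(connectionId));
  }
}

// HTTP header values may be a string, a list of strings, or missing.
type RequestHeaders = Record<string, string | string[] | undefined>;

// Middleware side: emit only when the header is present and non-empty.
function reportConnection(headers: RequestHeaders, bus: ConnectionEventBus): void {
  const raw = headers['x-unleash-connection-id'];
  const connectionId = Array.isArray(raw) ? raw[0] : raw;
  if (connectionId) bus.emit(connectionId);
}

// Service side: subscribe once, then count every id that arrives.
class ConnectionCounter {
  readonly seen = new Set<string>(); // stand-in for the HyperLogLog

  listen(bus: ConnectionEventBus): void {
    bus.on((connectionId) => this.count(connectionId));
  }

  count(connectionId: string): void {
    this.seen.add(connectionId);
  }
}

const bus = new ConnectionEventBus();
const counter = new ConnectionCounter();
counter.listen(bus);
reportConnection({ 'x-unleash-connection-id': 'abc' }, bus);
reportConnection({}, bus); // no header, nothing emitted
console.log('unique connections seen:', counter.seen.size);
```

Keeping the middleware unaware of who consumes the event means the counting service can be toggled by a feature flag without touching the request path.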


async count(connectionId: string) {
if (!this.flagResolver.isEnabled('uniqueSdkTracking')) return;
this.hll.add(HyperLogLog.hash(connectionId));

kwasniew (Contributor Author) commented:

This data structure tracks unique connections.

@@ -179,4 +180,10 @@ export const scheduleServices = async (
minutesToMilliseconds(15),
'cleanUpIntegrationEvents',
);

schedulerService.schedule(

kwasniew (Contributor Author) commented:

Syncs the in-memory HyperLogLogs with the DB.

this.hll.add(HyperLogLog.hash(connectionId));
}

async sync(): Promise<void> {

kwasniew (Contributor Author) commented:

This logic will get detailed test coverage. We track the current and previous HyperLogLog in two DB rows. If we find that the hour has changed, we copy current to previous and start a new current bucket. We have to account for multiple pods, where any pod can be the first one to start a new current bucket.
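
One possible shape for this rotation logic is sketched below. It is not the implementation in this PR. The assumptions: a `Map` stands in for the two DB rows, each row carries a hypothetical `hourStart` marker (a real table might derive this from a timestamp column), `PodCounter` is an invented name, and raw register arrays stand in for serialized HyperLogLogs, merged by element-wise max.

```typescript
const HOUR_MS = 60 * 60 * 1000;

type Bucket = 'current' | 'previous';

// `hourStart` is an assumed window marker: epoch milliseconds truncated to the hour.
interface Row {
  hll: Uint8Array;
  hourStart: number;
}

// Element-wise max merges two HyperLogLog register arrays.
function mergeRegisters(a: Uint8Array, b: Uint8Array): Uint8Array {
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = Math.max(a[i], b[i]);
  return out;
}

function hourStartOf(ms: number): number {
  return ms - (ms % HOUR_MS);
}

class PodCounter {
  private local: Uint8Array;
  private activeHour: number;
  private readonly size: number;

  constructor(size: number, nowMs: number) {
    this.size = size;
    this.local = new Uint8Array(size);
    this.activeHour = hourStartOf(nowMs);
  }

  // Stand-in for hll.add(): raise one register to at least `rank`.
  observe(index: number, rank: number): void {
    this.local[index] = Math.max(this.local[index], rank);
  }

  // Against a real database these steps would need a transaction or row lock.
  sync(store: Map<Bucket, Row>, nowMs: number): void {
    const nowHour = hourStartOf(nowMs);

    // 1. Rotate the shared rows if this pod is the first to see the new hour.
    const current = store.get('current');
    if (!current || current.hourStart < nowHour) {
      if (current) store.set('previous', current);
      store.set('current', { hll: new Uint8Array(this.size), hourStart: nowHour });
    }

    // 2. Flush local registers into the row that matches their window.
    const target: Bucket = this.activeHour === nowHour ? 'current' : 'previous';
    const row = store.get(target);
    if (row && row.hourStart === this.activeHour) {
      const merged = mergeRegisters(row.hll, this.local);
      store.set(target, { hll: merged, hourStart: row.hourStart });
      if (target === 'current') this.local = merged.slice(); // adopt other pods' data
    }

    // 3. After an hour change, start a fresh local window.
    if (this.activeHour !== nowHour) {
      this.local = new Uint8Array(this.size);
      this.activeHour = nowHour;
    }
  }
}
```

Because step 1 only rotates when the stored window is older than the current hour, it does not matter which pod gets there first; pods that sync later skip the rotation and flush their leftover counts into the previous row in step 2.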

(
id VARCHAR(255) NOT NULL,
updated_at TIMESTAMP DEFAULT now(),
hll BYTEA NOT NULL,

kwasniew (Contributor Author) commented:

The HyperLogLog is a buffer that we store as the BYTEA type in Postgres.
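
For readers unfamiliar with BYTEA: Postgres's textual hex format for this type is `\x` followed by two hex digits per byte. Database drivers usually accept a binary buffer directly, so the PR most likely never builds this string by hand; the helpers below (`toByteaHex` and `fromByteaHex`, both hypothetical names) only illustrate what the stored register bytes look like when written as a hex literal.

```typescript
// Encode raw bytes in Postgres's hex format for BYTEA: "\x" + two hex digits per byte.
function toByteaHex(bytes: Uint8Array): string {
  let hex = '';
  for (let i = 0; i < bytes.length; i++) {
    hex += (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16);
  }
  return '\\x' + hex;
}

// Decode the hex format back into raw bytes.
function fromByteaHex(text: string): Uint8Array {
  if (text.slice(0, 2) !== '\\x') throw new Error('expected hex-format bytea');
  const hex = text.slice(2);
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

const registers = new Uint8Array([0, 10, 255]);
console.log(toByteaHex(registers));
```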

Labels: None yet
Projects: Status: New

1 participant