From fc6029f40b6043acc056b4629a2dbd66af3663e6 Mon Sep 17 00:00:00 2001
From: Di Qi
Date: Tue, 14 Nov 2023 22:36:02 -0800
Subject: [PATCH] Update images in README, spacing edits, add link (#225)

---
 .gitignore |  1 +
 README.md  | 35 ++++++++++++++++++++++-------------
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/.gitignore b/.gitignore
index 2f2708997..348ed5c8b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -21,3 +21,4 @@ sanitizer
 .vscode/
 .devcontainer/
 .cache
+.cosine
\ No newline at end of file
diff --git a/README.md b/README.md
index ced0e4087..bee8eb6a7 100644
--- a/README.md
+++ b/README.md
@@ -14,11 +14,13 @@ Lantern builds and uses [usearch](https://github.com/unum-cloud/usearch), a sing
 ## 🔧 Quick Install
 
 If you don’t have PostgreSQL already, use Lantern with [Docker](https://hub.docker.com/r/lanterndata/lantern) to get started quickly:
+
 ```bash
 docker run -p 5432:5432 -e 'POSTGRES_PASSWORD=postgres' lanterndata/lantern:latest-pg15
 ```
 
 To install Lantern from source on top of PostgreSQL:
+
 ```
 git clone --recursive https://github.com/lanterndata/lantern.git
 cd lantern
@@ -29,6 +31,7 @@ make install
 ```
 
 To install Lantern using `homebrew`:
+
 ```
 brew tap lanterndata/lantern
 brew install lantern && lantern_install
@@ -38,7 +41,7 @@ You can also install Lantern on top of PostgreSQL from our [precompiled binaries
 
 Alternatively, you can use Lantern in one click using [Replit](https://replit.com/@lanterndata/lantern-playground#.replit).
 
-## 📖 How to use Lantern
+## 📖 How to use Lantern
 
 Lantern retains the standard PostgreSQL interface, so it is compatible with all of your favorite tools in the PostgreSQL ecosystem.
 
@@ -62,12 +65,14 @@ CREATE INDEX ON small_world USING hnsw (vector);
 ```
 
 Customize `hnsw` index parameters depending on your vector data, such as the distance function (e.g., `dist_l2sq_ops`), index construction parameters, and index search parameters.
+
 ```sql
 CREATE INDEX ON small_world USING hnsw (vector dist_l2sq_ops)
 WITH (M=2, ef_construction=10, ef=4, dim=3);
 ```
 
 Start querying data
+
 ```sql
 SET enable_seqscan = false;
 SELECT id, l2sq_dist(vector, ARRAY[0,0,0]) AS dist
@@ -89,47 +94,51 @@ There are four defined operator classes that can be employed during index creati
 
 ### Index Construction Parameters
 
-The `M`, `ef`, and `ef_construction` parameters control the performance of the HNSW algorithm for your use case.
-- In general, lower `M` and `ef_construction` speed up index creation at the cost of recall.
+The `M`, `ef`, and `ef_construction` parameters control the performance of the HNSW algorithm for your use case.
+
+- In general, lower `M` and `ef_construction` speed up index creation at the cost of recall.
 - Lower `M` and `ef` improve search speed and result in fewer shared buffer hits at the cost of recall.
 
 Tuning these parameters will require experimentation for your specific use case.
 
 ### Miscellaneous
+
 - If you have previously cloned Lantern and would like to update run `git pull && git submodule update`
 
-## ⭐️ Features
+## ⭐️ Features
+
 - Embedding generation for popular use cases (CLIP model, Hugging Face models, custom model)
-- Interoperability with pgvector's data type, so anyone using pgvector can switch to Lantern
+- Interoperability with pgvector's data type, so anyone using pgvector can switch to Lantern
 - Parallel index creation via an external indexer
 - Ability to generate the index graph outside of the database server
 - Support for creating the index outside of the database and inside another instance allows you to create an index without interrupting database workflows.
-- See all of our helper functions to better enable your workflows
+- See all of our helper functions to better enable your workflows
 
 ## 🏎️ Performance
 
 Important takeaways:
+
 - There's three key metrics we track. `CREATE INDEX` time, `SELECT` throughput, and `SELECT` latency.
 - We match or outperform pgvector and pg_embedding (Neon) on all of these metrics.
 - We plan to continue to make performance improvements to ensure we are the best performing database.

-Lantern throughput
-Lantern latency
-Lantern index creation
+Lantern throughput
+Lantern latency
+Lantern index creation

 ## 🗺️ Roadmap
 
-- Cloud-hosted version of Lantern - [Sign up](https://forms.gle/YwxTzN9138LZEeCw8) for updates
+- Cloud-hosted version of Lantern - Sign up [here](https://lantern.dev)
 - Hardware-accelerated distance metrics, tailored for your CPU, enabling faster queries
 - Templates and guides for building applications for different industries
-- More tools for generating embeddings (support for third party model API’s, more local models)
+- More tools for generating embeddings (support for third party model API’s, more local models)
 - Support for version control and A/B test embeddings
-- Autotuned index type that will choose appropriate creation parameters
+- Autotuned index type that will choose appropriate creation parameters
 - Support for 1 byte and 2 byte vector elements, and up to 8000 dimensional vectors ([PR #19](https://github.com/lanterndata/lantern/pull/19))
 - Request a feature at [support@lantern.dev](mailto:support@lantern.dev)
 
 ## 📚 Resources
 
 - [GitHub issues](https://github.com/lanterndata/lantern/issues): report bugs or issues with Lantern
-- Need support? Contact [support@lantern.dev](mailto:support@lantern.dev). We are happy to troubleshoot issues and advise on how to use Lantern for your use case
+- Need support? Contact [support@lantern.dev](mailto:support@lantern.dev). We are happy to troubleshoot issues and advise on how to use Lantern for your use case
 - We welcome community contributions! Feel free to open an issue or a PR. If you contact [support@lantern.dev](mailto:support@lantern.dev), we can find an open issue or project that fits you