The graphrag-toolkit uses two separate stores: a GraphStore and a VectorStore. A VectorStore acts as a container for a collection of VectorIndex instances. When constructing or querying a graph, you must provide instances of both a graph store and a vector store.
The toolkit provides graph store implementations for both Amazon Neptune Analytics and Amazon Neptune Database, and vector store implementations for Neptune Analytics and Amazon OpenSearch Serverless. The graphrag-toolkit provides several convenient factory methods for creating instances of these stores. These factory methods accept formatted store identifiers, described below.
This early release of the toolkit provides support for Amazon Neptune and Amazon OpenSearch Serverless, but we welcome alternative store implementations. The store APIs and the ways in which the stores are used have been designed to anticipate alternative implementations. However, the proof is in the development: if you experience issues developing an alternative store, let us know.
Graph stores and vector stores provide connectivity to an existing storage instance, which you will need to have provisioned beforehand.
The code examples here are formatted to run in a Jupyter notebook. If you’re building an application with a main entry point, put your application logic inside a method, and add an if __name__ == '__main__' block.
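For a standalone script, the notebook-style examples on this page could be wrapped along the following lines. This is a minimal sketch: the GRAPH_STORE and VECTOR_STORE environment variable names and the build_stores() helper are hypothetical, and the toolkit import is deferred so that the module can be loaded without the toolkit installed.

```python
import os


def build_stores(graph_info, vector_info):
    """Create both stores from their formatted store identifiers."""
    # Imported lazily so that loading this module does not require the toolkit.
    from graphrag_toolkit.storage import GraphStoreFactory, VectorStoreFactory

    graph_store = GraphStoreFactory.for_graph_store(graph_info)
    vector_store = VectorStoreFactory.for_vector_store(vector_info)
    return graph_store, vector_store


def main():
    # Hypothetical environment variable names, chosen for this sketch only.
    graph_info = os.environ.get('GRAPH_STORE')
    vector_info = os.environ.get('VECTOR_STORE')
    if not graph_info or not vector_info:
        print('Set GRAPH_STORE and VECTOR_STORE to your store identifiers.')
        return 1

    graph_store, vector_store = build_stores(graph_info, vector_info)
    # Application logic (graph construction or querying) goes here.
    return 0


if __name__ == '__main__':
    main()
```

Keeping store creation inside a function means nothing connects to a store when the module is merely imported.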
Graph stores must support the openCypher property graph query language. Graph construction queries typically use an UNWIND ... MERGE idiom to create or update the graph for a batch of inputs. The Neptune graph store implementations override the GraphStore.node_id() method to ensure that node ids in the code (e.g. chunkId) are mapped to Neptune's ~id reserved property. Alternative graph store implementations can leave the base implementation of node_id() as-is. This will result in node ids being mapped to a property of the same name (i.e. a reference to chunkId in the code will be mapped to a chunkId property of a node).
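To make the id mapping concrete, the sketch below builds an UNWIND ... MERGE query for a batch of chunks under both behaviours. The class names, the merge_chunks_query() helper, the Chunk label and the value property are hypothetical stand-ins rather than the toolkit's own classes or schema, and node_id() is simplified here to return just a property key.

```python
class GraphStoreSketch:
    """Hypothetical stand-in for a graph store with the default id mapping."""

    def node_id(self, id_name):
        # Default behaviour: the id is stored in a property of the same name.
        return id_name

    def merge_chunks_query(self):
        key = self.node_id('chunkId')
        return (
            'UNWIND $params AS params '
            f'MERGE (chunk:Chunk {{{key}: params.chunkId}}) '
            'SET chunk.value = params.value'
        )


class NeptuneStoreSketch(GraphStoreSketch):
    """Hypothetical stand-in that maps ids to Neptune's reserved ~id property."""

    def node_id(self, id_name):
        # ~id is backtick-quoted so that it is a valid openCypher property key.
        return '`~id`'


# The batch that would be supplied as the $params query parameter.
params = [
    {'chunkId': 'c1', 'value': 'first chunk'},
    {'chunkId': 'c2', 'value': 'second chunk'},
]

default_query = GraphStoreSketch().merge_chunks_query()
neptune_query = NeptuneStoreSketch().merge_chunks_query()
print(default_query)
print(neptune_query)
```

Because MERGE matches on the id key, running the same batch twice would update the existing nodes rather than create duplicates.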
You can use the GraphStoreFactory.for_graph_store() static factory method to create an instance of a Neptune Analytics or Neptune Database graph store.
To create a Neptune Database graph store, supply a connection string that begins neptune-db://, followed by an endpoint:
from graphrag_toolkit.storage import GraphStoreFactory
neptune_connection_info = 'neptune-db://mydbcluster.cluster-123456789012.us-east-1.neptune.amazonaws.com:8182'
graph_store = GraphStoreFactory.for_graph_store(neptune_connection_info)
To create a Neptune Analytics graph store, supply a connection string that begins neptune-graph://, followed by the graph's identifier:
from graphrag_toolkit.storage import GraphStoreFactory
neptune_connection_info = 'neptune-graph://g-jbzzaqb209'
graph_store = GraphStoreFactory.for_graph_store(neptune_connection_info)
A vector store is a collection of vector indexes. The graphrag-toolkit uses up to two vector indexes: a chunk index and a statement index. The chunk index is typically much smaller than the statement index. If you want to use the SemanticGuidedRetriever, you will need to enable the statement index. If you want to use the TraversalBasedRetriever, you will need to enable the chunk index. If you want to use both retrievers, you will need to enable both indexes. (The VectorStoreFactory described below enables both indexes by default.)
You can use the VectorStoreFactory.for_vector_store() static factory method to create an instance of an Amazon OpenSearch Serverless or Neptune Analytics vector store.
To create an Amazon OpenSearch Serverless vector store, supply a connection string that begins aoss://, followed by the https endpoint of the OpenSearch Serverless collection:
from graphrag_toolkit.storage import VectorStoreFactory
opensearch_connection_info = 'aoss://https://123456789012.us-east-1.aoss.amazonaws.com'
vector_store = VectorStoreFactory.for_vector_store(opensearch_connection_info)
To create a Neptune Analytics vector store, supply a connection string that begins neptune-graph://, followed by the graph's identifier:
from graphrag_toolkit.storage import VectorStoreFactory
neptune_connection_info = 'neptune-graph://g-jbzzaqb209'
vector_store = VectorStoreFactory.for_vector_store(neptune_connection_info)
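The three identifier formats described in this section differ only in their prefix. The following sketch shows one way such prefixes could be recognised. It illustrates the format only and is not the toolkit's own parsing logic: the STORE_PREFIXES table and the describe_store_identifier() helper are hypothetical.

```python
# Prefixes taken from the identifier formats described in this section.
STORE_PREFIXES = {
    'neptune-db://': 'Neptune Database',
    'neptune-graph://': 'Neptune Analytics',
    'aoss://': 'OpenSearch Serverless',
}


def describe_store_identifier(identifier):
    """Return (store kind, remainder) for a formatted store identifier."""
    for prefix, kind in STORE_PREFIXES.items():
        if identifier.startswith(prefix):
            # The remainder is the endpoint or graph identifier.
            return kind, identifier[len(prefix):]
    raise ValueError(f'Unrecognised store identifier: {identifier}')


print(describe_store_identifier('neptune-graph://g-jbzzaqb209'))
print(describe_store_identifier('aoss://https://123456789012.us-east-1.aoss.amazonaws.com'))
```

A check like this can be useful for validating configuration values before handing them to the factory methods.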
By default, the VectorStoreFactory will enable both the statement index and the chunk index. If you want to enable just one of the indexes, pass an index_names argument to the factory method:
vector_store = VectorStoreFactory.for_vector_store(opensearch_connection_info, index_names=['chunk'])
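The retriever requirements described earlier can be turned into an index_names list. The sketch below is one way to do that. The REQUIRED_INDEX table and the index_names_for() helper are hypothetical, and the index name 'statement' is an assumption made by analogy with 'chunk'.

```python
# Retriever-to-index requirements as stated in this section.
# The index name 'statement' is assumed by analogy with 'chunk'.
REQUIRED_INDEX = {
    'SemanticGuidedRetriever': 'statement',
    'TraversalBasedRetriever': 'chunk',
}


def index_names_for(retriever_names):
    """Return the sorted, de-duplicated index names the given retrievers need."""
    return sorted({REQUIRED_INDEX[name] for name in retriever_names})


index_names = index_names_for(['TraversalBasedRetriever'])
print(index_names)
```

The resulting list could then be passed as the index_names argument of VectorStoreFactory.for_vector_store().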