Embedding documents using optimized and quantized embedders
Integrate optimized and quantized embedding models with LangChain Python.
We embed documents using quantized embedders. The embedders are based on optimized models created with optimum-intel and IPEX. The example text is based on SBERT.
```python
from langchain_community.embeddings import QuantizedBiEncoderEmbeddings

model_name = "Intel/bge-small-en-v1.5-rag-int8-static"
encode_kwargs = {"normalize_embeddings": True}  # set True to compute cosine similarity

model = QuantizedBiEncoderEmbeddings(
    model_name=model_name,
    encode_kwargs=encode_kwargs,
    query_instruction="Represent this sentence for searching relevant passages: ",
)
```
```
loading configuration file inc_config.json from cache at
INCConfig {
  "distillation": {},
  "neural_compressor_version": "2.4.1",
  "optimum_version": "1.16.2",
  "pruning": {},
  "quantization": {
    "dataset_num_samples": 50,
    "is_static": true
  },
  "save_onnx_model": false,
  "torch_version": "2.2.0",
  "transformers_version": "4.37.2"
}
Using `INCModel` to load a TorchScript model will be deprecated in v1.15.0, to load your model please use `IPEXModel` instead.
```
Let’s ask a question and compare it against two documents. The first contains the answer to the question and the second does not. We can then check which document better suits our query.
```python
question = "How many people live in Berlin?"

documents = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin is well known for its museums.",
]
```
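To rank the documents against the query, you would embed both and score them by cosine similarity (since `normalize_embeddings=True`, this reduces to a dot product). The sketch below stubs the embed calls with toy vectors so it runs without the model; with the real embedder you would substitute `model.embed_query(question)` and `model.embed_documents(documents)`. The toy vectors here are illustrative assumptions, not real model output.

```python
import numpy as np


def cosine_similarity(a, b):
    # Cosine similarity between two vectors; equals the dot product
    # when both vectors are already L2-normalized.
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy stand-ins for model.embed_query(question) and
# model.embed_documents(documents).
query_vec = [0.9, 0.1, 0.2]
doc_vecs = [
    [0.8, 0.2, 0.3],  # document containing the answer
    [0.1, 0.9, 0.1],  # unrelated document
]

scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
best = int(np.argmax(scores))
print(best)  # index of the document most similar to the query → 0
```

With real embeddings, the document stating Berlin's population should score higher than the one about museums, answering the "which document better suits our query" question above.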