FastEmbed from Qdrant is a lightweight, fast Python library built for embedding generation.
- Quantized model weights
- ONNX Runtime, no PyTorch dependency
- CPU-first design
- Data-parallel encoding for large datasets
FastEmbed is distributed as the fastembed Python package.
model_name: str
(default: "BAAI/bge-small-en-v1.5")
Name of the FastEmbed model to use. See the list of supported models in the FastEmbed documentation.
max_length: int
(default: 512)
The maximum number of tokens per input. Behavior for values greater than 512 is undefined.
cache_dir: Optional[str]
(default: None)
The path to the cache directory. Defaults to local_cache in the parent directory.
threads: Optional[int]
(default: None)
The number of threads a single onnxruntime session can use.
doc_embed_type: Literal["default", "passage"]
(default: "default")
"default": Uses FastEmbed's default embedding method.
"passage": Prefixes the text with "passage" before embedding.
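The "passage" mode can be pictured as plain prompt prefixing. The helper below is a hypothetical illustration of the idea, not FastEmbed's actual implementation, and the exact prefix string is an assumption:

```python
def with_passage_prefix(texts):
    # Hypothetical sketch: prepend the "passage" marker that
    # doc_embed_type="passage" adds before a text is embedded.
    return [f"passage: {t}" for t in texts]

print(with_passage_prefix(["FastEmbed is fast."]))
# ['passage: FastEmbed is fast.']
```

Prefixing documents this way matches how bge-style models were trained to distinguish passages from queries, which can improve retrieval quality.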
batch_size: int
(default: 256)
Batch size for encoding. Larger values use more memory but encode faster.
parallel: Optional[int]
(default: None)
If > 1, data-parallel encoding is used; recommended for offline encoding of large datasets. If 0, all available cores are used. If None, data-parallel processing is disabled and the default onnxruntime threading is used instead.
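To make the batch_size knob concrete, here is a rough pure-Python sketch of how a document list is split into batches before encoding (illustrative only, not fastembed's internals):

```python
def iter_batches(documents, batch_size=256):
    # Yield successive slices of at most batch_size documents;
    # each slice would be encoded in a single pass.
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]

docs = [f"doc {i}" for i in range(600)]
batches = list(iter_batches(docs, batch_size=256))
print([len(b) for b in batches])  # [256, 256, 88]
```

With parallel > 1, batches like these are distributed across worker processes instead of being encoded sequentially.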