js-tiktoken
js-tiktoken is a JavaScript version of the BPE tokenizer created by OpenAI.
tiktoken can be used to estimate the number of tokens used with @[TokenTextSplitter]. It will probably be more accurate for OpenAI models.
- How the text is split: by character passed in.
- How the chunk size is measured: by the tiktoken tokenizer.
To use tiktoken, pass in an encodingName (e.g. cl100k_base) when initializing the @[TokenTextSplitter]. Note that splits from this method can be larger than the chunk size measured by the tiktoken tokenizer.
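The following minimal sketch illustrates why a split can end up larger than the chunk size when text is split on a character and the chunk size is measured in tokens. It is not the @[TokenTextSplitter] implementation. The helpers `countTokens` and `splitByCharacter` are hypothetical names introduced here, and `countTokens` is a whitespace-based stand-in for a real tiktoken encoding such as cl100k_base.

```typescript
// Hypothetical stand-in for a tiktoken encoder: counts whitespace-separated words.
// A real setup would count tokens with an encoding such as cl100k_base.
function countTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Split on `separator`, then greedily merge adjacent pieces while the
// merged chunk stays within `chunkSize` tokens (hypothetical helper).
function splitByCharacter(
  text: string,
  separator: string,
  chunkSize: number
): string[] {
  const pieces = text.split(separator).filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const piece of pieces) {
    const candidate = current === "" ? piece : current + separator + piece;
    if (current !== "" && countTokens(candidate) > chunkSize) {
      // Adding this piece would exceed the budget: close the current chunk.
      chunks.push(current);
      current = piece;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

const text = "one two\n\nthree four five six seven\n\neight";
const chunks = splitByCharacter(text, "\n\n", 3);
const sizes = chunks.map(countTokens);
console.log(chunks, sizes); // sizes: [2, 5, 1]
```

The middle piece contains no separator, so it cannot be divided further and is emitted as a single chunk of 5 tokens even though the chunk size is 3.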