Language models have a token limit, and your input should not exceed it. When you split your text into chunks, it is therefore a good idea to count tokens. There are many tokenizers; when counting tokens in your text, use the same tokenizer that the language model uses.
js-tiktoken
js-tiktoken is a JavaScript version of the BPE tokenizer created by OpenAI. You can use js-tiktoken, via TokenTextSplitter, to estimate the number of tokens used. The estimate will probably be most accurate for OpenAI models.
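For illustration, here is a minimal sketch of counting tokens with js-tiktoken (the choice of cl100k_base here is an assumption; use whichever encoding your target model actually uses):

```typescript
import { getEncoding } from "js-tiktoken";

// Load the BPE encoding. cl100k_base is an assumption; pick the
// encoding that matches your model.
const enc = getEncoding("cl100k_base");

// Encode the text and count the resulting tokens.
const text = "Language models have a token limit.";
const numTokens = enc.encode(text).length;

console.log(`${numTokens} tokens`);
```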
- How the text is split: by the character passed in.
- How the chunk size is measured: by the tiktoken tokenizer.
To use tiktoken, pass in an encodingName (e.g. cl100k_base) when initializing the TokenTextSplitter. Note that splits from this method can be larger than the chunk size measured by the tiktoken tokenizer.
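Putting this together, a sketch of splitting text this way might look like the following (the @langchain/textsplitters import path is an assumption; older LangChain.js versions export the splitter from langchain/text_splitter, and the chunkSize and chunkOverlap values are illustrative only):

```typescript
import { TokenTextSplitter } from "@langchain/textsplitters";

// Split on tokens as measured by the cl100k_base tiktoken encoding.
const splitter = new TokenTextSplitter({
  encodingName: "cl100k_base",
  chunkSize: 10, // target chunk size, in tokens
  chunkOverlap: 0, // no overlapping tokens between chunks
});

const text = "Language models have a token limit. You should not exceed it.";
const chunks = await splitter.splitText(text);

console.log(chunks);
```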

