MarkdownHeaderTextSplitter strips the headers being split on from the output chunk’s content. This can be disabled by setting strip_headers=False.
MarkdownHeaderTextSplitter strips whitespace and newlines. To preserve the original formatting of your Markdown documents, check out ExperimentalMarkdownSyntaxTextSplitter.
MarkdownHeaderTextSplitter aggregates lines based on the headers specified in headers_to_split_on. We can disable this by setting return_each_line=True.
Header information is retained in the metadata for each document.
Each chunk can then be passed to another text splitter, such as RecursiveCharacterTextSplitter, which allows for further control of the chunk size.