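I am trying to create chunks that are at most 350 characters long, with a chunk overlap of 100 characters.
I understand that chunk_size is an upper bound, so I may get chunks that are shorter than that. But why am I not getting any chunk_overlap?
Is it because the overlap also has to be split on one of the separators? In other words, do I only get a 100-character chunk_overlap if there is a separator to split on within 100 characters of the split point?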
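The hypothesis above can be sketched in plain Python. This is a simplified illustration, not LangChain's actual implementation: `merge_pieces` and its parameters are hypothetical names, and it assumes the text has already been split on a single separator. The sketch packs whole pieces into chunks and only carries over trailing pieces that fit entirely within chunk_overlap.

```python
def merge_pieces(pieces, chunk_size, chunk_overlap, sep=" "):
    """Greedily pack whole pieces into chunks (hypothetical helper).

    The overlap is built only from whole trailing pieces, so a trailing
    piece longer than chunk_overlap cannot be carried into the next chunk.
    """
    chunks, current = [], []

    def length(parts):
        # Length of the parts once joined with the separator.
        return len(sep.join(parts))

    for piece in pieces:
        if current and length(current + [piece]) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(sep.join(current))
            # Drop pieces from the front until the remaining tail fits
            # within chunk_overlap and leaves room for the next piece.
            while current and (
                length(current) > chunk_overlap
                or length(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(sep.join(current))
    return chunks


# Pieces of 4 characters fit inside a 4-character overlap, so they repeat.
with_overlap = merge_pieces(["aaaa", "bbbb", "cccc", "dddd"], 9, 4)
print(with_overlap)  # ['aaaa bbbb', 'bbbb cccc', 'cccc dddd']

# Pieces of 8 characters are longer than the overlap, so nothing is carried over.
no_overlap = merge_pieces(["aaaaaaaa", "bbbbbbbb", "cccccccc"], 9, 4)
print(no_overlap)  # ['aaaaaaaa', 'bbbbbbbb', 'cccccccc']
```

If the real splitter behaves like this sketch, an overlap would only appear when the pieces between separators at the end of a chunk are shorter than chunk_overlap.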
from langchain.text_splitter import RecursiveCharacterTextSplitter
some_text = """When writing documents, writers will use document structure to group content. \
This can convey to the reader, which idea's are related. For example, closely related ideas \
are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. \n\n \
Paragraphs are often delimited with a carriage return or two carriage returns. \
Carriage returns …