Recursive chunking is a method of splitting documents into smaller chunks by recursively applying a chunking strategy. This is useful when you want to process large documents in smaller, manageable pieces.
from bitca.agent import Agent
from bitca.document.chunking.recursive import RecursiveChunking
from bitca.knowledge.pdf import PDFUrlKnowledgeBase
from bitca.vectordb.pgvector import PgVector
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"
knowledge_base = PDFUrlKnowledgeBase(
urls=["https://bitca-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
vector_db=PgVector(table_name="recipes_recursive_chunking", db_url=db_url),
chunking_strategy=RecursiveChunking(),
)
knowledge_base.load(recreate=False) # Comment out after first run
agent = Agent(
knowledge_base=knowledge_base,
search_knowledge=True,
)
agent.print_response("How to make Thai curry?", markdown=True)
Parameter
Type
Default
Description
chunk_size
int
5000
The maximum size of each chunk.
overlap
int
0
The number of characters to overlap between chunks.