Fixed Size Chunking

Fixed size chunking is a method of splitting documents into smaller chunks of a specified size, with optional overlap between chunks. This is useful when you want to process large documents in smaller, manageable pieces.

Usage

from bitca.agent import Agent
from bitca.document.chunking.fixed import FixedSizeChunking
from bitca.knowledge.pdf import PDFUrlKnowledgeBase
from bitca.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://bitca-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
    vector_db=PgVector(table_name="recipes_fixed_size_chunking", db_url=db_url),
    chunking_strategy=FixedSizeChunking(),
)
knowledge_base.load(recreate=False)  # Comment out after first run

agent = Agent(
    knowledge_base=knowledge_base,
    search_knowledge=True,
)

agent.print_response("How to make Thai curry?", markdown=True)

Params

Parameter
Type
Default
Description

chunk_size

int

5000

The maximum size of each chunk.

overlap

int

0

The number of characters to overlap between chunks.

Last updated