LogoLogo
TwitterWebsite
  • Getting Started
    • Introduction
    • Human UI
    • Examples
    • Monitoring
    • Workflows
    • Getting Help
  • Documentation
    • Humans
      • Introduction
      • Prompts
      • Tools
      • Knowledge
      • Memory
      • Storage
      • Structured Output
      • Reasoning
      • Teams
    • Models
      • Introduction
      • Open AI
      • Open AI Like
      • Anthropic Claude
      • AWS Bedrock Claude
      • Azure
      • Cohere
      • DeepSeek
      • Fireworks
      • Gemini
      • Gemini - VertexAI
      • Groq
      • HuggingFace
      • Mistral
      • Nvidia
      • Ollama
      • OpenRouter
      • Sambanova
      • Together
      • xAI
    • Tools
      • Introduction
      • Functions
      • Writing your own Toolkit
      • Airflow
      • Apify
      • Arxiv
      • AWS Lambda
      • BaiduSearch
      • Calculator
      • Cal.com
      • Composio
      • Crawl4AI
      • CSV
      • Dalle
      • DuckDb
      • DuckDuckGo
      • Email
      • Exa
      • Fal
      • File
      • Firecrawl
      • Giphy
      • Github
      • Google Calendar
      • Google Search
      • Hacker News
      • Jina Reader
      • Jira
      • Linear
      • Lumalabs
      • MLX Transcribe
      • ModelsLabs
      • Newspaper
      • Newspaper4k
      • OpenBB
      • Bitca
      • Postgres
      • Pubmed
      • Pyton
      • Replicate
      • Resend
      • Searxng
      • Serpapi
      • Shell
      • Slack
      • Sleep
      • Spider
      • SQL
      • Tavily
      • Twitter
      • Website
      • Yfinance
      • Zendesk
    • Knowledges
      • Introduction
      • ArXiv Knowledge Base
      • Combined KnowledgeBase
      • CSV Knowledge Base
      • CSV URL Knowledge Base
      • Docx Knowledge Base
      • Document Knowledge Base
      • JSON Knowledge Base
      • LangChain Knowledge Base
      • LlamaIndex Knowledge Base
      • PDF Knowledge Base
      • PDF URL Knowledge Base
      • S3 PDF Knowledge Base
      • S3 Text Knowledge Base
      • Text Knowledge Base
      • Website Knowledge Base
    • Chunking
      • Fixed Size Chunking
      • Agentic Chunking
      • Semantic Chunking
      • Recursive Chunking
      • Document Chunking
    • VectorDBS
      • Introduction
      • PgVector Agent Knowledge
      • Qdrant Agent Knowledge
      • Pinecone Agent Knowledge
      • LanceDB Agent Knowledge
      • ChromaDB Agent Knowledge
      • SingleStore Agent Knowledge
    • Storage
      • Introduction
      • Postgres Agent Storage
      • Sqlite Agent Storage
      • Singlestore Agent Storage
      • DynamoDB Agent Storage
      • JSON Agent Storage
      • YAML Agent Storage
    • Embeddings
      • Introduction
      • OpenAI Embedder
      • Gemini Embedder
      • Ollama Embedder
      • Voyage AI Embedder
      • Azure OpenAI Embedder
      • Mistral Embedder
      • Fireworks Embedder
      • Together Embedder
      • HuggingFace Embedder
      • Qdrant FastEmbed Embedder
      • SentenceTransformers Embedder
    • Workflows
      • Introduction
      • Session State
      • Streaming
      • Advanced Example - News Report Generator
  • How To
    • Install & Upgrade
    • Upgrade to v2.5.0
Powered by GitBook
LogoLogo

© 2025 Bitca. All rights reserved.

On this page
  • ​Usage
  • ​Params
Export as PDF
  1. Documentation
  2. Knowledges

PDF Knowledge Base

PreviousLlamaIndex Knowledge BaseNextPDF URL Knowledge Base

Last updated 4 months ago

The PDFKnowledgeBase reads local PDF files, converts them into vector embeddings and loads them to a vector databse.

Usage

We are using a local PgVector database for this example.

pip install pypdf

knowledge_base.py

from bitca.knowledge.pdf import PDFKnowledgeBase, PDFReader
from bitca.vectordb.pgvector import PgVector

pdf_knowledge_base = PDFKnowledgeBase(
    path="data/pdfs",
    # Table name: ai.pdf_documents
    vector_db=PgVector(
        table_name="pdf_documents",
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
    ),
    reader=PDFReader(chunk=True),
)

Then use the knowledge_base with an Agent:

agent.py

from bitca.agent import Agent
from knowledge_base import knowledge_base

agent = Agent(
    knowledge=knowledge_base,
    search_knowledge=True,
)
agent.knowledge.load(recreate=False)

agent.print_response("Ask me about something from the knowledge base")
Parameter
Type
Default
Description

path

Union[str, Path]

-

Path to PDF files. Can point to a single PDF file or a directory of PDF files.

vector_db

VectorDb

-

Vector Database for the Knowledge Base. Example: PgVector

reader

Union[PDFReader, PDFImageReader]

PDFReader()

A PDFReader that converts the PDFs into Documents for the vector database.

num_documents

int

5

Number of documents to return on search.

optimize_on

int

-

Number of documents to optimize the vector db on. For Example: Create an index for PgVector.

chunking_strategy

ChunkingStrategy

FixedSizeChunking

The chunking strategy to use.

Params

​
Make sure it’s running
​