MLX Transcribe

Last updated 4 months ago

MLX Transcribe is a tool for transcribing audio files using MLX Whisper.

Prerequisites

  1. Install ffmpeg

    • macOS: brew install ffmpeg

    • Ubuntu: sudo apt-get install ffmpeg

    • Windows: Download from https://ffmpeg.org/download.html

  2. Install the mlx-whisper library

    pip install mlx-whisper
  3. Prepare audio files

    • Create a ‘storage/audio’ directory

    • Place your audio files in this directory

    • Supported formats: mp3, mp4, wav, etc.

  4. Download sample audio (optional)

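    • Visit: https://www.ted.com/talks/reid_hoffman_and_kevin_scott_the_evolution_of_ai_and_how_it_will_impact_human_creativity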

    • Save the audio file to ‘storage/audio’ directory
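Before running the example, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch using only the Python standard library. The helper names (`ffmpeg_available`, `prepare_audio_dir`, `find_audio_files`) and the `AUDIO_EXTENSIONS` set are illustrative assumptions, not part of the toolkit, and the extension set covers only the formats named above.

```python
import shutil
from pathlib import Path
from typing import List

# Only the formats named in the prerequisites; other formats may also work.
AUDIO_EXTENSIONS = {".mp3", ".mp4", ".wav"}


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def prepare_audio_dir(root: Path) -> Path:
    """Create `<root>/storage/audio` if it is missing and return its path."""
    audio_dir = root / "storage" / "audio"
    audio_dir.mkdir(parents=True, exist_ok=True)
    return audio_dir


def find_audio_files(audio_dir: Path) -> List[Path]:
    """List files in `audio_dir` whose extension is a known audio format."""
    return sorted(
        p
        for p in audio_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Calling `prepare_audio_dir(Path.cwd())` followed by `find_audio_files(...)` shows which files the agent would be able to see. An empty list means the sample audio has not been saved yet.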

Example

The following agent will use MLX Transcribe to transcribe audio files.

cookbook/tools/mlx_transcribe_tools.py


from pathlib import Path
from bitca.agent import Agent
from bitca.model.openai import OpenAIChat
from bitca.tools.mlx_transcribe import MLXTranscribe

# Get audio files from storage/audio directory
bitca_root_dir = Path(__file__).parent.parent.parent.resolve()
audio_storage_dir = bitca_root_dir.joinpath("storage/audio")
if not audio_storage_dir.exists():
    audio_storage_dir.mkdir(exist_ok=True, parents=True)

agent = Agent(
    name="Transcription Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[MLXTranscribe(base_dir=audio_storage_dir)],
    instructions=[
        "To transcribe an audio file, use the `transcribe` tool with the name of the audio file as the argument.",
        "You can find all available audio files using the `read_files` tool.",
    ],
    markdown=True,
)

agent.print_response("Summarize the reid hoffman ted talk, split into sections", stream=True)
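
For orientation, the `transcribe` tool used by the agent above presumably wraps a call to MLX Whisper along the following lines. This is a sketch under stated assumptions, not the toolkit's actual source: `resolve_audio_path` and `transcribe_file` are hypothetical helpers introduced for illustration. The call itself, `mlx_whisper.transcribe` with the `path_or_hf_repo` argument, is the mlx-whisper entry point, and it returns a dict whose `"text"` key holds the transcript.

```python
from pathlib import Path

# Default model listed in the toolkit parameters.
DEFAULT_MODEL = "mlx-community/whisper-large-v3-turbo"


def resolve_audio_path(base_dir: Path, file_name: str) -> Path:
    """Resolve `file_name` inside `base_dir`, failing early if it is missing."""
    path = (base_dir / file_name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No audio file named {file_name!r} in {base_dir}")
    return path


def transcribe_file(base_dir: Path, file_name: str, model: str = DEFAULT_MODEL) -> str:
    """Transcribe one audio file with MLX Whisper and return the text.

    Hypothetical helper: the real toolkit may structure this differently.
    """
    audio_path = resolve_audio_path(base_dir, file_name)
    # Imported lazily so the module can be loaded without mlx-whisper installed.
    import mlx_whisper

    result = mlx_whisper.transcribe(str(audio_path), path_or_hf_repo=model)
    return result["text"]
```

Checking the path before importing `mlx_whisper` means a mistyped file name fails immediately rather than after the model has been loaded.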

Toolkit Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_dir | Path | Path.cwd() | Base directory for audio files |
| read_files_in_base_dir | bool | True | Whether to register the read_files function |
| path_or_hf_repo | str | "mlx-community/whisper-large-v3-turbo" | Path or HuggingFace repo for the model |
| verbose | bool | None | Enable verbose output |
| temperature | float or Tuple[float, ...] | None | Temperature for sampling |
| compression_ratio_threshold | float | None | Compression ratio threshold |
| logprob_threshold | float | None | Log probability threshold |
| no_speech_threshold | float | None | No speech threshold |
| condition_on_previous_text | bool | None | Whether to condition on previous text |
| initial_prompt | str | None | Initial prompt for transcription |
| word_timestamps | bool | None | Enable word-level timestamps |
| prepend_punctuations | str | None | Punctuations to prepend |
| append_punctuations | str | None | Punctuations to append |
| clip_timestamps | str or List[float] | None | Clip timestamps |
| hallucination_silence_threshold | float | None | Hallucination silence threshold |
| decode_options | dict | None | Additional decoding options |

Toolkit Functions

| Function | Description |
| --- | --- |
| transcribe | Transcribes an audio file using MLX Whisper |
| read_files | Lists all audio files in the base directory |
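
Most of the toolkit parameters listed above default to None. A reasonable reading is that an unset parameter is simply not forwarded, so MLX Whisper falls back to its own default. The sketch below shows one way to implement that pattern. `build_transcribe_kwargs` is a hypothetical helper, and treating None as "not forwarded" is an assumption about the toolkit rather than documented behavior.

```python
from typing import Any, Dict


def build_transcribe_kwargs(**options: Any) -> Dict[str, Any]:
    """Keep only the options that were explicitly set (anything other than None).

    False and 0.0 are kept on purpose: they are real settings, not "unset".
    """
    return {name: value for name, value in options.items() if value is not None}


kwargs = build_transcribe_kwargs(
    word_timestamps=True,
    condition_on_previous_text=False,
    temperature=None,
    no_speech_threshold=None,
)
print(kwargs)  # {'word_timestamps': True, 'condition_on_previous_text': False}
```

Filtering on `is not None` rather than on truthiness matters here, since a value such as `condition_on_previous_text=False` must still reach the transcription call.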

​