hermitAI v0.3: LLM + RAG + MCP = Real-time Personalized AI Twin

May 26, 2025
Kai Chew

This is a submission for the Bright Data AI Web Access Hackathon

What I Built

I built two complementary products that demonstrate the full potential of AI agents with real-time web access:

  1. HermitAI - A personal AI agent designed for autonomous research, real-time web interaction, and intelligent question-answering. It tackles the problem of information silos and the inherent limitations of Large Language Models (LLMs) that often operate on outdated knowledge. HermitAI aims to be your digital twin—an autonomous assistant that researches, scrapes the web, answers questions based on both private knowledge and live data, and is architected for future expansion.

  2. BrightData MCP for Roo Code - A specialized server that enables Roo Code to seamlessly search the web, navigate websites, take action, and retrieve data without getting blocked—perfect for scraping tasks. This integration brings the power of Bright Data's web access capabilities to the Roo ecosystem.

Core Problem Solved: Traditional LLMs lack access to real-time information and cannot easily integrate with personal knowledge bases or perform complex web interactions. My solutions bridge this gap by combining sophisticated Retrieval Augmented Generation (RAG) systems with dynamic web access capabilities provided by Bright Data's infrastructure. This allows AI agents to provide answers that are not only contextually relevant to a user's private data but also grounded in the most current information available on the web.

Think of HermitAI as ChatGPT on steroids: your personal AI sidekick that combines Gemini 2.5 Pro, Bright Data's robust web access, and your own curated knowledge to keep you highly productive, even as a hermit!

Demo

1. HermitAI

hermitAI is like ChatGPT on steroids: your personal AI twin for autonomous research, real-time web scraping, and intelligent Q&A, with email, social, bill management and more coming soon. It’s designed to help hermits (and high-performers) live a focused, hands-off digital life. Built with Google’s Gemini 2.5 via Vertex AI, BrightData APIs, and Astro, hermitAI is your privacy-conscious AI agent: lightweight, powerful, and ready to grow.

What Is It?

hermitAI is a developer-friendly, self-hostable AI agent that combines:

  • LLM intelligence (Gemini 2.5 via Vertex AI),
  • Real-time web scraping (via BrightData),
  • Private knowledge retrieval (MongoDB Atlas vector search),
  • Modern UI (Astro, JSX),
  • and soon: Email, social, bill management & more.

It’s built for hackers, researchers, solopreneurs, and digital hermits seeking a streamlined, AI-augmented life.

Philosophy

hermitAI is for people who want to offload tedious digital tasks while maintaining sovereignty over their data and tools. It’s not just an AI assistant — it’s…

2. BrightData MCP for Roo Code

Enhance Roo Coding with Real-Time Web Data

🌟 Overview

Welcome to the official BrightData Model Context Protocol (MCP) server, designed to enhance Roo Code by enabling access, discovery, and extraction of real-time web data. This server allows Roo Code to seamlessly search the web, navigate websites, take action, and retrieve data—without getting blocked—perfect for scraping tasks.

✨ Features

  • Real-time Web Access: Access up-to-date information directly from the web
  • Bypass Geo-restrictions: Access content regardless of location constraints
  • Web Unlocker: Navigate websites protected by bot detection
  • Browser Control: Optional remote browser automation capabilities
  • Seamless Integration: Designed for easy integration with Roo Code

🚀 Quickstart with Roo Code

This guide explains how to integrate the BrightData MCP server with Roo Code, enabling powerful web access capabilities directly within your Roo environment.

Key to Success: Consistency in server naming and ensuring Roo Code's…

How I Used Bright Data's Infrastructure

My solutions are architected to deeply leverage Bright Data's capabilities through its Model Context Protocol (MCP) server integration, giving AI agents comprehensive web access across all four key actions: Discover, Access, Extract, and Interact.

1. Discover

  • When my AI agents need current information, they utilize the search_engine tool provided by the Bright Data MCP server to perform real-time searches across Google and other search engines.
  • This allows for dynamic discovery of relevant web pages, articles, and data sources pertinent to user queries.
  • In HermitAI, this discovery process feeds directly into the RAG system, while in Roo Code, it enables developers to build search-powered applications.
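The Discover step boils down to an MCP tool invocation. As a minimal sketch, this is the JSON-RPC 2.0 message an MCP client sends for the `tools/call` method defined by the Model Context Protocol. The tool name search_engine comes from this post; the argument names `query` and `engine` are assumptions about the Bright Data server's input schema, so confirm them against what the server reports from `tools/list`.

```python
import itertools
import json

# Monotonic request ids: JSON-RPC 2.0 matches each response to its request by id.
_ids = itertools.count(1)


def make_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Assumed argument names for the search_engine tool; verify via tools/list.
msg = make_tool_call("search_engine", {"query": "bitcoin price news", "engine": "google"})
print(msg)
```

In practice an MCP client library (or Roo Code itself) builds and sends these messages for you; the sketch only shows what travels over the wire.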

2. Access

  • Once relevant URLs are discovered, my tools use capabilities like scrape_as_markdown via the Bright Data MCP to retrieve page content without having to deal with blocking and other browsing obstacles directly.
  • The Bright Data infrastructure handles proxy management, CAPTCHA solving, and other anti-bot measures automatically, ensuring reliable access to web content.
  • For Roo Code integration, this means developers can focus on building applications rather than managing web access infrastructure.
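On the response side, the MCP specification returns tool output as a list of typed content items, plus an `isError` flag for failures inside the tool. Here is a minimal sketch of pulling the Markdown out of a scrape_as_markdown response; it assumes the server returns the page as one or more `text` items, which is an assumption about Bright Data's server rather than documented behavior, and the sample payload is made up.

```python
import json


class ToolError(RuntimeError):
    """Raised when the request or the tool itself fails."""


def extract_text(response_json: str) -> str:
    """Join the text items of an MCP tools/call response into one string."""
    response = json.loads(response_json)
    if "error" in response:
        # Protocol-level failure (e.g. unknown tool): a JSON-RPC error object.
        raise ToolError(response["error"].get("message", "unknown error"))
    result = response["result"]
    texts = [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    if result.get("isError"):
        # Tool-level failure: the description is carried in the content items.
        raise ToolError("\n".join(texts) or "tool reported an error")
    return "\n".join(texts)


# Made-up sample response for illustration only.
sample = json.dumps({
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [{"type": "text", "text": "# Example Domain\n\nThis domain is for use in examples."}],
        "isError": False,
    },
})
print(extract_text(sample))
```

Keeping the two failure paths apart matters for an agent: a JSON-RPC error usually means a malformed request, while `isError` means the tool ran but failed (for example, a page that could not be fetched), which the agent may want to retry or report.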

3. Extract

  • The scrape_as_markdown tool extracts core textual content in a clean, LLM-friendly format, which is crucial for AI understanding and synthesis.
  • HermitAI can extract structured data from various sources, including news sites, social media, e-commerce platforms, and more.
  • The extracted data can be ingested into the RAG knowledge base for future reference or used immediately to answer user queries.
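Before scraped Markdown goes into a vector store, it is usually split into overlapping chunks small enough to embed. The post does not say how HermitAI chunks documents, so the following is a hypothetical character-window splitter with overlap that prefers to cut at paragraph breaks; `chunk_size` and `overlap` are illustrative values, not HermitAI settings.

```python
def chunk_markdown(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, and a window is cut at the
    last blank line inside it when that still leaves room to make progress.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer a paragraph boundary ("\n\n") inside the current window.
            cut = text.rfind("\n\n", start, end)
            if cut > start + overlap:
                end = cut
        chunks.append(text[start:end].strip())
        if end == len(text):
            break
        # Step back by `overlap` so boundary sentences appear in both chunks.
        start = end - overlap
    return [c for c in chunks if c]


doc = "x" * 30 + "\n\n" + "y" * 30
print(chunk_markdown(doc, chunk_size=40, overlap=5))
```

Overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk; the cost is some duplicated storage and embedding work.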

4. Interact

  • Both solutions leverage Bright Data's MCP architecture to support interactive browser automation tools.
  • HermitAI can navigate complex websites, fill forms, and perform other human-like interactions when needed.
  • The Roo Code integration enables developers to build applications that can programmatically interact with websites, opening up possibilities for automated workflows and data collection.

By using the Bright Data MCP server, my solutions gain a reliable, scalable, and versatile interface to the web, abstracting away the complexities of direct web scraping and interaction while providing powerful capabilities to AI agents and developers alike.

Performance Improvements

Access to reliable, real-time web data via Bright Data significantly enhances the performance and utility of my solutions compared to traditional AI systems:

1. Overcoming Knowledge Cut-offs

  • Problem: Standard LLMs have knowledge limited to their last training date, making them unable to answer questions about current events or real-time data.
  • Improvement with Bright Data: By using search_engine and scrape_as_markdown, my solutions can fetch and process live information, providing users with up-to-date answers and insights. This makes the AI vastly more useful for real-world, time-sensitive queries.

2. Enhanced RAG with Live Data

  • Problem: RAG systems are powerful for querying private data, but this data can become stale or lack broader context.
  • Improvement with Bright Data: HermitAI uses Bright Data to enrich its RAG system by discovering new information, extracting key details, and ingesting fresh data into its MongoDB Atlas vector store. This keeps the private knowledge base current and comprehensive.
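Retrieval from a MongoDB Atlas vector store is done with the `$vectorSearch` aggregation stage. Below is a minimal sketch of such a pipeline, built as plain Python data so it runs without a database. The index name `vector_index`, the field names `embedding`, `text`, `source_url`, and `fetched_at`, and the freshness filter are hypothetical, not taken from HermitAI. With pymongo, the resulting list would be passed to `collection.aggregate(...)`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_vector_search(query_vector: list[float], k: int = 5,
                        max_age_days: Optional[int] = None) -> list[dict]:
    """Build an Atlas $vectorSearch pipeline (index and field names are hypothetical)."""
    stage = {
        "index": "vector_index",   # hypothetical Atlas Vector Search index name
        "path": "embedding",       # hypothetical field holding each chunk's embedding
        "queryVector": query_vector,
        "numCandidates": k * 20,   # approximate-search candidate pool; larger = better recall
        "limit": k,
    }
    if max_age_days is not None:
        # Keep only recently scraped chunks. `fetched_at` must be declared as a
        # filter field in the vector index definition for this to be accepted.
        cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
        stage["filter"] = {"fetched_at": {"$gte": cutoff}}
    return [
        {"$vectorSearch": stage},
        {"$project": {"_id": 0, "text": 1, "source_url": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search([0.1, 0.2, 0.3], k=3, max_age_days=7)
```

Setting `numCandidates` to a multiple of `limit` is a common starting point for approximate search; the right ratio depends on your own recall and latency measurements.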

3. Increased Accuracy and Reduced Hallucination

  • Problem: LLMs can sometimes "hallucinate" or provide plausible-sounding but incorrect information.
  • Improvement with Bright Data: By grounding responses in data retrieved directly from authoritative web sources, my solutions provide more accurate, verifiable answers with the ability to cite sources.
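Grounding and citation largely come down to how retrieved passages are placed in the prompt. A hypothetical sketch: number each source, instruct the model to cite by number, and keep the URLs so citations can be rendered as links. The instruction wording, the `Source` type, and the `max_chars` limit are illustrative assumptions, not HermitAI's actual prompt.

```python
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    text: str


def build_grounded_prompt(question: str, sources: list[Source], max_chars: int = 1500) -> str:
    """Assemble a prompt asking the model to answer only from numbered sources."""
    blocks = []
    for i, src in enumerate(sources, start=1):
        # Truncate each passage so one long page cannot crowd out the others.
        blocks.append(f"[{i}] {src.url}\n{src.text[:max_chars]}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the sources below. "
        "Cite sources inline as [n]. If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


prompt = build_grounded_prompt(
    "What changed in the latest release?",
    [Source("https://example.com/changelog", "Version 2.0 adds streaming output.")],
)
print(prompt)
```

Keeping the numbered URL list alongside the model's answer lets the UI turn each [n] into a link, which is what makes an answer checkable by the reader.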

4. Foundation for Advanced Agentic Behavior

  • Problem: Creating truly autonomous agents that can perform complex multi-step tasks on the web is challenging due to website complexities and bot detection.
  • Improvement with Bright Data: The Bright Data infrastructure provides a robust foundation for building sophisticated agentic capabilities, allowing my solutions to navigate, interact with, and extract data from even the most challenging web environments.

5. Developer Productivity (Roo Code Integration)

  • Problem: Developers often struggle with implementing reliable web scraping and automation in their applications.
  • Improvement with Bright Data: The Roo Code integration abstracts away these complexities, allowing developers to focus on building features rather than managing web access infrastructure.

Real-World Use Cases

HermitAI demonstrates powerful real-world applications:

  1. Financial Research:
    • "What's happening with Bitcoin right now?" - HermitAI can fetch current prices, recent news, and social media sentiment
    • "Analyze this product on Amazon" - Extract product details, summarize reviews, and provide price analysis
  2. Professional Networking:
    • "Tell me about this LinkedIn profile" - Extract professional background, experience, and company information
    • "Research this company" - Gather information from company websites, social media, and business directories
  3. Content Analysis:
    • "Summarize this article" - Extract and condense key information from web content
    • "What are people saying about this Instagram post?" - Analyze comments and engagement
  4. Mindful Information Consumption:
    • During market volatility or breaking news, HermitAI provides factual updates while encouraging thoughtful reflection
    • Helps users distinguish between important information and emotional noise online

Conclusion

By combining the power of Bright Data's web access infrastructure with advanced AI capabilities, HermitAI and the Roo Code integration demonstrate the future of AI agents - tools that can autonomously navigate the web, gather real-time information, and provide valuable insights while respecting user agency and promoting thoughtful engagement with information.

These solutions transform AI from knowledgeable but potentially outdated assistants into dynamic, aware, and highly capable agents that can operate effectively with the real-time, ever-changing nature of the web - truly fulfilling the vision of the Bright Data AI Web Access Hackathon.

Top comments (4)

Nevo David

insane work, honestly - you think long-term usefulness comes more from tech upgrades or just grinding out the boring daily stuff?

Kai Chew

Thanks! Appreciate that. Maybe long-term usefulness comes more from grinding out the boring daily stuff—getting real feedback, refining workflows, and improving UX bit by bit. Tech upgrades help, but they’re nothing without consistency and real-world use.

Dotallio

Super cool! Love the blend of RAG with real-time web actions - makes AI so much more useful. Are you planning to open up agent workflows for custom integrations?

Kai Chew

Thanks! Yes, definitely in the roadmap. I’m exploring ways to let users plug in their own APIs or actions via a simple UI—maybe something like “if this, then AI does that.” Goal is to keep it powerful but accessible, especially for non-devs.
