March 5, 2026

Build Agents at Home for Free: Open Source Tools and Models

You can build a working multi-agent system on your laptop using only free tools and open source models. No credit card, no API rate limits, no vendor lock-in. The trade-off? Slower inference and somewhat lower capability. But for learning, experimentation, and small-scale automation, it is more than enough.

By the end of this guide, you will have a local research agent running on your own hardware that searches the web, reads files, and synthesizes information, entirely for free.

The Case for Open Source Agent Development

The Cost Reality

Claude API pricing is reasonable for production workloads, but the cost of learning and experimenting adds up quickly. Local inference removes that concern entirely: unlimited iteration at zero marginal API cost.

Privacy and Data Control

Some workloads cannot go to cloud APIs. Patient data, proprietary code, sensitive business logic: all of these benefit from staying on your machine. Local models let you build the same kind of agent behavior without any data leaving your hardware.

Reality Check

Open source models in the 7-13B parameter range lag well behind frontier models in complex reasoning, code generation, and instruction following. For production workloads where accuracy matters, commercial APIs are the best choice. For learning and prototyping, local models are excellent.

Choosing Your Open Source LLM

LLM Families

Llama (Meta): The most widely supported family. Llama 3 is especially capable.
Mistral: Excellent quality-to-parameter ratio. Mistral-7B performs above its size class.
Phi (Microsoft): Extremely efficient small models. Phi-3 Mini runs without a GPU.
Qwen (Alibaba): Strong multilingual performance.

Model Sizes and Hardware Requirements

Model	Parameters	Quantization	VRAM	Speed	Quality
Phi-3 Mini	3.8B	4-bit	4GB	Very fast	Good
Mistral-7B	7B	4-bit	8GB	Fast	Good
Llama-3-8B	8B	4-bit	8GB	Fast	Very good
Llama-2-13B	13B	4-bit	10GB	Medium	Very good
Mixtral-8x7B	46.7B (MoE)	4-bit	24GB	Slow	Excellent

Quantization: Why It Matters

Quantization compresses model weights to reduce memory use. At full precision (32-bit floats), a 7B-parameter model needs ~28GB of RAM for its weights alone. With 4-bit quantization, the same model fits in ~4GB, with a small loss in quality.
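
The arithmetic behind these figures can be checked with a short sketch. It counts weight storage only (parameters times bits per weight). The function name `weight_memory_gb` is hypothetical, and real usage runs higher because of the KV cache, activations, and quantization metadata.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Compare a 7B-parameter model at three precisions.
for label, bits in [("32-bit", 32), ("16-bit", 16), ("4-bit", 4)]:
    print(f"7B at {label}: ~{weight_memory_gb(7e9, bits):.1f} GB")
```

The 4-bit result (3.5 GB of raw weights) is why the text rounds up to ~4GB once overhead is included.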

Starting points by hardware:

Laptop (8-16GB RAM): Phi-3 Mini or 4-bit Mistral-7B
Desktop with a mid-range GPU: Mistral-7B or Llama-3-8B with GPU acceleration
Desktop with a high-end GPU: 4-bit Mixtral-8x7B or Llama-3-70B

Local Inference Setup

Ollama: The Easiest Entry Point

# Install on macOS
brew install ollama

# Install on Linux
curl -fsSL https://ollama.com/install.sh | sh

# Download Mistral-7B
ollama pull mistral

# Lighter option for slower hardware
ollama pull phi3

# Start the inference server
ollama serve

Verify that it works:

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "In one sentence, what is an AI agent?",
  "stream": false
}'
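
The same check can be scripted. The sketch below uses only the Python standard library and assumes the Ollama server from the previous step is listening on localhost:11434. The helper names `build_generate_request` and `ask` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request the curl command sends."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": false the reply is one JSON object whose
    # "response" field holds the generated text.
    return body["response"]
```

With the server running, `print(ask("mistral", "In one sentence, what is an AI agent?"))` should print the model's answer. A connection error usually means `ollama serve` is not running.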

LM Studio: The GUI Option

LM Studio is a desktop application that provides a graphical interface for downloading and running models. It exposes an OpenAI-compatible API endpoint.

Open Source Agent Frameworks

Framework	Best for	Learning curve
LangChain	Broad tool support	Medium
LangGraph	State machine workflows	Medium-high
AutoGen	Multi-agent conversations	Low
CrewAI	Role-based multi-agent	Low-medium

Start with AutoGen or LangChain. AutoGen is faster for prototypes; LangChain gives you more control over the agent loop.
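
To make "the agent loop" concrete, here is a framework-free sketch of the Thought/Action/Observation cycle these libraries automate. Everything in it is illustrative: `run_agent`, the regex parsing, and the scripted stand-in for the model are assumptions, not code from LangChain or AutoGen.

```python
import re
from typing import Callable, Dict

ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", re.DOTALL)


def run_agent(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_iterations: int = 8,
) -> str:
    """Loop: ask the model, run the tool it names, feed the result back."""
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):
        reply = llm(transcript)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            # Unparseable output: tell the model and let it retry.
            observation = "Could not parse an Action; follow the format."
        else:
            name, tool_input = match.group(1).strip(), match.group(2).strip()
            tool = tools.get(name)
            observation = tool(tool_input) if tool else f"Unknown tool: {name}"
        transcript += f"{reply}\nObservation: {observation}\n"
    return "Stopped: iteration limit reached."


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model so the loop can run offline."""
    if "Observation:" not in transcript:
        return "Thought: I need the length.\nAction: length\nAction Input: agent"
    return "Thought: I now have enough information.\nFinal Answer: 5 letters"
```

Calling `run_agent(scripted_llm, {"length": lambda s: str(len(s))}, "How long is 'agent'?")` walks through one tool call and then returns the final answer. A framework adds prompt construction, error handling, and tool schemas on top of this same cycle.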

Building Your First Agent: A Local Research Assistant

Environment Setup

python3 -m venv agent-env
source agent-env/bin/activate
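# The script below imports AgentExecutor and create_react_agent from
# langchain.agents; if your installed release no longer exports them, pin "langchain<1.0".
pip install langchain langchain-ollama langchain-community duckduckgo-search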

The Complete Agent

#!/usr/bin/env python3
"""
Local research agent using Ollama + LangChain
Prerequisites:
  - ollama serve (running in the background)
  - ollama pull mistral
  - pip install langchain langchain-ollama langchain-community duckduckgo-search
"""

import json
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
from langchain.agents import AgentExecutor, create_react_agent
from duckduckgo_search import DDGS


@tool
def search_web(query: str) -> str:
    """Search the web for information using DuckDuckGo."""
    try:
        with DDGS() as ddgs:
            results = list(ddgs.text(query, max_results=5))
        if not results:
            return "No results found."
        return json.dumps(
            [{"title": r["title"], "snippet": r["body"], "url": r["href"]}
             for r in results],
            indent=2, ensure_ascii=False
        )
    except Exception as e:
        return f"Search failed: {e}"


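@tool
def read_file(filepath: str) -> str:
    """Read a local file and return its contents."""
    try:
        # Explicit encoding so behavior does not depend on the OS default.
        with open(filepath, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return f"File not found: {filepath}"
    except (OSError, UnicodeDecodeError) as e:
        return f"Could not read file: {e}"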


@tool
def calculate(expression: str) -> str:
    """Evaluate a simple arithmetic expression."""
    try:
        # Caution: removing builtins narrows what eval() can reach, but it is
        # not a real sandbox. Only pass expressions you trust.
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Calculation error: {e}"


llm = ChatOllama(
    model="mistral",
    base_url="http://localhost:11434",
    temperature=0.3,
)

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a research assistant. Use your tools to answer questions.

Available tools: {tool_names}
Tool descriptions: {tools}

Follow this format exactly:
Thought: what do I need to find out?
Action: tool_name
Action Input: input for the tool
Observation: tool result
... (repeat as needed)
Thought: I now have enough information.
Final Answer: your complete answer"""),
    # create_react_agent fills agent_scratchpad with a plain string (the
    # Thought/Action/Observation log so far), so it belongs in the message
    # text; a ("placeholder", ...) entry expects a list of messages instead.
    ("user", "{input}\n\n{agent_scratchpad}"),
])

tools = [search_web, read_file, calculate]
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=8,
    handle_parsing_errors=True,
)

if __name__ == "__main__":
    result = agent_executor.invoke({
        "input": "What are the three most popular open source LLMs right now? Give the name and one key feature of each."
    })
    print(f"\nFinal answer:\n{result['output']}")
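
One caveat about the `calculate` tool: `eval` with empty builtins is not a real sandbox. A stricter alternative, sketched here under the assumption that only numeric literals and basic arithmetic operators are needed, walks the parsed expression with the standard `ast` module and rejects everything else. The name `safe_eval` is hypothetical, and exponentiation is deliberately left out so a huge power cannot stall the agent.

```python
import ast
import operator
from typing import Union

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> Union[int, float]:
    """Evaluate +, -, *, /, //, % over numeric literals; reject anything else."""

    def _eval(node: ast.AST) -> Union[int, float]:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))
```

Inside `calculate`, the `eval(...)` call could be swapped for `safe_eval(expression)`; the existing `except Exception` branch would then also report rejected input.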

Running It

# Terminal 1: start Ollama
ollama serve

# Terminal 2: run the agent
python research_agent.py

Extending Capabilities: Tools Without APIs

Local Database Access

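import json
import sqlite3
from contextlib import closing

from langchain_core.tools import tool


@tool
def query_database(sql: str) -> str:
    """Run a read-only SQL query against the local database."""
    try:
        # mode=ro opens the file read-only, so INSERT/UPDATE/DELETE fail
        # instead of modifying data.db.
        with closing(sqlite3.connect("file:data.db?mode=ro", uri=True)) as conn:
            cursor = conn.execute(sql)
            rows = cursor.fetchall()
            columns = [desc[0] for desc in cursor.description or []]
        return json.dumps([dict(zip(columns, row)) for row in rows],
                          indent=2, ensure_ascii=False, default=str)
    except sqlite3.Error as e:
        return f"Database error: {e}"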

Local Document Search with Embeddings

pip install sentence-transformers chromadb

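import json

import chromadb
from langchain_core.tools import tool
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()  # in-memory; contents are lost when the process exits
# get_or_create_collection avoids an error if "docs" already exists.
collection = client.get_or_create_collection("docs")
# The collection starts empty: add documents with collection.add(...),
# embedding them with the same model, before searching.

@tool
def search_docs(query: str) -> str:
    """Search local documents by semantic similarity."""
    results = collection.query(
        query_embeddings=[model.encode(query).tolist()],
        n_results=3
    )
    return json.dumps(results["documents"][0], indent=2, ensure_ascii=False)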
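
Conceptually, a query like this embeds the text and returns the stored documents whose vectors are closest to it. The sketch below shows that ranking step in plain Python with toy three-dimensional vectors. It uses cosine similarity as an assumption for illustration (a vector store may use a different distance metric), and `cosine_similarity` and `top_k` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          doc_vecs: Dict[str, Sequence[float]],
          k: int = 3) -> List[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings; all-MiniLM-L6-v2 produces 384-dimensional vectors.
docs = {
    "ollama-setup": [0.9, 0.1, 0.0],
    "sqlite-notes": [0.0, 0.2, 0.9],
    "gpu-drivers": [0.7, 0.6, 0.1],
}
```

For a query vector pointing along the first axis, `top_k([1.0, 0.0, 0.0], docs, k=2)` ranks "ollama-setup" first and "gpu-drivers" second, because their vectors share the most direction with the query.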

Real-World Limitations and Workarounds

Latency

Expect 2-30 seconds per inference step. Strategies:

Design agents with fewer steps; simpler means faster
Batch tool calls where possible
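
As one way to batch tool calls, a single tool can accept several queries and run them concurrently, so the model spends one inference step instead of one per query. This is a sketch: `batch_search`, the newline-separated input convention, and the injected `search_fn` are assumptions for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def batch_search(queries_text: str,
                 search_fn: Callable[[str], str],
                 max_workers: int = 4) -> str:
    """Run one search per non-empty line and return all results as JSON."""
    queries = [q.strip() for q in queries_text.splitlines() if q.strip()]
    if not queries:
        return "No queries given."
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with queries.
        results = list(pool.map(search_fn, queries))
    return json.dumps(dict(zip(queries, results)), indent=2, ensure_ascii=False)
```

Threads suit this case because each search mostly waits on the network. The model-side saving comes from replacing several Action/Observation rounds with one.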

The Accuracy Gap

Mitigation strategies:

Short, explicit prompts that specify the expected format exactly
Break complex tasks into steps
Validate tool output before handing it back to the agent
Use a temperature of 0.1-0.3 for tool use
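
Validating tool output can be as simple as checking that it parses and trimming it to a size a small model can handle. A sketch, assuming the tool returns a JSON list of result objects like `search_web` does; `clean_tool_output` and its limits are hypothetical.

```python
import json
from typing import Sequence


def clean_tool_output(raw: str,
                      required_keys: Sequence[str] = ("title", "snippet", "url"),
                      max_items: int = 3,
                      max_chars: int = 300) -> str:
    """Keep only well-formed results and cap their size before the model sees them."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return "Tool returned malformed output."
    if not isinstance(items, list):
        return "Tool returned an unexpected structure."
    valid = [
        # Truncate each field so one long page cannot flood the context.
        {key: str(item[key])[:max_chars] for key in required_keys}
        for item in items
        if isinstance(item, dict) and all(key in item for key in required_keys)
    ]
    if not valid:
        return "Tool returned no usable results."
    return json.dumps(valid[:max_items], ensure_ascii=False)
```

Returning a short, plain message on failure gives the model something it can react to, rather than a stack trace or half-parsed data.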

Deployment: From Laptop to Always-On

Docker

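FROM python:3.12-slim
WORKDIR /app
RUN pip install --no-cache-dir langchain langchain-ollama langchain-community duckduckgo-search
COPY research_agent.py .
# Address of the Ollama server on the Docker host. The script only picks this
# up if it reads the variable instead of hard-coding localhost:11434.
# On Linux, start the container with --add-host=host.docker.internal:host-gateway.
ENV OLLAMA_HOST=host.docker.internal:11434
CMD ["python3", "research_agent.py"]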
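
For the container to reach Ollama on the host, the script has to take the server address from the environment rather than a fixed localhost URL. A sketch of one way to do that, assuming the `OLLAMA_HOST` variable set in the Dockerfile; `resolve_ollama_url` is a hypothetical helper, and its result would be passed as `base_url` to `ChatOllama`.

```python
import os
from typing import Mapping, Optional


def resolve_ollama_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Ollama base URL, preferring OLLAMA_HOST over the local default."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip()
    if not host:
        return "http://localhost:11434"
    # The Dockerfile sets host:port without a scheme, so add one if missing.
    if not host.startswith(("http://", "https://")):
        host = f"http://{host}"
    return host.rstrip("/")
```

Outside Docker the variable is normally unset, so the same script keeps working against the local server without changes.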

Systemd Service (Linux)

sudo systemctl enable ollama
sudo systemctl start ollama

Going Further: Multi-Agent Systems at Home

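# `researcher` and `writer` are assumed to be two AgentExecutor instances
# built like `agent_executor` above, each with its own prompt and tools.
def run_pipeline(question: str) -> str:
    # Stage 1: the researcher gathers raw material.
    research_result = researcher.invoke({"input": question})
    # Stage 2: the writer turns that material into a summary.
    final_output = writer.invoke({
        "input": f"Write a clear summary based on this research:\n{research_result['output']}"
    })
    return final_output["output"]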
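
The function above assumes two agents already exist. The sketch below shows the same hand-off with stand-in objects so the data flow can be run and tested without a model. `StubAgent` is hypothetical and only mimics the `invoke({"input": ...})` call returning `{"output": ...}` that the pipeline relies on; here the agents are passed in as arguments rather than read from globals.

```python
from typing import Callable, Dict


class StubAgent:
    """Minimal stand-in exposing the invoke() shape used in run_pipeline."""

    def __init__(self, respond: Callable[[str], str]):
        self._respond = respond

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        return {"output": self._respond(inputs["input"])}


def run_pipeline(question: str, researcher, writer) -> str:
    """Researcher gathers material; writer turns it into the final text."""
    research_result = researcher.invoke({"input": question})
    final_output = writer.invoke({
        "input": f"Write a clear summary based on this research:\n{research_result['output']}"
    })
    return final_output["output"]


# Stubs that make each stage's contribution visible in the result.
researcher = StubAgent(lambda q: f"NOTES[{q}]")
writer = StubAgent(lambda prompt: "SUMMARY of " + prompt.splitlines()[-1])
```

Once the flow behaves as expected with stubs, each one can be replaced by a real agent that exposes the same `invoke` interface.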

Open source agent development is practical today. The path: Ollama for inference, LangChain or AutoGen for the agent loop, DuckDuckGo for search, and your machine's file system for everything else.

Start simple, with the research agent above. Get it working. Then add a second tool. Then a second agent.