Building Agentic AI Solutions with DeepSeek-R1, CrewAI, and Cloud-Agnostic Deployment on Ubuntu

Mar 9, 2025 · Mohsen Davarynejad · 9 min read

Introduction

The rise of agentic AI, where AI models operate autonomously to complete tasks, has opened new possibilities for businesses and developers. DeepSeek-R1, an advanced open-source reasoning model, combined with CrewAI, a framework for orchestrating AI agents, enables the creation of intelligent, multi-agent workflows. Unlike cloud-dependent solutions, this post focuses on deploying AI agents with CrewAI and open-source LLMs like DeepSeek-R1 in a cloud-agnostic setup on Ubuntu, whether on-premises or in a private cloud environment.

Specifically, we will explore how to set up DeepSeek-R1 and CrewAI on Ubuntu using Ollama for efficient inference, enabling local or hybrid AI deployments while maintaining data privacy and flexibility.


Why Choose Cloud-Agnostic Deployment?

Deploying AI models in a cloud-agnostic manner means avoiding vendor lock-in while maintaining full control over data, compute resources, and scaling strategies. This approach offers several advantages:

  • Data Privacy & Security: Keep sensitive data in a controlled environment.
  • Lower Costs: Reduce dependency on expensive cloud GPU instances.
  • Customization & Flexibility: Optimize hardware usage according to specific needs.
  • Offline Functionality: Run AI applications without requiring continuous cloud access.
  • Scalability: Deploy across on-prem servers, private clouds, or hybrid environments.

Agentic Design vs. Traditional Software Development

Agentic systems represent a fundamentally different paradigm from traditional software, particularly in their ability to manage complex, evolving, and domain-specific challenges. While conventional software relies on predefined rules and structured data for automation, agentic systems, powered by large language models (LLMs), function autonomously, continuously learn from their environment, and make context-sensitive decisions. These capabilities stem from modular components such as reasoning, memory, cognitive skills, and tool integration, allowing them to execute sophisticated tasks and adapt dynamically to new situations.

Traditional software solutions excel in handling repetitive tasks and scaling operations horizontally but often lack the adaptability and domain-specific intelligence that agentic systems provide. Take manufacturing, for instance. While traditional systems can track inventory levels, they may not be capable of predicting supply chain disruptions or optimizing procurement using real-time market data. In contrast, an agentic system can analyze live inputs like stock fluctuations, consumer demand, and external environmental factors to proactively adjust procurement strategies and reroute supply chains in case of disruptions.

Experts should explore agentic systems in scenarios where adaptability and deep domain expertise drive impact. Take customer service, for instance. While traditional chatbots rely on rigid scripts, limiting their ability to handle nuanced inquiries, AI agents can engage in fluid, natural conversations, offer personalized support, and resolve issues with greater efficiency. Beyond customer interactions, AI-powered agents streamline workflows by automating repetitive tasks like drafting reports, composing emails, and even generating code. To unlock their full potential, professionals should integrate agentic systems into well-defined processes with clear success metrics, especially in domains demanding flexibility and resilience in decision-making.


Understanding the Key Components

DeepSeek-R1 powers advanced reasoning, CrewAI orchestrates agent collaboration, and Ollama streamlines local AI model deployment—together, they enable intelligent, adaptable automation.

DeepSeek-R1

DeepSeek-R1 is a powerful Mixture of Experts (MoE) model optimized for reasoning, problem-solving, and coding tasks. While it consists of 671 billion parameters, only 37 billion are active per inference, allowing efficient execution on modern GPUs.
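To make the MoE idea concrete, here is a toy sketch (plain NumPy, not DeepSeek-R1's actual routing code) of how a router activates only the top-k experts per token:

import numpy as np

# Toy illustration of MoE routing: a router scores every expert for each
# token and only the top-k experts run.
n_experts, top_k, hidden = 8, 2, 16
rng = np.random.default_rng(0)

token = rng.normal(size=hidden)                # one token's hidden state
router = rng.normal(size=(hidden, n_experts))  # router weight matrix

scores = token @ router                        # affinity to each expert
active = np.argsort(scores)[-top_k:]           # indices of the top-k experts
print("Active experts for this token:", sorted(active.tolist()))

# With 2 of 8 experts active, only a quarter of the expert parameters run per
# token -- the same principle that lets DeepSeek-R1 activate 37B of 671B.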

CrewAI

CrewAI is an orchestration framework that allows multiple AI agents to collaborate on complex tasks. It enables structured workflows where AI agents specialize in different roles, enhancing task efficiency and scalability.
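At its core, the structure looks like this (a minimal sketch of the abstractions only; the agents here are illustrative, and a complete, Ollama-backed example follows later in this post):

from crewai import Agent, Task, Crew

# Each Agent has a role, goal, and backstory; each Task is assigned to an agent
researcher = Agent(role="Researcher", goal="Gather facts", backstory="A curious analyst.")
writer = Agent(role="Writer", goal="Summarize findings", backstory="A clear communicator.")

research = Task(description="Research topic X", expected_output="Bullet points", agent=researcher)
summary = Task(description="Write a summary", expected_output="One paragraph", agent=writer)

crew = Crew(agents=[researcher, writer], tasks=[research, summary])
# crew.kickoff() runs the tasks in order, passing each result downstream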

Ollama

Ollama is a lightweight, user-friendly framework designed to run and manage large language models (LLMs) locally with minimal setup. Unlike traditional model-serving solutions, Ollama provides a seamless out-of-the-box experience, abstracting away complex configurations and infrastructure dependencies. It’s particularly appealing to developers and researchers who want to deploy models without the overhead of cloud services or deep ML expertise.

Compared to vLLM, which is optimized for high-throughput, GPU-based inference using PagedAttention, Ollama prioritizes ease of use and accessibility. While vLLM is a powerhouse for enterprise-grade applications needing maximum inference speed and efficiency, Ollama is designed for local experimentation, rapid prototyping, and lightweight deployment, making it ideal for running models on personal machines or edge devices.

Unlike competitors like TGI (Text Generation Inference) and vLLM, Ollama bundles model management, serving, and optimization into a single package, removing the need for external orchestrators. While it may not match vLLM’s efficiency at scale, its simplicity makes it a compelling alternative for individuals and small teams looking to experiment with LLMs locally, without the complexity of Kubernetes or specialized hardware.

| Feature | Ollama | vLLM | TGI | LM Studio |
|---|---|---|---|---|
| Ease of Use | ⭐⭐⭐⭐⭐ (Plug & Play) | ⭐⭐⭐ (Requires Setup) | ⭐⭐⭐ (Requires Setup) | ⭐⭐⭐⭐⭐ (GUI-Based) |
| Performance | ⭐⭐⭐ (Good for Local Use) | ⭐⭐⭐⭐⭐ (Optimized for GPUs) | ⭐⭐⭐⭐ (Efficient Serving) | ⭐⭐⭐ (Local Use Only) |
| Model Serving | Local Inference + CLI | High-Speed Distributed Serving | Optimized for Cloud & Local | Local Inference Only |
| Hardware | CPU & GPU Supported | GPU Required (PagedAttention) | CPU & GPU Supported | CPU & GPU Supported |
| Use Case | Local Testing & Prototyping | Scalable Inference for Production | Cloud & Edge AI Deployment | Offline Model Interaction |
| Streaming | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Fine-Tuning | ❌ No | ✅ Partial Support | ✅ Partial Support | ❌ No |
| Model Download | ✅ Automatic | ❌ Manual | ❌ Manual | ✅ Automatic |
| Best For | Developers & Individuals | Large-Scale AI Applications | Cloud Deployments & APIs | Offline Users & Beginners |

In this implementation, we opt for Ollama due to its simplicity, ease of deployment, and local-first approach. Additionally, Ollama abstracts away the complexities of model management, handling downloads, serving, and optimizations automatically. This eliminates the need for manual environment setup, Kubernetes orchestration, or intricate API configurations, which are often required by vLLM or TGI. For this project, where we focus on local AI deployment, ease of integration with CrewAI, and rapid prototyping, Ollama provides the perfect balance of functionality and accessibility. It allows us to run DeepSeek-R1 on Ubuntu seamlessly while maintaining flexibility to scale or modify our setup in the future.
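As an illustration of that simplicity, the entire model lifecycle fits into a few CLI commands (each is walked through step by step below):

ollama pull deepseek-r1   # download the model weights
ollama list               # show models available locally
ollama run deepseek-r1    # chat with the model interactively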

Prerequisites

Before setting up DeepSeek-R1 and CrewAI, ensure your system meets the following requirements:

Hardware Requirements

  • NVIDIA GPU with CUDA support (recommended: A100, 3090, 4090, or higher).
  • Minimum 16GB RAM (32GB or more preferred for large-scale tasks).
  • SSD storage for faster model loading and execution.

Software Requirements

  • Ubuntu (20.04 or 22.04) running on a local server, workstation, or WSL.
  • Python 3.8 or higher.
  • CUDA 11.6 or higher.
  • NVIDIA Drivers and cuDNN installed.
  • Git and venv for virtual environments.

Setting Up the Environment with venv

Step 1: Install Required Dependencies

Update your system and install necessary tools:

sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-venv git sqlite3

Step 2: Install CUDA and cuDNN

Ensure that your system has CUDA and cuDNN installed:

nvcc --version

If CUDA is missing, install it using:

sudo apt install -y nvidia-cuda-toolkit
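If the installation succeeded, nvidia-smi should also list your GPU and driver version:

nvidia-smi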

Step 3: Install Ollama

Ollama provides an easy way to run LLMs locally. Install it with:

curl -fsSL https://ollama.ai/install.sh | sh
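You can confirm the installation with:

ollama --version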

Step 4: Download and Load DeepSeek-R1

Once Ollama is installed, pull the DeepSeek-R1 model. Note that the default deepseek-r1 tag fetches a distilled variant sized for local hardware; larger size-specific tags (such as deepseek-r1:32b or deepseek-r1:70b) are available if your GPU memory allows:

ollama pull deepseek-r1

Start the Ollama server, which hosts the model for local inference (on most Linux installs the install script registers it as a systemd service, so it may already be running):

ollama serve
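With the server running, you can sanity-check the model over Ollama's REST API, which listens on port 11434 by default:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'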

Step 5: Set Up Virtual Environment

Now, open a new terminal, then create and activate a virtual environment:

python3 -m venv deepseek-env
source deepseek-env/bin/activate

Step 6: Install Required Packages

Install the necessary dependencies inside the virtual environment (vLLM is not needed here, since Ollama handles inference):

pip install --upgrade pip
pip install crewai transformers torch accelerate kagglehub pandas sqlalchemy
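A quick import check (assuming the virtual environment is still active) verifies the installation before moving on:

python -c "import crewai, pandas, sqlalchemy; print('Environment ready')"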

Setting Up the SQLite Database from Kaggle Data

Download and store the dataset in an SQLite database:

import kagglehub
import pandas as pd
import sqlite3

# Download latest version of the dataset
path = kagglehub.dataset_download("smayanj/e-commerce-transactions-dataset")

print("Path: {}".format(path))

# path = "/root/.cache/kagglehub/datasets/smayanj/e-commerce-transactions-dataset/versions/1"

# Load dataset into Pandas DataFrame
df = pd.read_csv(f"{path}/ecommerce_transactions.csv")

print(df.head())

# Store data in SQLite database
conn = sqlite3.connect("ecommerce.db")
df.to_sql("transactions", conn, if_exists="replace", index=False)
conn.close()
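A quick sanity check from the shell confirms the table was created (the column names come from the Kaggle CSV headers):

sqlite3 ecommerce.db "SELECT COUNT(*) FROM transactions;"
sqlite3 ecommerce.db ".schema transactions"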

Using CrewAI for AI-Driven Data Analysis

Now, let’s specify an LLM, define as many agents as needed with distinct roles and capabilities, and configure their reasoning, memory, and tool usage. Then, we will connect our AI agents to the SQLite database, enabling them to perform queries, retrieve insights, and generate structured reports for data-driven decision-making.

import pandas as pd
from sqlalchemy import create_engine, text
from crewai import Agent, Task, Crew, LLM

# Database connection setup
DATABASE_URL = "sqlite:///ecommerce.db"
engine = create_engine(DATABASE_URL)

def fetch_sales_data():
    # Pull the ten most recent transactions above $100 into a DataFrame
    with engine.connect() as connection:
        result = connection.execute(text("SELECT * FROM transactions WHERE Purchase_Amount > 100 ORDER BY Transaction_Date DESC LIMIT 10"))
        df = pd.DataFrame(result.fetchall(), columns=result.keys())
    return df

# Point CrewAI at the local Ollama server; base_url is the server root
# (Ollama's default port is 11434), not the /api/generate endpoint
llm = LLM(
    model="ollama/deepseek-r1",
    temperature=0.2,
    base_url="http://localhost:11434"
)

analyzer = Agent(
    role="Data Analyst",
    goal="Analyze sales data trends and provide insights on sales patterns",
    backstory="An experienced data analyst specializing in e-commerce transactions.",
    llm=llm,
    verbose=True
)

reporter = Agent(
    role="Report Generator",
    goal="Generate structured, easy-to-read reports from data insights",
    backstory="A seasoned report creator with a knack for summarizing complex data.",
    llm=llm,
    verbose=True
)

# Tasks take a description and an expected output; the sales data is embedded
# directly in the first task's description
data_analysis_task = Task(
    agent=analyzer,
    description="Analyze the following sales transactions and summarize the trends: " + fetch_sales_data().to_string(),
    expected_output="Summary of sales trends and insights."
)

report_task = Task(
    agent=reporter,
    description="Summarize the insights from the analysis into a structured report.",
    expected_output="A structured report with key takeaways and insights."
)

crew = Crew(agents=[analyzer, reporter], tasks=[data_analysis_task, report_task])
result = crew.kickoff()
print(result)

Testing a Single Agent in CrewAI

Before implementing a complex workflow, it's usually a good idea to test the fundamental components of your CrewAI setup. The snippets below reuse the analyzer agent and data_analysis_task defined earlier.

Method 1: Run the Task Separately

Instead of running the full crew, execute a single task. Recent CrewAI releases expose Task.execute_sync() (older ones had execute()):

result = data_analysis_task.execute_sync()
print(result)

Method 2: Invoke the Agent Directly

You can test the agent's response to a prompt by wrapping it in a throwaway task and handing it to the agent:

probe = Task(description="What are the top sales trends in the dataset?",
             expected_output="A short answer.", agent=analyzer)
response = analyzer.execute_task(probe)
print(response)

Method 3: Use Debug Mode

Enable logging for debugging:

import logging
logging.basicConfig(level=logging.DEBUG)

# Run the agent on a test query (reusing the probe task from Method 2)
response = analyzer.execute_task(probe)
print(response)

Method 4: Interactive Debugging

Manually test inputs in an interactive loop:

while True:
    user_input = input("Enter test input (or 'exit' to quit): ")
    if user_input.lower() == "exit":
        break
    test_task = Task(description=user_input, expected_output="A short answer.", agent=analyzer)
    print(analyzer.execute_task(test_task))

Try different inputs to verify the agent’s behavior. Running this test helps ensure that your Crew is correctly configured and responsive before you move on to more advanced tasks.


Conclusion

Deploying DeepSeek-R1 and CrewAI in a cloud-agnostic setup on Ubuntu using Ollama provides scalability, security, and flexibility without relying on proprietary cloud platforms. By leveraging Ollama for optimized inference and integrating AI-driven data analysis workflows, businesses can harness real-time insights from their own databases.

Whether running on on-prem hardware, private clouds, or hybrid environments, this approach ensures complete control over AI workflows. Try it today and unlock the full potential of agentic AI solutions in a cloud-independent manner.

In my upcoming post entitled “Essential Tools You Need to Know”, I’ll dive into the key tools that enable agentic AI to operate more effectively. From cutting-edge frameworks to essential libraries, I’ll explore how these technologies empower AI systems to make autonomous decisions, adapt to complex environments, and deliver smarter outcomes. Stay tuned!


Tags: #AgenticAI #DeepSeekR1 #UbuntuAI #vLLM #CloudAgnosticAI #AIDataAnalysis