AI: What is the MCP?

05/21/2025

Everyone around us is talking about MCPs all the time, and it’s time to understand the topic.

So, today we will cover the basic concepts – what MCP is in general – then write our own “micro MCP server”, and in the next post we’ll build something more realistic for working with VictoriaLogs.

LLM Limitations

Any Large Language Model is a system “in itself”: it has no access to external resources and cannot perform actions in the real world – for example, execute a shell command on your laptop or send an API request to GitHub or AWS.

Over time, the concept of “agents” emerged – local services that the model can call to perform such actions.

But then a new problem arose: the outside world and the number of services in it are infinite, and such agents cannot know everything about all services and how to interact with them.

That’s why a new standard, the Model Context Protocol (MCP), was eventually developed. It makes it possible to extend the capabilities of LLMs and agents through a single, standard way of describing the context the model will work with.

And so… The MCP?

So, the Model Context Protocol is a “schema” that standardizes how a model can interact with the external environment.

The specification itself can be found here – Specification.

To put it very simply, MCP is an “interface” through which an LLM can perform actions.

The scheme of MCP operation is client-server:

  • an MCP client through which we send a request in natural language – “create a new Pull Request in my GitHub repository”
  • and an MCP server, which the LLM calls to translate this request into the format “go to such and such URL, authenticate with such and such token, execute such and such API request”

The client can be any tool that can communicate with the LLM – the Cursor IDE, a mobile application, or even just a CLI utility.

And the server can be a service that our client can contact with a request.

The MCP architecture

Speaking in more detail, we have several components:

  • MCP Host: for example, Cursor or Windsurf – receives a request from a user, generates a structured MCP request (a function or a tool call), and sends it to the MCP client
  • MCP Client: the LLM (or AI agent) itself plus its interpreter (runtime, tool router); together they receive a request from the MCP Host, determine which tool should be used, execute the call (via an MCP Server), and return the result
  • MCP Server: a service that provides one or more tools for the MCP Client and executes requests from the MCP Client – for example, runs shell commands
  • Data Sources and Remote Services: the actual things with which the MCP server will communicate directly – logs, databases, API servers

The request execution flow can be defined as follows:

  • User -> MCP Host
    • MCP Host -> MCP Client (LLM/agent processes the request, determines the tool)
      • MCP Client -> MCP Server (uses the tool)
        • MCP Server -> Data Sources and Remote Services (receives data, generates a response)
      • MCP Server -> MCP Client
    • MCP Client -> MCP Host
  • MCP Host -> User

Documentation – Core architecture.
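Under the hood, the client and server exchange JSON-RPC 2.0 messages, as described in the specification. For illustration, a tools/call request could look roughly like this – the add tool and its arguments here just anticipate the server we will write below:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "add",
    "arguments": { "a": 2, "b": 3 }
  }
}

And the server’s response with the result:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [ { "type": "text", "text": "5" } ],
    "isError": false
  }
}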

MCP components

So, MCP uses a client-server model and describes three key components (or primitives):

  • Resources: data to be accessed – logs, metrics, database, API responses (docs)
  • Prompts: templates or forms for submitting requests to the LLM – define how we formulate questions so that the model better understands which function (tool) to call (docs)
  • Tools: functions that are available on the MCP server for the MCP client to call, and that the model or agent calls after analyzing the user’s request (docs).
    • such tools can be implemented as Python functions, as well as API endpoints or shell commands – see the sketch below
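To get a feel for how these primitives look in practice, here is a minimal sketch using the FastMCP class from the Python SDK (the same one we will use below); the names, the URI, and the returned contents are illustrative assumptions, not part of any real service:

#!/usr/bin/env python3

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Primitives Demo")


# Resource: read-only data that the client can fetch by URI
# (the "config://app" URI and its contents are made up for this example)
@mcp.resource("config://app")
def get_config() -> str:
    """Return the application configuration"""
    return "log_level=debug"


# Prompt: a reusable template for formulating requests to the LLM
@mcp.prompt()
def summarize_logs(service: str) -> str:
    """Ask the model to summarize logs for a given service"""
    return f"Summarize the latest error logs of the '{service}' service"


# Tool: a function the model can call (a full example follows below)
@mcp.tool()
def echo(text: str) -> str:
    """Return the text unchanged"""
    return text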

MCP transports

For communication between the client and the server, MCP defines Transports – communication channels through which requests and responses are transmitted.

Currently, there are three main types (MCP is still under active development, so new ones may be added):

  • stdio: standard stdin/stdout streams, used when client and server are running locally
  • SSE (Server-Sent Events): a one-way channel from the server to the client that transmits request results as SSE events
    • can be implemented as stream-like data transfer – that is, sending large responses in small chunks (parts) – or as returning a single event in one message
    • in this case, the client uses a regular HTTP POST to send the request itself
  • Streamable HTTP: a two-way channel in which the client receives the response from the server via HTTP streaming

Documentation – Transport layer.
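With the Python SDK that we will use below, the transport is just an argument of the server’s run() method – a sketch, assuming a recent SDK version that supports all three transports:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("My MCP Tools")

if __name__ == "__main__":
    # stdio - for local clients that spawn the server as a child process
    mcp.run(transport="stdio")
    # or, for clients connecting over the network (pick one):
    # mcp.run(transport="sse")
    # mcp.run(transport="streamable-http")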

RAG vs MCP

But we already have Retrieval-Augmented Generation, RAG, right? Do we need a new tool?

Well, RAG searches for information in external sources and returns context and data to the model.

And with MCP, the model performs actions: searching for information, launching a Docker container on a laptop, etc.

Creating an MCP Server

Okay – the basic concepts are clear, so now let’s try to create our own MCP server and connect it to an IDE.

I’m going to use Windsurf, because for me, it’s somehow easier to use with MCP.

MCP Server in Python

We will write in Python using the Python SDK.

Create a directory and activate the virtual environment:

$ mkdir -p MCP/my-mcp-server
$ cd MCP/my-mcp-server
$ python -m venv .venv
$ . .venv/bin/activate

Install the libraries:

$ pip install mcp mcp[cli] requests

Let’s write the code:

#!/usr/bin/env python3

from mcp.server.fastmcp import FastMCP

# instantiate an MCP server client
mcp = FastMCP("My MCP Tools")


# Register a tool
@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers and return the result"""
    return a + b


if __name__ == "__main__":
    mcp.run(transport="stdio")

Here:

  • FastMCP: a library for creating MCP servers that implements the Model Context Protocol specification
    • imported as from mcp.server.fastmcp import FastMCP – it’s included in the Python SDK
  • tools: functions that LLM can use – see Tools
  • run: method to start the FastMCP server – see Running Your FastMCP Server
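Before connecting the server to an IDE, we can also call it programmatically with the stdio client from the same Python SDK. A minimal sketch – save it next to mcp_server.py as, for example, test_client.py (the file name is just my choice here):

#!/usr/bin/env python3

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # spawn our server as a child process and talk to it over stdio
    server_params = StdioServerParameters(command="python3", args=["mcp_server.py"])
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Tools:", [tool.name for tool in tools.tools])
            result = await session.call_tool("add", {"a": 2, "b": 3})
            print("Result:", result.content)


if __name__ == "__main__":
    asyncio.run(main())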

Using the MCP Inspector

There is a very cool thing for debugging MCP servers – the Inspector. See the documentation here>>>. Requires Node.js >= 18 on the system.

Run it with npx:

$ npx @modelcontextprotocol/inspector python3 mcp_server.py
Starting MCP inspector...
⚙ Proxy server listening on port 6277
🔍 MCP Inspector is up and running at http://127.0.0.1:6274 
...

Open http://127.0.0.1:6274 in the browser, and in the Tools tab we can see our add tool:
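Alternatively, since we installed the mcp[cli] extra, the SDK’s own CLI should be able to start the server together with the Inspector in one step (it still needs Node.js, as it launches the same Inspector under the hood):

$ mcp dev mcp_server.py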

Adding the MCP Server to Windsurf

Since we have the mcp library installed in the Python virtual environment, we need the full path to use it in the IDE.

In the terminal where we have the venv activated, run:

$ realpath .venv/bin/mcp 
/home/setevoy/Scripts/Python/MCP/my-mcp-server/.venv/bin/mcp

The MCP configuration file for Windsurf is ~/.codeium/windsurf/mcp_config.json.

Or simply open Windsurf Settings:

Click Add Server > Add custom server:

And it will open the mcp_config.json file with an example of adding an MCP server:

Let’s add ours:

{
  "mcpServers": {
    "my-mcp-server": {
      "command": "/home/setevoy/Scripts/Python/MCP/my-mcp-server/.venv/bin/mcp",
      "args": [
        "run",
        "/home/setevoy/Scripts/Python/MCP/my-mcp-server/mcp_server.py"
      ]
    }
  }
}
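As an alternative – since mcp_server.py calls mcp.run() itself, the config could point straight at the virtualenv’s Python interpreter instead of the mcp wrapper; a sketch that should be equivalent in effect:

{
  "mcpServers": {
    "my-mcp-server": {
      "command": "/home/setevoy/Scripts/Python/MCP/my-mcp-server/.venv/bin/python3",
      "args": [
        "/home/setevoy/Scripts/Python/MCP/my-mcp-server/mcp_server.py"
      ]
    }
  }
}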

We return to Settings, click Refresh – and we should get our new server, which has one tool – add:

And it should appear in the chat window:

Let’s try to use it:

Yay! It works!

The LLM (in the case of Windsurf, the default is Cascade) has determined that it has access to an MCP server that can perform the math operation add, and it has used it.

Nice.

In the next post, we will write our own MCP server to work with VictoriaLogs – just to see how it works in more detail, because the VictoriaMetrics team is already making their own server (actually, they already released two MCPs – mcp-victoriametrics and mcp-victorialogs).
