Syntax Highlighting for Code Blocks
This is part 4 of a series on building a terminal agent from scratch.
Previous parts: – Part 1: Building an Agent from Scratch – Part 2: The Conversation Loop – Part 3: Rendering Markdown in the Terminal
The complete source code is available on GitHub: https://github.com/dimiro1/agent-from-scratch/tree/main/04
In this part, we'll add syntax highlighting for code blocks. When LLMs return code snippets, they use markdown code blocks with a language hint like clojure or python. We'll parse these blocks and apply colors to keywords, strings, comments, and numbers.
Let's get started.
The Approach
Our syntax highlighting strategy is simple:
- Extract code blocks from the markdown (the part between triple backticks)
- Identify the language from the code fence (e.g., ```clojure)
- Apply regex-based highlighting for that language
- Return the colorized code
We're using regex for highlighting because it's straightforward and keeps our implementation simple. Production syntax highlighters use proper parsers, but regex works well enough for our purposes.
Creating the Code Namespace
Create src/termagent/code.clj:
(ns termagent.code
(:require [clojure.string :as str]
[termagent.ansi :as ansi]))
We'll build individual highlighters for each language, then combine them with a dispatcher function.
Highlighting Clojure
Let's start with Clojure since that's what we're writing:
(defn highlight-clojure [code]
(-> code
;; Strings (must come before keywords to avoid conflicts)
(str/replace #"\".*?\"" ; (1)
(fn [match]
(ansi/render [:green match])))
;; Comments
(str/replace #"(?m);.*$" ; (2)
(fn [match]
(ansi/render [:dim match])))
;; Keywords (Clojure :keyword syntax)
(str/replace #":[a-zA-Z][a-zA-Z0-9\-]*" ; (3)
(fn [match]
(ansi/render [:magenta match])))
;; Numbers
(str/replace #"\b\d+\.?\d*\b" ; (4)
(fn [match]
(ansi/render [:yellow match])))
;; Common functions
(str/replace #"\b(println|print|str|map|reduce|filter|conj|assoc|dissoc|get|first|rest|count|into|vec|list|seq|apply|partial|comp)\b" ; (5)
(fn [[match]]
(ansi/render [:blue match])))
;; Language keywords
(str/replace #"\b(def|defn|defn-|defmacro|let|if|when|cond|fn|loop|recur|do|for|doseq|require|ns|use)\b" ; (6)
(fn [[match]]
(ansi/render [:blue match])))))
Let's break down the patterns:
- Strings are matched with
\".*?\". The?makes it non-greedy, so it stops at the first closing quote - Comments use
(?m);.*$. The(?m)enables multiline mode so$matches end of each line - Keywords match the Clojure
:keywordsyntax - Numbers match integers and decimals with
\bword boundaries - Common core functions get highlighted in blue
- Special forms like
def,defn,letalso get blue
The order matters! We process strings first because they might contain characters that look like keywords or comments.
Adding More Languages
The pattern for adding new languages is the same: match strings first, then comments, then language-specific elements like keywords and numbers. The repository includes highlighters for Python, JavaScript, Java, Elixir, Lua, and Ruby. Each follows the same structure with language-specific regex patterns.
The Dispatcher Function
Now we need a function that picks the right highlighter based on the language:
(defn highlight
"Applies syntax highlighting to code based on the language"
[code language]
(case (str/lower-case (str/trim language))
"clojure" (highlight-clojure code)
"clj" (highlight-clojure code)
"python" (highlight-python code)
"py" (highlight-python code)
"javascript" (highlight-javascript code)
"js" (highlight-javascript code)
"java" (highlight-java code)
"elixir" (highlight-elixir code)
"ex" (highlight-elixir code)
"lua" (highlight-lua code)
"ruby" (highlight-ruby code)
"rb" (highlight-ruby code)
(ansi/render [:dim code]))) ; (1)
- For unknown languages, we render the code in dim (the ANSI code that makes text appear faded). This provides a visual distinction from regular text without breaking anything
Integrating with Markdown
Now we need to update our markdown renderer to handle code blocks. Update src/termagent/markdown.clj:
(ns termagent.markdown
(:require [clojure.string :as str]
[termagent.ansi :as ansi]
[termagent.code :as code])) ; (1)
- Add the new code namespace
Add the code block renderer:
(defn render-code-blocks [text]
(str/replace text
#"(?s)```(\w+)?\n(.*?)```" ; (1)
(fn [[_ language code-content]] ; (2)
(let [lang (or language "") ; (3)
highlighted (code/highlight code-content lang) ; (4)
lines (str/split-lines highlighted)
indented (map #(str " " %) lines)] ; (5)
(str "\n" (str/join "\n" indented) "\n")))))
- The regex matches fenced code blocks.
(?s)enables dotall mode so.matches newlines. We capture the optional language identifier and the code content - We destructure the match to get the language and code separately
- If no language is specified, we default to an empty string
- Apply syntax highlighting
- Indent each line with two spaces for visual separation
Update the render function to process code blocks first:
(defn render [text]
(-> text
render-code-blocks ; (1)
render-horizontal-rules
render-headers
render-blockquotes
render-tasks
render-lists
render-images
render-links
render-bold
render-strikethrough
render-italic
render-inline-code))
- Process code blocks first, before any other transformations. This prevents inline code rendering from interfering with code blocks
Testing the Highlighter
Let's test the Clojure highlighter:
$ clj -M -e "(require '[termagent.code :as code]) (println (code/highlight \"(defn greet [name] (println \\\"Hello\\\" name))\" \"clojure\"))"
You should see colorized output with: – Green strings – Dim comments – Blue keywords – Yellow numbers – Magenta Clojure keywords
Testing with the Agent
Now let's see it in action with our agent:
$ clj -M:run
User: Write a hello world function in Clojure
Assistant:
Here's a simple hello world function in Clojure:
(defn hello-world []
(println "Hello, World!"))
(hello-world)
This defines a function called `hello-world` that prints "Hello, World!" to the console.
The code block is now syntax highlighted with colors!
What We've Built
You now have: – A syntax highlighting system using regex patterns – Support for multiple languages (Clojure, Python, JavaScript, Java, Elixir, Lua, Ruby) – A dispatcher that selects the right highlighter based on language – Code block rendering integrated with markdown
To add more languages, create a highlight-<language> function following the same pattern and register it in the highlight dispatcher.
What's Next
Our terminal agent now renders markdown beautifully with syntax-highlighted code blocks. In the next part, we'll clean up the implementation and improve the user experience. After that, we'll add tool support so the agent can execute actions like reading files and running commands.
Hope to see you there!