The Codebase Steward: Automating Technical Debt Reduction in CI/CD with Qwen 3.5

Discover how to deploy AI agents using Qwen 3.5 and SWE-CI to autonomously fix bugs, refactor code, and reduce technical debt in your CI/CD pipeline.


Technical debt is the silent killer of innovation. It starts as a small 'TODO' comment, a skipped unit test to meet a deadline, or a dependency that never gets updated. Over time, this debt accrues interest, eventually forcing development teams to spend more time patching leaks than building the ship. For CTOs and engineering leads, the challenge has always been resource allocation: how do you justify pulling your best engineers off revenue-generating features to fix linting errors or refactor legacy modules?

Enter the Codebase Steward. By combining the reasoning capabilities of Large Language Models (LLMs) like Qwen 3.5-Coder with autonomous software engineering frameworks (SWE-CI), we are entering an era where technical debt is managed not by humans, but by intelligent agents running inside your CI/CD pipelines.

At Nohatek, we believe the future of development isn't about AI replacing programmers, but about AI acting as the tireless janitor, architect, and QA engineer that works while your team sleeps. In this guide, we explore how to architect a 'Codebase Steward' that autonomously identifies and remediates debt.

Beyond Autocomplete: The Rise of SWE Agents


Most developers are familiar with AI coding assistants like GitHub Copilot. These are 'autocomplete' tools—they require a human driver to prompt them and accept suggestions. However, to tackle technical debt at scale, we need something more autonomous. We need Agents.

An SWE (Software Engineering) Agent differs from a chatbot in its ability to execute a loop of Thought, Action, and Observation. It doesn't just write code; it explores the repository, runs terminal commands, reads file structures, and executes tests to verify its own work. The engine powering this logic is crucial. While proprietary models like GPT-4o have held the crown, the open-weights community has surged forward with models like Qwen 3.5-Coder.
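The Thought-Action-Observation cycle can be sketched in a few lines of Python. This is a minimal illustration, not a production agent: `llm` stands in for any callable that maps the transcript so far to the next shell command (in practice, a Qwen 3.5-Coder completion call), and the sandbox is just the local shell.

```python
import subprocess

def run_action(command: str) -> str:
    """Execute a shell command in the sandbox and capture its output (the 'Action' step)."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def agent_loop(task: str, llm, max_steps: int = 10) -> str:
    """Minimal Thought-Action-Observation loop.

    `llm` is a placeholder: any callable that maps the transcript so far
    to the next command (or 'DONE') -- e.g. a Qwen 3.5-Coder completion.
    """
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        command = llm(transcript)          # Thought: the model decides the next step
        if command.strip() == "DONE":
            break
        observation = run_action(command)  # Action: run it in the sandbox
        transcript += f"$ {command}\n{observation}\n"  # Observation: feed results back
    return transcript
```

The key property is the feedback edge: each observation is appended to the transcript, so the model's next "thought" is conditioned on what actually happened, not on what it assumed would happen.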

The Qwen Coder series has demonstrated performance on benchmarks like SWE-bench that rivals proprietary giants, making it a cost-effective and privacy-conscious choice for enterprise self-hosting.

Why Qwen? For automated pipelines, latency and token cost (or GPU inference cost) are paramount. Qwen 3.5-Coder is optimized specifically for code generation, reasoning, and following complex instructions, making it the ideal brain for an agent that needs to understand the context of a massive monolithic repo without hallucinating non-existent libraries.

The Architecture: Wiring the Steward into CI/CD


How does a Codebase Steward actually function within a DevOps environment? It is not a standalone app, but a workflow triggered by specific events in your CI/CD pipeline (GitHub Actions, GitLab CI, or Jenkins). Here is the architectural blueprint for an automated debt-reduction agent.

1. The Trigger: The agent is scheduled to run (e.g., every Sunday night) or triggered by specific static analysis failures (e.g., a SonarQube alert).

2. The Environment: The CI runner spins up a sandboxed environment (Docker container) containing the repo, the build tools, and the SWE-Agent runtime powered by Qwen 3.5.

3. The Task: The workflow feeds the agent a specific issue. For example: 'Run the linter, identify all PEP8 violations in the /src directory, and fix them.'

4. The Loop:

  • The Agent runs the linter and captures the output.
  • It opens the offending files.
  • It generates a patch using Qwen 3.5.
  • It runs the project's test suite to ensure no regressions were introduced.
  • If tests fail, it self-corrects and retries.

5. The Pull Request: Once verified, the Agent pushes a branch and opens a Pull Request (PR) for human review.
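The verify-and-retry core of step 4 can be sketched as a small control loop. All four callables here are placeholders for the real machinery: `generate_patch(feedback)` asks the model for a candidate fix, `apply_patch` writes it to the sandboxed checkout, and `run_tests()` returns a pass/fail flag plus the test output.

```python
def fix_until_green(generate_patch, apply_patch, run_tests, max_attempts: int = 3) -> bool:
    """Generate a patch, verify it with the test suite, and self-correct on failure.

    Returns True once the suite passes (safe to open a PR), False if the
    agent exhausts its attempts and should flag a human instead.
    """
    feedback = ""
    for _ in range(max_attempts):
        patch = generate_patch(feedback)  # ask the model for a candidate fix
        apply_patch(patch)                # apply it to the sandboxed checkout
        passed, output = run_tests()      # run the full suite for regressions
        if passed:
            return True                   # verified: hand off to the PR step
        feedback = output                 # feed failures back for self-correction
    return False                          # give up; escalate to a human
```

Bounding the loop with `max_attempts` matters in CI: an agent that retries forever burns GPU hours on a task it cannot solve, whereas a capped loop fails fast and leaves a trail for human review.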

Here is a conceptual example of how a GitHub Action step might look:

name: Codebase Steward Nightly
on:
  schedule:
    - cron: '0 2 * * 0'  # every Sunday at 02:00 UTC
jobs:
  refactor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Qwen Steward
        uses: nohatek/swe-agent-action@v1
        with:
          model: 'qwen-3.5-coder'
          task: 'Increase unit test coverage for utils.py'
          github_token: ${{ secrets.GITHUB_TOKEN }}

Practical Use Cases: What Can the Steward Fix?


Implementing an AI agent sounds futuristic, but the value lies in boring, repetitive tasks. By offloading low-leverage work to Qwen 3.5, you free up your senior engineers to focus on architecture and business logic. Here are three high-impact use cases we see gaining traction:

1. Dependency Updates & Migration
Dependabot tells you a library is outdated, but it doesn't fix the breaking changes in your code. An SWE Agent can read the changelog of the updated library, refactor your function calls to match the new API, run the tests, and present a clean PR.

2. Docstring and Type Hint Generation
In dynamically typed languages like Python or JavaScript, lack of type safety is a major source of bugs. You can task the Steward with: 'Iterate through the codebase and add TypeScript interfaces or Python type hints to all public functions.' This improves code readability and IDE intelligence instantly.

3. Test Coverage Expansion
Low test coverage makes refactoring risky. You can instruct the agent to identify files with under 50% coverage and generate edge-case unit tests. Because Qwen 3.5 has been trained on millions of test files, it excels at writing standard assertions and mocking external services.

Nohatek Insight: Start small. Do not ask the agent to 'refactor the backend.' Ask it to 'replace all deprecated API calls in module X.' Specific, scoped tasks yield the highest success rates.

Governance: Trust but Verify


The primary concern for CTOs regarding AI-generated code is safety. What if the AI introduces a security vulnerability or a subtle logic bug? This is why the 'Steward' model is built on a Human-in-the-Loop philosophy.

The AI Agent never commits directly to the main or production branch. It operates strictly on feature branches and submits Pull Requests. This turns the AI into a junior developer: it does the heavy lifting, but a senior human engineer must review and approve the work.

Furthermore, by utilizing open models like Qwen 3.5 hosted on private clouds (or via secure APIs), companies maintain control over their IP. Unlike public chatbots where data might be used for training, a self-hosted agent ensures your proprietary code never leaves your controlled infrastructure.
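In practice, self-hosting usually means serving the model behind an OpenAI-compatible endpoint (vLLM is one common choice) and pointing the agent at it. A sketch using only the standard library; the URL and model name are placeholders for your own deployment:

```python
import json
import urllib.request

# Placeholder endpoint for a self-hosted, OpenAI-compatible Qwen deployment.
API_URL = "http://qwen.internal:8000/v1/chat/completions"

def build_request(prompt: str, model: str = "qwen-3.5-coder") -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a code-maintenance agent."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,  # keep refactoring output near-deterministic
    }

def ask_steward(prompt: str) -> str:
    """POST the prompt to the private endpoint; no code leaves your network."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint speaks the same protocol as public APIs, swapping between a hosted model during prototyping and a private deployment in production is a one-line URL change.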

The era of manual technical debt management is drawing to a close. By integrating agents powered by models like Qwen 3.5 into your CI/CD pipelines, you transform your codebase from a decaying asset into a self-healing ecosystem. The Codebase Steward doesn't get tired, doesn't mind writing unit tests, and scales instantly with your cloud infrastructure.

At Nohatek, we specialize in building these bespoke AI automation pipelines for enterprises. Whether you need to modernize a legacy monolith or implement cutting-edge AI DevOps, our team is ready to help you architect the solution.

Ready to automate your technical debt? Contact Nohatek today to schedule a consultation.