Slowing down is the answer of the Agent era.

律动BlockBeats
4 hours ago
Original Title: Thoughts on slowing the fuck down
Original Author: Mario Zechner
Translation: Peggy, BlockBeats

Editor's Note: As generative AI floods into software engineering, industry sentiment is shifting from awe at its capability to anxiety about efficiency. Write too slowly, use AI too little, or automate too timidly, and you seem to risk being left behind. Yet as coding Agents enter real production environments, more practical problems are emerging: errors get amplified, complexity spirals out of control, systems gradually become incomprehensible, and gains in speed do not translate proportionally into gains in quality.

This article is based on frontline practice, providing a calm reflection on this wave of "agentic coding." The author points out that Agents do not learn from mistakes like humans do; in the absence of bottlenecks and feedback mechanisms, small problems can be quickly magnified; and within complex codebases, their limited perspective and recall capability further exacerbate the chaos in system structure. The essence of these issues lies not in the technology itself but in humans prematurely relinquishing judgment and control under anxiety-driven pressures.

Therefore, rather than falling into the anxiety of "whether we must fully embrace AI," it is better to recalibrate the relationship between humans and tools: letting Agents take on localized, controllable tasks while firmly retaining system design, quality assurance, and key decision-making in our hands. In this process, "slowing down" becomes a capability; it means you still understand the system, can make trade-offs, and maintain a sense of control over your work.

In an era of continually evolving tools, what might be truly scarce is not faster generation capability, but the judgment of complexity and the stability to choose between efficiency and quality.

The following is the original text:

[Image caption] The turtle's face is my expression as I watch this industry.

About a year ago, coding Agents capable of helping you complete an entire project from start to finish began to emerge. Before that, there were tools like Aider and early Cursor, but they felt more like assistants than "agents." The new generation of tools is extremely attractive, and many people have poured their spare time into projects they had always wanted to build but never found time for.

I think there's nothing wrong with that. It's inherently joyful to create things in your spare time, and most of the time you don't need to pay much attention to code quality and maintainability. This also provides a pathway for learning a new tech stack.

During the Christmas holidays, Anthropic and OpenAI handed out free credits, drawing people in like a slot machine. For many, this was their first real taste of the "magic of Agents writing code," and the number of participants grew.

Now, coding Agents are beginning to enter production codebases. Twelve months later, we are starting to see the consequences of this "progress." Here are my current thoughts.

Everything has broken down

Although much of this is anecdotal, current software indeed gives a feeling of "ready to break at any moment." 98% availability is becoming the norm instead of the exception, even for large services. User interfaces are filled with all sorts of absurd bugs that the QA team should easily catch.

I admit that this situation existed before the emergence of Agents. But now, the problems are clearly accelerating.

We don't see what is really going on inside these companies, but occasionally information leaks out, such as the rumored "AI caused the AWS outage." Amazon Web Services quickly "corrected" the statement, and then immediately launched a 90-day internal reorganization plan.

Satya Nadella, Microsoft's CEO, has recently emphasized that more and more of the company's code is written by AI. There is no direct evidence, but there is a distinct feeling that the quality of Windows is declining; even some of Microsoft's own blog posts seem to acknowledge it.

Companies claiming "100% of our product's code is generated by AI" almost always ship the most terrible products you can imagine. This is not aimed at anyone in particular, but memory leaks measured in gigabytes, chaotic UIs, half-finished features, frequent crashes... these are hardly the "quality endorsement" they think they are, nor good examples of "letting Agents do everything for you."

Privately, you increasingly hear the same thing from big companies and small teams alike: "Agents writing code" has driven them into a dead end. No code reviews, design decisions handed to Agents, unnecessary features piled on top of each other; the outcome is naturally not good.

Why we shouldn't use Agents this way

We have nearly abandoned all engineering disciplines and subjective judgment, falling into a kind of "addictive" way of working: the only goal is to generate the most code in the shortest time, regardless of the consequences.

You are building an orchestration layer to command an army of automated Agents. You installed Beads without realizing it is essentially "unremovable malware," just because everyone online says "this is how it's done," and that if you don't, you're "doomed" (ngmi).

You are constantly self-consuming in a "nested iterative loop."

Look—Anthropic made a C compiler with a bunch of Agents; although there are still issues now, the next generation of models is surely going to fix it, right?

Now look—Cursor created a browser with a large group of Agents, and while it’s virtually unusable and still needs manual intervention now and then, the next generation of models will surely manage it, right?

"Distributed," "divide and conquer," "autonomous systems," "dark factories," "solving software problems in six months," "SaaS is dead; my grandma just made a Shopify with Claw" ...

These narratives sound really cool.

Of course, this method might "still work" for your side project that hardly anyone uses (including yourself). Maybe there is indeed some genius who can use this approach to create a non-junk, truly usable software product. If you are that person, I genuinely admire you.

But at least in the developer circles around me, I have yet to see a truly effective case of this method. Of course, maybe it's just that we're all too inexperienced.

Errors compound without learning, bottlenecks, and delayed explosions

The problem with Agents is that they make mistakes. That in itself is not unusual; humans make mistakes too. It could be simple correctness errors that are easy to identify, fix, and then lock in with a regression test. It could also be code smells that linters can't catch: an unnecessary method here, an unreasonable type there, some duplicated code, and so on. Individually these are inconsequential; human developers make the same small errors.

But "machines" are not humans. After humans make the same mistake a few times, they usually learn not to repeat it—either they get scolded into waking up, or they correct it in a genuine learning process.

But Agents don’t have that learning capacity, or at least they are assumed not to. They will repeatedly make the same mistakes, and could even create quirky combinations of different errors based on their training data.

You can certainly try to "train" them: write rules in AGENTS.md to prevent them from making certain errors; design a complex memory system to let them query historical mistakes and best practices. This can be effective for certain types of specific problems. But the prerequisite is—you must first observe that they made this mistake.
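For concreteness, here is a minimal sketch of what such a rules file might contain. The AGENTS.md file name comes from the text above; every rule below is a hypothetical example of a mistake you might have observed, not a recommended standard set:

```markdown
# AGENTS.md (hypothetical excerpt)

## Rules distilled from observed mistakes

- Search for an existing helper before writing a new one
  (we already have three copies of `formatDate`; reuse, don't duplicate).
- Never swallow exceptions with a bare catch; log and re-raise.
- Every new public function needs a regression test in the same change.
- Do not introduce new dependencies without explicit human approval.
```

The caveat from the paragraph above applies: each rule can only be written down after a human has observed the mistake at least once.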

The more crucial difference is that humans are bottlenecks, while Agents are not.

Humans cannot produce twenty thousand lines of code in just a few hours. Even if the error rate is not low, they can only introduce a limited number of mistakes per day, and the accumulation of these mistakes occurs slowly. Usually, when the "pain from errors" accumulates to a certain extent, humans (out of instinctual aversion to pain) will stop and fix things. Or a person might be replaced to fix them. In any case, the problems get addressed.

But when you run a whole orchestrated "army" of Agents, there are no bottlenecks and no pain signals. Those originally trivial little mistakes accumulate at an unsustainable pace. You have been removed from the loop and are not even aware that these seemingly harmless little issues have grown into a monstrous problem. By the time you truly feel the pain, it is often too late.

Until one day you want to add a new feature but find the current system architecture (essentially a heap of errors) cannot support the modification; or users start to complain wildly because the latest release has issues, even losing data.

Only then do you realize: you can no longer trust this code.

Worse yet, the thousands of unit tests, snapshot tests, and end-to-end tests generated by the Agents are no longer reliable either. The only remaining way to determine whether the system actually works is manual testing.

Congratulations, you have severely trapped yourself (and the company).

The merchants of complexity

You completely do not know what is happening within the system because you've handed over control to the Agents. And the Agents are fundamentally in the business of "selling complexity." They have seen a lot of poor architectural decisions in their training data, and they continue to reinforce these patterns during reinforcement learning. You let them design the system, and the results are predictable.

What you end up with is an extremely complex system: a hodgepodge of poorly imitated "industry best practices," assembled without restraint until the problems spiral out of control.

But the issues don't stop there. Your Agents do not share the execution processes, do not see the complete codebase, and do not understand the decisions made by you or other Agents earlier. Therefore, their decisions are always "local."

This directly leads to the issues mentioned earlier: excessive duplicate code, structures abstracted for the sake of abstraction, various inconsistencies. These problems accumulate and ultimately form an irredeemably complex system.

This is actually very similar to enterprise codebases written by humans. But that kind of complexity usually accumulates over years: the pain is diffused across many individuals, none of whom reaches the "must fix now" threshold, and organizational tolerance is high, so the complexity "co-evolves" with the organization.

But with the combination of humans + Agents, this process gets greatly accelerated. Two people, along with a pile of Agents, can reach this level of complexity in a matter of weeks.

The recall rate of agentic search is very low

You might hope that Agents can "clean up the mess," helping you to refactor, optimize, and clean up the system. But the problem is: they can no longer do that.

Because the codebase is too large and the complexity too high, they can only ever see parts. It is not merely that the context window is not big enough, or that the long-context mechanisms fail in the face of millions of lines of code. The issues are more insidious.

Before Agents attempt to fix the system, they must first find all the code that needs modification, as well as any existing implementations that can be reused. This step is what we call agentic search.

How an Agent accomplishes this depends on the tools you provide it with: it could be Bash + ripgrep, it could be a queryable code index, LSP services, vector databases…

But regardless of the tools used, the essence remains the same: the larger the codebase, the lower the recall rate. And a low recall rate means: the Agents cannot find all relevant code and therefore cannot make correct modifications.
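As a toy illustration (not the author's tooling), here is a hypothetical three-file "codebase" in which a grep-style search misses a relevant file simply because legacy code uses different naming; this is exactly how recall degrades as a codebase grows:

```python
# Hypothetical toy codebase: all three files implement "invoice total" logic,
# but the legacy file predates the current naming convention.
codebase = {
    "billing/invoice.py": "def compute_invoice_total(items): ...",
    "billing/refunds.py": "def invoice_total_for_refund(items): ...",
    "legacy/orders.py":   "def calc_order_sum(items): ...",  # same logic, old name
}

# Ground truth: every file an agent must touch to refactor invoice totals.
relevant = {"billing/invoice.py", "billing/refunds.py", "legacy/orders.py"}

def grep(pattern: str) -> set[str]:
    """ripgrep-style literal substring search over file contents."""
    return {path for path, src in codebase.items() if pattern in src}

found = grep("invoice")  # the obvious query misses legacy/orders.py
recall = len(found & relevant) / len(relevant)
print(f"recall = {recall:.2f}")  # recall = 0.67
```

An agent that then "refactors all invoice logic" based on this search leaves legacy/orders.py untouched, producing precisely the duplication and inconsistency the text describes.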

This is also why those initial "code smell" small errors appear; they didn’t find the existing implementations, thus duplicating efforts and introducing inconsistencies. Ultimately, these issues will continuously spread and compound, blossoming into an extremely complex "rotten flower."

So how do we avoid all of this?

How we should collaborate with Agents (at least for now)

Coding Agents are like sirens, drawing you in with their extremely fast code generation speed and that kind of "choppy yet occasionally stunning" intelligence. They often complete some simple tasks with astonishing speed and high quality. The real problems start when you get the idea—"This thing is too powerful, computer, do the work for me!"

Delegating tasks to Agents is not itself a problem. Good Agent tasks typically share several characteristics:

  • The scope can be well-defined and does not require understanding the entire system.
  • The task is closed-loop, meaning the Agent can evaluate the result on its own.
  • The output is off the critical path: ad hoc tools or internal software that will not affect real users or revenue.
  • Or you simply need a "rubber duck" to assist your thinking, colliding your ideas with the compressed knowledge of the internet and synthetic data.

If these conditions are met, then it is suitable to assign tasks to Agents, provided that you, as a human, remain the final quality gatekeeper.

For instance, using the auto-research method proposed by Andrej Karpathy to optimize application startup time? Great. But the prerequisite is you understand that the code it produces absolutely does not possess production usability. The reason why auto-research is effective is that you provided it with an evaluation function, allowing it to optimize around a specific metric (like startup time or loss). But this evaluation function only covers a very narrow dimension. Agents will confidently ignore all metrics not included in the evaluation function, such as code quality, system complexity, and in some cases even correctness—if your evaluation function itself has issues.
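A minimal sketch of that failure mode, with entirely made-up candidate changes and scores: the evaluation function below sees only startup time, so the optimizer "confidently" prefers a candidate that is worse on every dimension it cannot measure:

```python
# Hypothetical candidate changes: (name, startup_ms, complexity_score, correct)
candidates = [
    ("lazy-load heavy modules", 120, 3, True),
    ("delete the error handling", 95, 1, False),  # fast, but broken
    ("cache everything forever",  90, 9, True),   # fast, but unmaintainable
]

def evaluate(candidate):
    """Narrow evaluation: lower startup time wins. Nothing else is visible."""
    name, startup_ms, complexity_score, correct = candidate
    return startup_ms

best = min(candidates, key=evaluate)
print(best[0])  # picks "cache everything forever" despite complexity 9
```

Note that the second-best candidate under this metric is the one that is not even correct: if the fastest option were removed, the optimizer would pick broken code without hesitation.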

The core idea is quite simple: let Agents handle those tedious tasks that won’t teach you anything new or those exploratory efforts you don’t have time to try. Then you assess the results, picking out the genuinely reasonable and correct parts, and complete the final implementation. Of course, you can also leverage Agents for this last step.

But I want to emphasize: really, it’s time to slow down a bit.

Give yourself time to think about what you are really doing and why. Give yourself the chance to say "no; no, we don't need this." Set a clear daily limit on how much code the Agent can generate, matched to your actual review capacity. Everything that determines the "overall shape" of the system, such as the architecture and APIs, should be written by you personally. You can use autocompletion to keep the feel of hand-written code, or pair program with the Agent, but the key is: you must stay involved in the code.

Because writing code yourself or watching it being built step by step brings a kind of "friction." It is this friction that helps you understand more clearly what you really want to do, how the system works, and the overall "feel." This is where experience and "taste" play a role, and this is precisely what the current most advanced models cannot replace. Slowing down and enduring a bit of friction is your way of learning and growing.

Ultimately, you will end up with a system that remains maintainable—at least it won’t be worse than it was before Agents appeared. Yes, the previous systems were not perfect. But your users will thank you because your product is "usable," not a pile of rushed garbage.

Your features will be fewer but more precise. Learning to say "no" is itself a capability. You can also sleep soundly because you at least still know what is happening within the system, and you still hold the initiative. It is this understanding that allows you to compensate for the recall issues of agentic search, making the Agents' outputs more reliable and requiring less patching.

When the system has problems, you can personally step in to fix them; when the design is flawed from the start, you can also understand the underlying issues and refactor it into a better shape. Whether or not there are Agents becomes less important.

All of this requires discipline. All of this cannot do without humans.

[Original Link]
