Why I Work This Way

Brett Reynolds · July 2026 · On speed, infrastructure, and agentic research tools


In late 2025, something in my work changed. From the outside, it may have looked like a burst of inspiration, but it didn’t feel like one. It felt like a machine finally catching. Projects that had been half-formed, stuck, or merely suspected began to move, and new ones started appearing faster than the usual academic tempo has words for.

Calling this productivity misses the point. Productivity is what people can count from the outside, and academics aren’t short of things to count. What actually changed was the method: I’d finally built a way of doing science and philosophy that matched the shape of the problems I care about.

I keep returning to categories that work without being clean: grammaticality, countability, definiteness, interjections, reciprocals, personhood, language, moral status, model behaviour, scientific kinds. In each case, the label is also a promise of further inference. Call something a noun, a grammatical sentence, an Indigenous people, a hallucination, a diagnostic, or a kind, and you’ve promised predictions: they hold somewhere, they break somewhere, and some field is relying on them for some purpose.

So the thread running through the recent sprint isn’t “apply the same theory everywhere”. It’s one working question, asked over and over: what stable inferences does this category support, and what has to be demoted, split, or re-described once an inference fails?

Questions like that are perishable, and that’s why the method matters.

Academic fields are excellent at forgetting their live problems. They rarely erase them; they absorb them into custom. A diagnostic becomes “standard”. A boundary case becomes an exception that generations learn to step around. A bad definition becomes harmless because everyone silently knows how to repair it, and eventually the repair gets mistaken for the thing itself. The field looks orderly because its unanswered questions have become dependencies.

The xkcd “Dependency” cartoon matters to me as more than a software joke. It shows a vast digital edifice balanced on one small block: a project some random person in Nebraska has been thanklessly maintaining for years. Many academic concepts sit in that Nebraska position, usually without the courtesy of a README. A paper, a syllabus, a journal remit, and a local literature can rest on some old distinction nobody really wants to inspect. It goes on functioning because enough people know how to route around its failures. And once you notice the wobble, you have to move quickly, because fields are good at re-stabilizing around their failure points.

Speed matters because slow work has a cost academics rarely count. Patience, source discipline, attention to objections, and the humility of not outrunning the evidence are real virtues. I need all of them. But the moment a problem first becomes visible is often the moment before it gets domesticated.

These connections can vanish if I don’t write while they’re live: a mismatch between a diagnostic and the category it’s supposed to measure, an analogy between grammatical boundary cases and AI-evaluation categories, or a hidden reliance on expert judgment inside a supposedly objective method. Worse, they can become obvious in the bad way, absorbed as background mood before anyone has made them into an argument.

The workspace is part of the method. Folders, status notes, source-grounding rules, and review boards do more than help me write faster; they let the work keep moving. They keep more live threads open than unaided memory can hold. They let a conjecture in one folder become a pressure test in another. They make it possible to return to a half-seen problem before it decays into “something I was going to think about”. And they record decisions, false starts, objections, and handoffs, so the work can be cumulative rather than episodic.

Codex and Claude Code changed the unit of work. Agentic systems can now work inside the same project structure I work in: reading local instructions, searching the archive, editing source, running checks, and leaving a task in a state that another agent or future me can resume. They turn the workspace from a place where finished thoughts are stored into a medium where unfinished thoughts can be acted on.

Models still matter. I use them as drafting aids, critics, indexers, sparring partners, and sometimes as readers without my local commitments. But the method changed when the system could hold the project context, touch the files, run the verification, and make the next move accountable to the local record. The agent isn’t there to certify claims. It’s closer to a research assistant embedded in the lab notebook, except the notebook has a shell prompt and occasionally opinions.

I don’t want to overstate the comparison. These systems are trained on the very literatures whose assumptions I’m often trying to unsettle. They can make a weak synthesis feel smoother than it deserves, and they can reward the internal coherence of a program before any external evidence has arrived. A fast system with fluent assistants can become a machine for making redescription feel like discovery, which isn’t a machine academia needed more of.

The danger has to stay visible.

No amount of slowing down will make the risk disappear. I handle it by building anchors: sources before claims, experiments where experiments are needed, corpus checks where corpus claims are being made, collaborators who don’t share my priors, reviewers who aren’t invested in the system, public preprints anyone can inspect, and project files that preserve the difference between what’s argued, what’s conjectured, what’s merely suggestive, and what’s still waiting for a severe test.

Raw speed isn’t the goal. Speed only helps if the work can still be stopped.

Fertility is evidence, but only of a certain kind. A framework that helps generate papers across linguistics, philosophy of science, AI evaluation, social ontology, and combinatorics has shown reach and heuristic power; it can make hidden structure visible. But reach isn’t confirmation, and a lens that can redescribe everything hasn’t thereby explained everything. At some point the framework has to make distinctions that could have failed: a diagnostic that should misbehave here and not there, a category whose projectibility should improve under one measurement model and collapse under another, a rival analysis that predicts the wrong pattern.

I now think of the current work in two layers. The first is expansion: find the cases, map the categories, make the analogies explicit, and show that projectibility (roughly, a category’s power to support reliable inference) is a live axis across domains. That was the sprint. The second is pressure: figure out where the vocabulary changes the verdict, where it distinguishes rivals, and where it predicts a result that may not appear. That’s the next discipline.

A journal article isn’t always the natural unit for any of this. Journals want bounded contributions with disciplinary addresses, and much of this work is undisciplinary before it’s interdisciplinary: philosophy of science pressed into linguistics, grammatical diagnostics treated like measurement instruments, AI-evaluation vocabulary treated as kind vocabulary. An article can extract a piece of that, and sometimes it should. But the program lives in the larger pattern: the book, the linked papers, the public notes, the code, the experiments, the retargeted failures, the vocabulary that accumulates across examples.

Self-theorizing follows from the same point. A method that uses agentic coding systems, file systems, project logs, and cross-model criticism to study projection, category stability, and AI evaluation is inevitably, in part, about itself. That circularity would be sterile if it became self-congratulation, but it can also be informative. Scientific work has always depended on instruments, institutions, inscription systems, and habits of attention, and when the instrument changes, the possible questions change. Pretending otherwise would be less honest, not more.

Tooling doesn’t guarantee better work; it changes what can be kept active and revised. It changes what can be held in view at once, how quickly a conjecture can be externalized, how many neighbouring literatures can be checked before an argument hardens, how easily a paper can be revised after a rejection, and how often a half-formed objection gets caught before publication. Get the fundamentals right and different questions open up; different gaps become tractable.

In December 2025, the parts were finally in place: the intellectual program, the writing system, the model workflow, the project-management layer, the agentic tools, and the available time. The result wasn’t simply more writing. A paper became less like a monument and more like an experimental probe: a way of asking whether a distinction travels, whether a diagnostic holds, whether a category supports the inferences people have been loading onto it.

The cost is that a probe is lighter than a monument. Some of these papers will be provisional, some too early, some extraction points from a book-length argument rather than stand-alone contributions, and some just wrong. But that’s what the method is for: getting enough structured attempts into the world that the live failures become visible.

I work this way because I don’t think philosophy or science advances only by waiting for the single perfect formulation. Sometimes it advances by building an engine that generates enough disciplined contact with the world for the right objections to appear.

Accountability is still the test. The work has to cite sources, survive review, invite criticism, produce tests, and let collaborators and readers say no. But it also has to move while the phenomenon is still unstable enough to teach us something. I don’t always get the balance right, but that’s the balance: speed to keep a question alive, and anchors to make sure it can still lose.