An Ars Technica Reporter Blamed A.I. Tools for Fabricating Quotes in a Bizarre A.I. Story theshamblog.com

On Thursday, Scott Shambaugh published a bizarre story about a rejected pull request from what seems to be an OpenClaw A.I. agent. The agent then generated a blog post accusing Shambaugh of “gatekeeping” contributions and personally attacking him. After backlash in the pull request, the agent deleted its own post and generated an apology.

Allegedly.

The tale here is so extraordinary that it is irresponsible to take it at face value, as it seems the Wall Street Journal has. It seems plausible to me that this is an elaborate construction by someone desperate to make waves. We should leave room for the revelation, which I think is likely, that this is a mix of generated text and human intervention. That ambiguity is why I did not link to the original post.

One part of the subsequent reporting, however, has become just as interesting a story. Shambaugh, in a follow-up article:

I’ve talked to several reporters, and quite a few news outlets have covered the story. Ars Technica wasn’t one of the ones that reached out to me, but I especially thought this piece from them was interesting (since taken down – here’s the archive link). They had some nice quotes from my blog post explaining what was going on. The problem is that these quotes were not written by me, never existed, and appear to be AI hallucinations themselves.

This was disheartening news to learn. I like Ars Technica’s reporting; in the twenty-plus years I have read the site, I have found its articles generally careful and non-alarmist without pulling deserved punches. I cite it frequently here because I respect my readers, and I assume it does the same.

Upsetting as that news was, the editor’s note issued by Ken Fisher was perhaps even more so:

That this happened at Ars is especially distressing. We have covered the risks of overreliance on AI tools for years, and our written policy reflects those concerns. In this case, fabricated quotations were published in a manner inconsistent with that policy. We have reviewed recent work and have not identified additional issues. At this time, this appears to be an isolated incident.

Ars Technica does not permit the publication of AI-generated material unless it is clearly labeled and presented for demonstration purposes. That rule is not optional, and it was not followed here.

Fisher provides no additional detail about how fake quotes ended up in a published article. Multiple parts of the reporting process must have failed for these statements not only to be invented, but also to escape any sort of fact-checking. Nor is there any description of what steps will be taken to prevent this from happening in the future.

Fortunately, Benj Edwards, one of the story’s authors, posted a statement to Bluesky acknowledging he was the one who used A.I. tools that falsified these quotes:

Here’s what happened: I was incorporating information from Shambaugh’s new blog post into an existing draft from Thursday.

During the process, I decided to try an experimental Claude Code-based AI tool to help me extract relevant verbatim source material. Not to generate the article but to help list structured references I could put in my outline.

When the tool refused to process the post due to content policy restrictions (Shambaugh’s post described harassment), I pasted the text into ChatGPT to understand why.

This is a more specific explanation than the one Fisher offered, but it raises questions of its own. Why would Edwards need a Claude tool to pull quotes from a not-particularly-long blog post? Why would he then jump to ChatGPT? Is this the first time Edwards used this tool, or is it an example of over-reliance that went horribly awry? And, again, is there no proofreading process at Ars Technica to confirm that quotations from source material are accurate and in context?

This looks bad for Edwards, of course, though he seems deeply remorseful. As bad a screw-up as this is, I do not think it is worth piling on him personally. What I want from Ars Technica is an explanation of how this kind of thing will be prevented in the future. The most obvious step would be to prohibit its reporters from using any tools based on large language models or generative A.I. However, as technologies like these begin to power things as seemingly simple as spelling and grammar checkers, that policy will be difficult to maintain in the real world. Publications need better processes for confirming that, regardless of the tools used to create an article, the reporting is accurate.