Matt Mullenweg:

Just got word that the court dismissed several of WP Engine and Silver Lake’s most serious claims — antitrust, monopolization, and extortion have been knocked out! These were by far the most significant and far-reaching allegations in the case and with today’s decision the case is narrowed significantly. […]

It is hard to believe this absurd dispute has gone on for a year already. Mullenweg frames this as a win because, well, of course he does — and these are pretty serious allegations to have been dismissed. But as Margaret Attridge writes, at Courthouse News Service, the majority of claims have been allowed to stand:

U.S. District Judge Araceli Martinez-Olguin, a Joe Biden appointee, denied Automattic and Mullenweg’s bid to strike or dismiss claims including defamation, trade libel, unjust enrichment and intentional interference with contractual and economic relations claims.

These are the claims that concern, among other things, the hijacking of the Advanced Custom Fields plugin, which makes this next sentence Mullenweg wrote seem pretty rich:

[…] This is a win not just for us but for all open source maintainers and contributors.

Even if this case ends with a complete victory for Mullenweg and Automattic, his actions have shaken my support of — and faith in — the WordPress ecosystem. The original dispute is something I can understand. Mullenweg’s response, however, remains alarming. Actions do not need to be illegal for them to be wrong.

Tim Hardwick, MacRumors:

Apple says on its feature availability webpage that “Apple Intelligence: Live Translation with AirPods” won’t be available if both the user is physically in the EU and their Apple Account region is in the EU. Apple doesn’t give a reason for the restriction, but legal and regulatory pressures seem the most plausible culprits.

This is kind of a funny limitation because fully half the languages Live Translation works with — French, German, and Spanish — are the versions spoken in their respective E.U. countries and not, for example, Canadian French or Chilean Spanish. As written, the most impressive implementation of this feature only works if both parties are outside the E.U. or, if they are within the E.U., neither has an E.U. Apple Account. A U.K. or U.S. resident visiting Paris and hoping to speak with a local? That is not going to work, but the reverse — a Parisian visiting London or New York — presumably would.
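To make the gating logic concrete, here is a minimal sketch of the availability condition as I read it from the feature availability page (my inference, not Apple’s actual implementation):

    # My reading of Apple's feature availability page, not Apple's actual logic.
    def live_translation_available(physically_in_eu: bool, eu_apple_account: bool) -> bool:
        # The feature is withheld only when BOTH conditions are true.
        return not (physically_in_eu and eu_apple_account)

    # A U.S. resident visiting Paris: in the E.U., but on a non-E.U. account.
    print(live_translation_available(True, False))  # True: available
    # A Parisian at home: in the E.U. with an E.U. account.
    print(live_translation_available(True, True))   # False: unavailable

For a two-way AirPods conversation, both parties presumably need the condition to pass, which is why the Parisian side is the one that breaks.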

Given its choice of launch languages, I think Apple expects this holdup will not last long.

Do note the extremely limited language selection at launch, too. Compare the supported Live Translation languages against the Priority Messages in Mail languages just below it on the feature availability page. Canadian English is apparently not worth urgently addressing.

Jeremiah Johnson, founder of the Center for New Liberalism, writing at his blog Infinite Scroll:

The third theory of Meta doesn’t describe the company as a laughable failure or a great success. This theory says that focusing on business results is beside the point when Meta’s creating something genuinely dark and sinister, something that perverts human nature and takes advantage of some of the most vulnerable people in society. I’m no expert in moral philosophy and ethics. But I feel pretty comfortable using the word evil to describe a company that impersonates real people without their permission in order to build AI sex bots that engage in sexual fantasies with children and lure senior citizens to their deaths.

The fourth theory of Meta is somehow even darker than the third.

As you can probably see, the theories get more cynical and, frankly, a touch conspiratorial. I am a firm believer in a modified version of Johnson’s first theory, which is that Meta is uncool and cringeworthy. That is not wrong, but I think it goes much further. It is a deeply unfocused and uninteresting company. It jumps from one idea to another but, because it is still so dependent on ad revenue, everything must feed that machine. And personalized advertising is a fundamentally dull and kind of dirty thing no matter how much Meta wants to gussy it up in its marketing materials. That is not enough for Mark Zuckerberg, who is not hosting hour-plus livestreams before cheering crowds to show off ads. That would be a boring time for everyone. He wants the glory of hardware and platforms, but neither one is a meaningful part of what Meta actually does.

Last year, Robb Knight figured out how Perplexity, an artificial intelligence search engine, was evading instructions not to crawl particular sites. Knight learned that Perplexity’s engine would use an unlisted user agent to scrape summaries of pages on websites where Perplexity was blocked. In my testing, I found the summaries were outdated by hours-to-days, indicating to me the pages were not being actively visited as though guided by a user. Aravind Srinivas, CEO of Perplexity, told Mark Sullivan, of Fast Company, it was the fault of a third-party crawler and denied wrongdoing.
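The mechanics of such an evasion are worth spelling out: robots.txt matches crawlers by their self-declared names, so a crawler presenting an unlisted user agent simply falls through to the catch-all rule, and honouring the file is voluntary in any case. A minimal sketch with Python’s standard robotparser, using hypothetical rules and agent names:

    # Hypothetical robots.txt rules; the agent names are examples only.
    from urllib import robotparser

    rules = """
    User-agent: PerplexityBot
    Disallow: /

    User-agent: *
    Allow: /
    """.splitlines()

    rp = robotparser.RobotFileParser()
    rp.parse(rules)

    # The declared crawler is blocked from everything...
    print(rp.can_fetch("PerplexityBot", "https://example.com/post"))     # False
    # ...but an unlisted agent name only matches the permissive catch-all.
    print(rp.can_fetch("UndisclosedAgent", "https://example.com/post"))  # True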

This dispute was, I think, a clear marker in a debate concerning what control website owners have — or ought to have — over access to and interpretation of their websites, an issue that was recently re-raised in an article by Mike Masnick of Techdirt. Masnick explores scraper gating services offered by Cloudflare and Reddit’s blocking of the Internet Archive, and concludes the web is being cleaved in two:

There are plenty of reasons to be concerned about LLM/AI tools these days, in terms of how they can be overhyped, how they can be misused, and certainly over who has power and control over the systems. But it’s deeply concerning to me how many people who supported an open internet and the fundamental principles that underlie that have now given up on those principles because they see that some AI companies might benefit from an open internet.

The problem isn’t just ideological — it’s practical. We’re watching the construction of a fundamentally different internet, one where access is controlled by gatekeepers and paywalls rather than governed by open protocols and user choice. And we’re doing it in the name of stopping AI companies, even though the real result will be to concentrate even more power in the hands of those same large tech companies while making the internet less useful for everyone else.

This is a passionately argued article about a thorny issue. I, too, am saddened by an increasingly walled-off web, whether through payment gates or the softer barriers of login or email subscriptions. Yet Masnick misses the mark in ways I think he is usually more careful about.

In the second quoted paragraph above, for example, Masnick laments an internet less “governed by open protocols and user choice” than “controlled by gatekeepers”. These are presented as opposing qualities, but they are in fact complementary. Open protocols frequently contain specifications for authentication, allowing users and administrators to limit access. Robots.txt is an open standard that is specifically intended to communicate access rules. Thus, while an open web is averse to centralization and proprietary technologies, it does not necessarily mean a porous web. The open web does not necessarily come without financial cost to human users. I see no reason the same principle should not be applied to robots, too.
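HTTP itself is a good example: the 401 status code and the WWW-Authenticate header exist in the open standard precisely so access can be limited. A minimal sketch using only Python’s standard library, with placeholder credentials:

    # A toy HTTP server showing that access control is part of the open HTTP
    # standard itself (RFC 9110 defines 401; RFC 7617 defines Basic auth).
    # The credentials and realm are placeholders.
    import base64
    from http.server import BaseHTTPRequestHandler, HTTPServer

    EXPECTED = "Basic " + base64.b64encode(b"reader:secret").decode()

    class GatedHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.headers.get("Authorization") == EXPECTED:
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write(b"An open protocol with a closed door.\n")
            else:
                self.send_response(401)
                self.send_header("WWW-Authenticate", 'Basic realm="subscribers"')
                self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), GatedHandler).serve_forever()

Nothing about this little server is proprietary or centralized; the gate is defined by the same open specification that defines the page.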

Masnick:

This illustrates the core problem: we’re not just blocking bulk AI training anymore. We’re blocking legitimate individual use of AI tools to access and analyze web content. That’s not protecting creator rights — that’s breaking the fundamental promise of the web that if you publish something publicly, people should be able to access and use it.

Masnick is entirely correct: people should be able to access and use it. They should be able to use any web browser they like, with whatever browser extensions and user scripts they desire. That does not necessarily extend to machines. The specific use case Masnick is concerned with is that he uses Lex as a kind of editorial verification step. When he references some news sites, however, Lex is blocked from reading them and therefore cannot provide notes on whether Masnick’s interpretation of a particular article is accurate. “I’m not trying to train an A.I. on those articles”, Masnick writes. “I’m just asking it to read over the article, read over what I’ve written, and give me a sense” if they jibe.

That may well be the case, but the blame for mistrust lies squarely with artificial intelligence companies. The original sin of representatives of this industry was to believe they did not require permission to ingest a subset of the corpus of human knowledge and expression, nor did they need to offer compensation. They did not seem to draw hard ethical lines around what they would consume for training, either — if it was publicly available, it could become part of their model. Anthropic and Meta both relied on materials available at LibGen, many of which are hosted without permission. A training data set included fan-made subtitles, which can be treated as illicit derivative works. I cannot blame any publisher for treating these automated visitors as untrustworthy or even hostile because A.I. companies have sabotaged attempts at building trust. Some seem to treat the restrictions of a robots.txt file as mere suggestions to be worked around. How can a publisher be confident the user-initiated retrieval of their articles, as Masnick is doing, is not used for training in any way?

Masnick is right, however, to be worried about how this is bifurcating the web. Websites like 404 Media have explicitly cited A.I. scraping as the reason for imposing a login wall. A cynical person might view this as a convenient excuse to collect ever-important email addresses and, while I cannot disprove that, it is still a barrier to entry. Then there are the unintended consequences of trying to impose limits on scraping. After Reddit announced it would block the Internet Archive, probably to comply with some kind of exclusivity expectations in its agreements with Google and OpenAI, it implied the Archive does not pass along the robots.txt rules of the sites in its collection. If a website administrator truly does not want the material on their site to be used for A.I. training, they would need to prevent the Internet Archive from scraping as well — and that would be a horrible consequence.

Of course, Reddit does not block A.I. scraping on principle. It appears to be a contractual matter, where third parties pay the company some massive amount of money for access. Anthropic’s recently proposed settlement put the price of sufficiently compensating authors of the books it pirated at a billion and a half dollars. M.G. Siegler called this “pulling up a drawbridge” by setting a high cost floor that will lock out insufficiently funded competitors. Masnick worries about the same thing, predicting the ultimate winners of this will be “the same large tech companies that can afford licensing deals and that have the resources to navigate an increasingly complex web of access restrictions”.

To be sure, intellectual property law is a mess, and encouraging copyright maximalism will have negative consequences. The U.S. already has some of the longest copyright protections in the world, which have unfortunately spilled into Canada thanks to trade agreements. But A.I. organizations have not created a bottom-up rebellious exploration of the limits of intellectual property law. They are big businesses with deep pockets exploiting decades of news, blogging, photography, video, and art. Nobody, as near as makes no difference, expected something they published online would one day feed the machines that now produce personalized Facebook slop.

Masnick acknowledges faults like these in his conclusion, but I do not think his proposed solutions are very strong:

None of this means we should ignore legitimate concerns about AI training or creator compensation. But we should address those concerns through mechanisms that preserve internet openness rather than destroy it. That might mean new business models, better attribution systems, or novel approaches to creator compensation. What it shouldn’t mean is abandoning the fundamental architecture of the web.

The “new business models” and “better attribution systems” are not elucidated here, but the compensation pitch seems like a disaster in the making to me. It is also from Masnick; here is the nut of his explanation:

But… that doesn’t mean there isn’t a better solution. If the tech companies need good, well-written content to fill their training systems, and the world needs good, high-quality journalism, why don’t the big AI companies agree to start funding journalists and solve both problems in one move?

What Masnick proposes is that A.I. companies could pay journalists to produce new articles for their training data. Respectfully, this would be so insubstantial as to be worthless. To train their models, A.I. companies are ingesting millions of websites, tens of millions of YouTube videos, hundreds of thousands of books, and probably far more — the training data is opaque. It is almost like a perverse version of fair use. Instead of a small amount of an existing work becoming the basis of a larger body of work — like the quotes I am using and attributing in this article — this is a massive library of fully captured information. Any single piece is of little consequence to the whole, but the whole does not work as well without all those tiny pieces.

The output of a single journalist is inconsequential, an argument Masnick also makes: “[a]ny individual piece of content (or even 80k pieces of content) is actually not worth that much” in the scope of training a large language model. This is near the beginning of the same piece he concludes by arguing we need “novel approaches to creator compensation”. Why would A.I. companies pay journalists to produce the microscopic portion of words training their systems when they have historically used billions — perhaps trillions — of freebies? I can think of other reasons this would not work, but this is the most obvious.
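The arithmetic makes the point starkly. Assuming, for illustration, a fifteen-trillion-token training corpus (a round number of my choosing, not a disclosed figure), a longish article contributes a vanishing fraction:

    # Back-of-envelope only; the corpus size is an assumed round number.
    corpus_tokens = 15e12    # assume a corpus of about fifteen trillion tokens
    article_tokens = 1_500   # roughly a 1,000-word news article
    print(f"{article_tokens / corpus_tokens:.0e}")  # 1e-10 of the corpus

No company is going to pay meaningful rates for words that individually contribute one ten-billionth of its training data.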

One thing that might help, not suggested by Masnick, is improving the controls available to publishers. Today marked the launch of the Really Simple Licensing standard, which offers publishers a way to define machine-readable licenses. These can be applied site-wide, sure, but also at a per-page level. It is largely up to A.I. companies to adhere to the terms, with one exception — there are ways of gating access to encrypted material. This raises concerns about a growing proliferation of digital rights management, bringing me back to Masnick’s reasonable concern about a web increasingly walled off and accessible only to authorized visitors.
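To picture the kind of declaration RSL contemplates, here is an illustrative sketch of a machine-readable, per-page license. I have not reproduced the standard’s exact element names, so treat this as hypothetical and consult the published spec for the real syntax:

    <!-- Hypothetical sketch, not verbatim RSL syntax. -->
    <rsl xmlns="https://rslstandard.org/rsl">
      <content url="https://example.com/posts/my-article">
        <license>
          <permits type="usage">search</permits>
          <prohibits type="usage">ai-train</prohibits>
        </license>
      </content>
    </rsl>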

I do not claim to have better ideas; I appreciate that Masnick at least brought something to the table in that regard. I, too, am concerned about dividing the web. However, I think publishers are coming at this from a reasonable place. This is not, as Masnick puts it, a “knee-jerk, anti-A.I. stance” in which publishers impose restrictions because “[i]f it hurts A.I. companies, it must be good”. A.I. companies largely did this to themselves by raising billions of dollars in funding to strip-mine the public web without permission and, ultimately, with scant acknowledgement. I believe information should be freer than it is, that intellectual property hoarding is wrong, and that we are better when we build on top of each other. That is a fine stance for information reuse by fellow human beings. However, the massive scale of artificial intelligence training comes with different standards.

In writing this article, I am acutely aware it will become part of a training data set. I could block those crawlers — I have blocked a few — but that is only partly the point. I simply do not know how much of the control I reclaim now will remain relevant in the future, and I am sure the same is true of any real media organization. I write here for you, not for the benefit of building the machines producing a firehose of spam, scams, and slop. The artificial intelligence companies have already violated the expectations of even a public web. Regardless of the benefits they have created — and I do believe there are benefits to these technologies — they have behaved unethically. Defensive action is the only control a publisher can assume right now.

Apple launched a lineup of new iPhones, AirPods, and Apple Watches today, and many of the announcements seemed to have been leaked. This is typical of September launches: the scale of the iPhone’s success means more of Apple’s resources are dedicated to its annual release cycle, which creates more opportunities for leaks and more incentive for publications to seize upon any tidbit of information. Sometimes, they get it wrong.

Juli Clover, in a pre-presentation rumour roundup for MacRumors:

There have been multiple rumors suggesting that the iPhone 17 Air won’t have the space for a SIM tray, which would prevent it from launching in China. iPhones sold in China are required to have a physical SIM tray, and carriers in the country do not support eSIM technology for smartphones.

The recent battery database leak mentions a variant of the iPhone 17 Air with a SIM tray, so it looks like information suggesting that the iPhone 17 Air won’t be available in China could be inaccurate.

It turns out the iPhone Air is available in China, and it is eSIM-only. In a piece following the presentation, Clover confirmed as much, explaining “China has a requirement that links a user’s ID to their cellular phone, something that’s harder to do with an eSIM over a physical SIM”. This is not, it turns out, synonymous with phones being “required to have a physical SIM tray”.

Chance Miller, 9to5Mac:

A last-minute leak from an anonymous account on X has led some iPad users to speculate that Apple might have a surprise in store for this week. A new M5 iPad Pro has been rumored to launch this year, but our expectation was that we wouldn’t see it until October or November. Now, however, it looks like we can’t rule out an announcement tomorrow alongside the iPhone 17.

The speculation in question comes from a fellow 9to5Mac writer who noticed it was the tenth anniversary of the iPad Pro’s introduction. That, combined with the details apparently shared by an anonymous account — but which were neither quoted nor summarized, nor even sourced — led Ryan Christoffel to connect some imaginary dots.

MacRumors was tipped off in early July to some iPhone information by someone familiar with the details of an ad being created. Joe Rossignol:

The tipster revealed three alleged iPhone 17 Pro features that have not been rumored previously:

  • An upgraded Telephoto lens with up to 8× optical zoom, compared to up to 5× optical zoom on the iPhone 16 Pro models. The lens can apparently move, allowing for continuous optical zoom at various focal lengths.

  • An all-new pro camera app from Apple for both photos and videos. This app would compete with the likes of Halide, Kino, and Filmic Pro. It is unclear if the app would be exclusive to the iPhone 17 Pro models.

  • An additional Camera Control button on the top edge of the devices, for quickly accessing the camera and related settings. This would complement the Camera Control button on the bottom-right edge of all iPhone 16 models.

For the pro camera app, the tipster warned there is a chance Apple is planning a major update to its existing Final Cut Camera app instead of an all-new app.

This tipster was remarkably on the money for two of these three claims. Neither had been rumoured before and, though it barely made the presentation, Apple did update Final Cut Camera just as the tipster claimed. Given that clear foreknowledge, I have to wonder what their observation of an “additional Camera Control button” is all about. It is not on these iPhones. Perhaps they were confused by the Action button, which has been present for a few years? I can only guess.

One final thing only barely made the cut for advance rumours, yet turned out to be entirely accurate. Ben Lovejoy, 9to5Mac:

Just hours ahead of the official announcement, a leaker has suggested that the iPhone 17 Air may instead simply be named the iPhone Air.

Mark Gurman called it the “iPhone Air” in an article a couple of weeks ago.

Models across the rest of the lineup are named variations of “iPhone 17”, so the number-less branding of this model is conspicuous. Perhaps Apple intends for it to only stick around a single year, or perhaps the entire line will lose version numbers.

Fred Vogelstein, of Crazy Stupid Tech, interviewed Techmeme founder Gabe Rivera on the occasion of its forthcoming twentieth anniversary:

But Techmeme looks and works exactly the same way as it always has. And it has never been more popular. Traffic is up 25 percent this year, likely driven by the explosion of interest in AI, Rivera says.

[…]

Now the software finds and ranks the stories. But the editors write the headlines. When stories are generated by corporate press releases/announcements, they choose which media outlet’s story is driving the most interesting social media conversations. The software also chews on the API feeds from the big social networks to come up with a list of the most useful conversations. Editors approve all those, however, to prevent duplication.

Since I learned about Techmeme in the late 2000s or early 2010s, I have admired many of its attributes. Its barely-changing design speaks to me, especially with its excellent use of Optima. More than anything, however, it is its signature way of clustering related stories and conversations that keeps me coming back. That feature was one of the sources of inspiration for this very website. The differing perspectives are useful beyond a single story or topic; Techmeme has been a source of discovery for other websites and writers I should keep an eye on.

The steadiness of the site masks some of the changes that have been made over the years, however. Not too long ago, the community discussion section of any topic was merely a list of tweets. However, since about 2023, I think, it has also incorporated posts from other social networks and message boards. This is a welcome improvement.

Silicon Valley trends may come and go, but I hope Techmeme continues to evolve at its present geological pace.

When I wrote about the fundamentally dishonest complaints from Elon Musk about the App Store’s ranking of Grok, a lawsuit had not yet been filed. Two weeks later, though, Musk followed through.

Annie Palmer, CNBC:

Elon Musk’s xAI sued Apple and OpenAI on Monday, accusing the pair of an “anticompetitive scheme” to thwart artificial intelligence rivals.

The lawsuit, filed by Musk’s AI startup xAI and its social network business X, alleges Apple and OpenAI have “colluded” to maintain monopolies in the smartphone and generative AI markets.

Iain Thomson, the Register:

It accuses Apple of downgrading other AI apps in favor of ChatGPT. While the lawsuit acknowledges iPhones can use other AI engines, it claims that OpenAI competitors don’t get enough promotion.

The lawsuit cites the list of “Must-Have Apps” posted on Sunday, in which OpenAI was the only AI app listed. Also included were Tinder, Hinge, and Bumble. Musk’s lawyers claim that Cook & Co’s statement in the T&Cs that Apple’s store “is designed to be fair and free of bias,” is a lie.

There are many problems one can find in the App Store, Apple’s editorial process, and the way OpenAI seems to be everywhere. I think xAI is a bad plaintiff for this case, however. When I wrote that Musk’s frenzied posting on X was “dishonest”, what I meant was he was inventing or exaggerating controversy to boost the app’s rankings. At the time, it was unclear whether this strategy would work. On the day I published my commentary, Grok was fifth in the U.S. in overall free downloads. By the day this lawsuit was filed, it had fallen off the chart after a steep decline. Meanwhile, Google’s Gemini has climbed from placing in the mid-fifties in mid-August to third place today. Perplexity has grown from placing in the hundreds to twenty-fifth place today. (Sensor Tower does not allow me to create permalinks of those charts, so act fast.)

Of course, even though this information appears to invalidate the lawsuit’s claim (PDF) that “Apple has deprioritized the apps of super app and generative AI chatbot competitors, like the products offered by Plaintiffs, in its App Store rankings to favor OpenAI”, it will simply feed the persecution complex of xAI. And the lawsuit raises a good point: Apple should more urgently open up third-party A.I. integration, something it said it would do. This is going to be painful to watch.

Barry Schwartz, Search Engine Roundtable:

Google’s CEO, Sundar Pichai, said in May that web publishing is not dying. Nick Fox, VP of Search at Google, said in May that the web is thriving. But in a court document filed by Google on late Friday, Google’s lawyers wrote, “The fact is that today, the open web is already in rapid decline.”

This document can be found over here (PDF) and on the top of page five, it says:

The fact is that today, the open web is already in rapid decline and Plaintiffs’ divestiture proposal would only accelerate that decline, harming publishers who currently rely on open-web display advertising revenue.

This is, perhaps, just an admission of what people already know or fear. It is a striking admission from Google, however, and appears to contradict the company’s public statements.

Dan Taylor, Google’s vice president of global ads, responded on X:

Barry – in the preceding sentence, it’s clear that Google’s referring to ‘open-web display advertising’ – not the open web as a whole. As you know, ad budgets follow where users spend time and marketers see results, increasingly in places like Connected TV, Retail Media & more.

Taylor’s argument appears to be that users and time are going to places other than the open web and so, too, is advertising spending. Is that still supposed to mean the open web is thriving?

Also, if you actually read the filing, you will quickly see that Google clearly differentiates between “open web” — no hyphen, no qualifiers — and “open-web display”, with the latter explicitly referring to advertising. There is an entire section about the open web beginning on page 16, concluding with this paragraph:

The divestitures of AdX and DFP would risk accelerating the ongoing shift in spending away from open-web display inventory. Plaintiffs propose to require Google to divest its ad exchange and publisher ad server for open-web display advertising. But they acknowledge — as they must — that Google could continue operating an ad exchange and a publisher ad server for any other ad format. The outcome would be to incentivize Google to shift the resources it invests in serving open-web publishers to serving publishers who prioritize other formats, like app and CTV, as well as its non-open web properties such as YouTube. And divestiture will also eliminate the efficiencies of integration within Google’s ad tech stack, so that Google’s advertiser customers are likely to see a further decline in their return on investment from open-web display ads. Advertisers will vote with their feet and accelerate the existing trend of shifting spend to non-open web display ad formats. Automated AI-powered tools seeking greatest ROI will make that shift in spend even faster. In short, Plaintiffs’ remedies will harm publishers — particularly smaller publishers reliant on open-web display who have not diversified to other ad formats — by accelerating the decline of the open web.

In context, this sure looks to me like Google is arguing that forcing it to divest AdX and DoubleClick for Publishers will more negatively impact publishers without other advertising revenue streams, thereby worsening the open web. The “accelerating the decline” line is repeated here, though it is phrased ambiguously. This could be read in the way Schwartz has and the way many publishers are feeling — that the open web, as a whole, is in decline. Or it could be read the way Taylor insists Google has meant it, as accelerating the decline of open web advertising. If that is what Google meant, it would be better if it had phrased these references to advertising as clearly as it did in the rest of the document.

Peter Hoskins and Lily Jamali, BBC News:

A US federal court has told Google to pay $425m (£316.3m) for breaching users’ privacy by collecting data from millions of users even after they had turned off a tracking feature in their Google accounts.

The verdict comes after a group of users brought the case claiming Google accessed users’ mobile devices to collect, save and use their data, in violation of privacy assurances in its Web & App Activity setting.

Heck of a week for Google. Perhaps it should stop doing so much creepy and anticompetitive stuff.

This lawsuit was filed in July 2020 (PDF), and alleged various privacy violations surrounding Google’s “Web and App Activity” control — a known source of confusion — and Google’s data collection through other services like Firebase and Analytics. Perhaps Google should not operate such a sprawling empire of surveillance; it could become a smaller business and collect less data. Alas, it will not do so voluntarily.

Teresa Ribera, the European Commission’s vice president of “Clean, Just, and Competitive Transition”:

Google abused its power by favouring its own online display advertising technology services to the detriment of its competitors, online advertisers and publishers.

As a result of Google’s illegal practices, advertisers faced higher marketing costs which they likely passed on to European consumers in the form of higher prices for products and services. Google’s tactics also reduced revenues for publishers, which may have led to lower service quality and higher subscription costs for consumers.

Google’s abusive behaviour therefore had a negative impact on all European citizens in their day-to-day use of the web.

This is illegal under EU competition rules and therefore our decision orders Google to pay a fine of €2.95 billion.

Jacob Parry, Politico:

Google now has until early November — or 60 days — to tell the Commission how it intends to resolve that conflict of interest and to remedy the alleged abuse.

The Commission said it would not rule out a structural divestiture of Google’s adtech assets — but it “first wishes to hear and assess Google’s proposal.”

Kevin Breuninger, CNBC:

President Donald Trump on Friday threatened to launch a trade investigation to “nullify” what he said were discriminatory fines being levied by Europe against U.S. tech firms such as Google and Apple.

A United States court has also found Google’s dominance of online advertising is an illegal monopoly, and arguments over what to do about it will begin later this month. A different U.S. court’s resolution to the search monopoly trial earlier this week was not particularly substantial; Google’s stock went up after the judge announced remedies. Perhaps the advertising case will play out differently in the U.S., but I have my doubts.

Ashley Belanger, Ars Technica:

Authors revealed today that Anthropic agreed to pay $1.5 billion and destroy all copies of the books the AI company pirated to train its artificial intelligence models.

In a press release provided to Ars, the authors confirmed that the settlement is “believed to be the largest publicly reported recovery in the history of US copyright litigation.” Covering 500,000 works that Anthropic pirated for AI training, if a court approves the settlement, each author will receive $3,000 per work that Anthropic stole. “Depending on the number of claims submitted, the final figure per work could be higher,” the press release noted.

Foster Kamer, Futurism:

Kyle Chayka, a staff writer at The New Yorker whose work zeroes in on the intersection between technology, art, and culture, is the author of not one but two books that popped up in LibGen: 2024’s “Filterworld: How Algorithms Flattened Culture” and 2020’s “The Longing For Less: Living With Minimalism.” Also found in LibGen was the Italian translation of Filterworld. All in, he could stand to make upwards of $12K!

We asked Kyle: How does the sum of “$3,000 per class work” feel as a number given that his intellectual property was used to train an AI? Low, high, not worth it on principle, or about right?

“It should be a license, really,” he replied. “Because the training never goes away. So it could be $5,000 every 5 years, or $1,000 / year as long as they exist. But the price seems about right, honestly — a decent percentage of most book advances, and about the price of an institutional speaking gig.”

Yet another complication for the fair use arguments of generative A.I. companies, though those arguments were obviously undermined by using pirated data to begin with. Though I think it makes sense to focus on the settlement for now, the question looming over this entire case is what precedent it sets. It does not singlehandedly eliminate the fair use argument for training on public information, but what about other illicitly reproduced information ingested into data sets?

Meta and now Apple are being sued in similar cases. If you are a published author, visit the case settlement website to register for your share of the pot.

Update: The judge in the case is not thrilled with this proposed settlement.

Katherine Bunt, Meredith McGraw, and Megan Bobrowsky, Wall Street Journal:

President Trump on Thursday led leaders of the world’s biggest technology companies in a version of his cabinet meetings, in which each participant takes a turn thanking and praising him, this time for his efforts to promote investments in chip manufacturing and artificial intelligence.

Present at the table were Sam Altman, Tim Cook, Sundar Pichai, David Sacks, and — immediately to Trump’s right — Mark Zuckerberg. Bill Gates was also there for some reason. Here is a fun exchange the Journal pulled from all the grovelling:

Trump also addressed Alphabet CEO Sundar Pichai about a federal judge’s ruling this week on an antitrust case related to Google’s monopoly in search. The judge levied relatively light penalties and rejected the most significant measures sought by the Justice Department, which filed the lawsuit in 2020.

“You had a very good day yesterday,” Trump said. “Do you want to talk about that big day you had yesterday?”

“I’m glad it’s over,” Pichai said.

“Biden was the one who prosecuted that lawsuit,” Trump said. “You know that, right?”

Beginning this section by reminding readers the suit was filed under the first Trump administration is a kind way of calling out the president’s flexible concepts of time and responsibility.

At least nobody gave him any solid gold statues this time, as far as I know.

Rebecca Bellan, TechCrunch:

U.S. District Court Judge Amit P. Mehta outlined remedies on Tuesday that would bar Google from entering or maintaining exclusive deals that tie the distribution of Search, Chrome, Google Assistant, or Gemini to other apps or revenue arrangements. For example, Google wouldn’t be able to condition Play Store licensing on the distribution of certain apps, or tie revenue-share payments to keeping certain apps.

Mehta’s full decision (PDF) is written to be read by non-lawyers. Even so, I admit I have only read the introduction — which includes some high-level explanations of the remedies — and skimmed much of the rest. It is a couple hundred pages long for those who want to put in a little more work than I did.

I did notice a potentially interesting deposition referenced on page 21 regarding the relative performance of Google’s A.I. search summaries compared to organic search results. I sure wish a transcript would be published.

Lily Jamali, BBC News:

The judge will also allow certain competitors to display Google search results as their own in a bid to give them the time and resources they need to innovate.

The judge is allowing Google to continue to pay companies like Apple and Samsung for distribution of its search engine on devices and browsers, but will bar Google from maintaining exclusive contracts.

The latter must be quite the relief to Apple, which gets tens of billions of dollars a year because people use Safari to search with Google, and to Mozilla, which gets to keep existing.

Cory Doctorow:

And then there’s Google’s data. Google is the world’s most prolific surveiller, and the company boasts to investors about the advantage that its 24/7 spying confers on it in the search market, because Google knows so much about us and can therefore tailor our results. Even if this is true – a big if – it’s nevertheless a fucking nightmare. Google has stolen every fact about our lives, in service to propping up a monopoly that lets it steal our money, too. Any remedy worth the name would have required Google to delete (“disgorge,” in law-speak) all that data […]

Some people in the antitrust world didn’t see it that way. Out of a misguided kind of privacy nihilism, they called for Google to be forced to share the data it stole from us, so that potential competitors could tune their search tools on the monopolist’s population-scale privacy violations.

And that is what the court has ordered.

Much as in the Microsoft antitrust case that preceded Google’s by a couple of decades, the proposed solutions basically treat Google with kid gloves. The judge admitted in the introduction to approaching this case with “humility”, having “no expertise in the business of [general search engines], the buying and selling of search text ads, or the engineering of GenAI technologies”. That is true enough. Yet judges are often expected to rule on cases with subject matter about which they have no particular expertise. The judge appears to be comfortable assuming the generative A.I. boom is providing Google with ample competition.

This conclusion ultimately makes it seem as though Mehta is doing something, yet Google has to change very little. It is difficult for me to believe this will be disruptive to Google’s search monopoly. Again, as with the years-ago hesitancy to impose serious conditions on Microsoft or break it up, Google will probably emerge as an even stronger force while technically complying with a ruling finding its monopoly illegal.

Charlotte Tobitt, Press Gazette:

Wired and Business Insider have removed news features written by a freelance journalist after concerns they are likely AI-generated works of fiction.

Freedom of expression non-profit Index on Censorship is also in the process of taking down a magazine article by the same author after concerns were raised by Press Gazette. The publisher has concluded that it “appears to have been written by AI”.

Several other UK and US online publications have published questionable articles by the same person, going by the name of Margaux Blanchard, since April.

Tobitt reports at least six publications carried articles with the “Margaux Blanchard” byline. Wired, for its part, published an explanation that, unfortunately, does not make it look very good. For one thing, it is attributed to “Wired Management”, not any specific person. For another, the reason “Blanchard” was busted had nothing to do with the article itself:

Over the next several days, it became clear that the writer was unable to provide enough information to be entered into our payments system. They instead insisted on payment by PayPal or check. Now suspicious, a WIRED editor ran the story through two third-party AI-detection tools, both of which said that the copy was likely to be human-generated. A closer look at the details of the story, though, along with further correspondence from the writer, made it clear to us that the story had been an AI fabrication. After more due diligence from the head of our research desk, we retracted the story and replaced it with an editor’s note.

Wired says it did not fully vet the article before publishing, which seems a little bit obvious. This amount of transparency is admirable, however, in contrast with other publications that have merely replaced articles “by” “Blanchard” with a terse note.

This is obviously a huge mistake on the part of these media organizations. It is embarrassing and silly and all the rest of it, particularly for Wired, which has ramped up its political coverage with an aggressive but accurate bent. I also cannot help but feel it is indicative of what is, for lack of better phrasing, our current media climate. Trust in mass media has been declining for years, in part because financial pressures have triggered staffing cutbacks while online news encourages faster, more voluminous coverage. The result is increasing sloppiness, which further erodes that trust. I am not making excuses for Wired or Business Insider running an insufficiently checked piece; they should have followed protocols. But a little bit of this problem may be attributable to the corner-cutting that encourages publishing more stories more quickly.

Also, who the hell is behind this “Margaux Blanchard” character anyway?

Tobitt, in a followup:

An internet hoaxer calling themselves Andrew Frelon has claimed responsibility for the fake (likely AI-generated) freelance journalist Margaux Blanchard.

Andrew Frelon is itself a pseudonym and their claims have to be treated with a high degree of scepticism.

“Frelon” is the same person who, earlier this year, claimed to be responsible for the Velvet Sundown A.I. music hoax, only to reveal that statement, itself, was a prank and he had nothing to do with the Velvet Sundown.

Also, we apparently know Frelon’s real name, according to Kevin Maimann, of CBC News:

The Quebec man who pranked journalists and music fans by saying he was behind a wildly successful AI band has revealed his identity as web platform safety and policy issues expert Tim Boucher.

Boucher’s own telling reads to me like the ramblings of someone a little too self-indulgent and self-important. In other words, he could very well be responsible for hijacking the publicity around the Velvet Sundown gag, and attempting to do the same for this “Margaux Blanchard” situation. Do not simply be skeptical; assume this is false until Boucher or “Frelon” provides proof. The press ought to stop giving this doofus the publicity he appears to crave.

Tim Bradshaw and Anna Gross, Financial Times:

The new [Investigatory Powers Tribunal] filing prepared by two judges sets out the “assumed facts” on which the case will be argued at a court hearing scheduled for early next year.

[…]

However, the new IPT filing states the [technical capability notice] “is not limited to” data stored under ADP, suggesting the UK government sought bulk interception access to Apple’s standard iCloud service, which is much more widely used by the company’s customers.

It is routine for law enforcement to request access to individual iCloud accounts, and Apple says it complies to the best of its ability with legal requests. But “bulk interception access” would go well beyond these kinds of targeted requests, reverting to the kind of global surveillance apparatus made public in 2013. The more cynical reader might imagine such a system still exists despite operating systems and web browsers defaulting to secure connections in the intervening years. I have not seen evidence of this. I think the Financial Times’ reporting supports the notion that intelligence services can no longer monitor these kinds of communications in bulk as they once did.

The filing also apparently throws cold water on Tulsi Gabbard’s claim that the U.K. is rescinding its demands for an Advanced Data Protection backdoor. Again, the secrecy around this prevents us from gaining specificity or clarity. It even requires the judges to rely on “assumed facts” which, as Bradshaw and Gross write, are “not the same as asserting that [they are] factually the case”, because they cannot confirm the existence of the technical capability notice. Insert your personal favourite dystopian literary reference here.

Emanuel Maiberg, of 404 Media, published a pretty good story with a pretty bad headline. Here is his core argument:

Porn piracy, like all forms of content piracy, has existed for as long as the internet. But as more individual creators make their living on services like OnlyFans, many of them have hired companies to send Digital Millennium Copyright Act takedown notices against companies that steal their content. As some of those services turn to automation in order to handle the workload, completely unrelated content is getting flagged as violating their copyrights and is being deindexed from Google search. The process exposes bigger problems with how copyright violations are handled on the internet, with automated systems filing takedown requests that are reviewed by other automated systems, leading to unintended consequences.

Ignoring the titillating associations with porn, this is a new and valuable entry into the compendium of articles about failures in the automatic handling of DMCA complaints. The headline on the article, however, gives no indication of that and is, I think, misleading:

How OnlyFans Piracy Is Ruining the Internet for Everyone

Here are my problems with it:

  1. The piracy itself is not doing anything. It is the automation and mishandling of takedown requests for copyright claims.

  2. This is only slightly related to OnlyFans. It is more broadly applicable to the increasing appeal of solo or independent producers in music, video, podcasting, etc.

  3. If I am being pedantic, it is not the internet which is being ruined, but the web.

This headline is kind of clickbait, but it is also simply inaccurate in describing the subject of the article. I do not often flag issues with headlines, especially since they are typically written by editors instead of reporters. In this case, though, Maiberg is a co-owner of 404 Media, so I am sure he had some say in choosing a headline.

Do not let that criticism steer you away from what is otherwise an excellent article, however. Maiberg interviewed the CEO of Takedowns AI, a platform which used to be involved in generic material removal and reputation management before pivoting to a service focused on OnlyFans piracy specifically. I am linking to the Wayback Machine there because the company’s site is currently offline, perhaps its own attempt at reputation management after Kunal Anand, the CEO, explained what seems to be a loose approach to validating takedown requests.

This is an old problem, as Maiberg acknowledges:

It’s an issue at the intersection of several critical problems with the modern internet: Google’s search monopoly, rampant porn piracy, a DMCA takedown process vulnerable to errors and abuse, and now the automation of all of the above in order to operate at scale. No one I talked to for this story thought there was an easy solution to this problem.

I obviously have no answers here, only two observations. The first is that this episode shows a limitation of offloading legal processes to corporations. Fair use is, famously, a nebulous concept, and trying to figure out whether a single YouTuber’s video is in violation would take significant time and expense — and this simply is not feasible at YouTube’s scale. Second, automation has made some of this easier — it is harder to find full-length Hollywood films on YouTube than you might expect for a video-based website — while also requiring each party to more carefully check their work.
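As a toy illustration of the failure mode Maiberg describes, consider a takedown bot that matches on loose title similarity. The titles and threshold below are hypothetical, but the dynamic is real: tune the matcher aggressively enough to catch re-uploads and it starts sweeping in bystanders:

    # A toy takedown matcher; the titles and threshold are hypothetical.
    from difflib import SequenceMatcher

    protected = "Exclusive Beach Photoshoot"
    candidates = [
        "Exclusive Beach Photoshoot (reupload)",          # actual piracy
        "Exclusive: Beach Erosion Photos Show a Crisis",  # unrelated journalism
        "Quarterly Municipal Budget Meeting Minutes",     # obviously unrelated
    ]

    for title in candidates:
        score = SequenceMatcher(None, protected.lower(), title.lower()).ratio()
        flagged = score > 0.5  # a lax threshold catches pirates and bystanders alike
        print(f"{score:.2f} {'FLAG' if flagged else 'ok':>4}  {title}")

Run this and the unrelated news headline gets flagged along with the pirated copy; only the genuinely dissimilar title passes. Now imagine that flag feeding straight into a DMCA notice with no human in the loop.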

Earlier this month, a jury penalized Tesla after a car’s supervised autonomous vehicle features failed, leading to a collision. When I linked to the CBS News article about the story, this was one of several paragraphs that stood out to me:

The most important piece of evidence in the trial, according to the plaintiffs’ lawyers, was an augmented video of the crash that included data from the Autopilot computer. Tesla previously claimed the video was deleted, but a forensic data expert was able to recover it.

Now, thanks to Trisha Thadani and Faiz Siddiqui, of the Washington Post, we know more about what happened behind the scenes:

Years after a Tesla driver using Autopilot plowed into a young Florida couple in 2019, crucial electronic data detailing how the fatal wreck unfolded was missing. The information was key for a wrongful death case the survivor and the victim’s family were building against Tesla, but the company said it didn’t have the data.

Then a self-described hacker, enlisted by the plaintiffs to decode the contents of a chip they recovered from the vehicle, found it while sipping a Venti-size hot chocolate at a South Florida Starbucks. Tesla later said in court that it had the data on its own servers all along.

One supposed benefit of autonomous vehicle technologies is improved public safety. That is how the simple features on my car, the most basic of which are front-facing collision detection and automatic braking, are described and marketed, at least. Tesla’s system, among the most advanced on any car, failed in this case with tragic consequences. I am not saying my car would have performed any better. But I would view Tesla’s system differently if it began from a base of reliable safety features rather than the implications of a name like “Autopilot”.

One supposed benefit of ever greater data collection — even having several cameras constantly recording — is that we can better understand a collision. However, that only works if the automaker is trustworthy. It is hard to know what to make of Tesla’s defence here. Either it did not look very hard — which is bad — or the company actively avoided producing evidence until it became impossible for it to play dumb. I sure hope it is more compliant in future collision investigations. But I have no trust that it will be.

I worry that Tesla will learn the wrong lesson, and will instead be even more evasive. The media strategy at Elon Musk’s companies these days, including for articles about this crash, is to say nothing. Better for them to be silent when trust in media and institutions is perilously low. Not good for anyone else.

Update: Tesla is, of course, fighting the verdict.

Sarah Perez, TechCrunch:

The law, HB 1126, requires platforms to implement age verification for all users before they can access social networks like Bluesky. Recently, the Supreme Court justices decided to block an emergency appeal that would have prevented the law from going into effect as the legal challenges it faces played out in the courts. This forced Bluesky to make a decision of its own: either comply or risk hefty fines of up to $10,000 per user.

Users in Mississippi soon scrambled for a workaround, which tends to involve the use of VPNs.

However, others questioned why a VPN would be the necessary solution here. After all, decentralized social networking was meant to reduce the control and power the state — or any authority — would have over these social platforms.

Bluesky blocked access in Mississippi rather than collect more data about its users or risk stiff penalties. It points out the law there is more expansive and requires more data collection than the U.K.’s Online Safety Act. It is even, according to a note on JD Supra, broader than the Texas legislation on which some of its language was based.

But, as Perez writes, surely the whole point of decentralized networks is their resilience to this kind of overbearing legislation. In a way, I guess they are resilient — you can still use the AT Protocol, which underpins Bluesky, in Mississippi through other personal data servers. The same is true for ActivityPub and Mastodon instances, though Mastodon says it has no way to comply with the Mississippi law. That makes me wonder if individual Mastodon instances must each incorporate age validation. I do not see anything in the sloppy text of the law saying it applies only to services over a certain number of users. It seems to non-lawyer me this means any instance — or any Bluesky PDS — allowing interaction in Mississippi could be liable for penalties.

Alexander Gromnitsky:

At the time of writing, the most recent Adobe Reader 25.x.y.z 64-bit installer for Windows 11 weighs 687,230,424 bytes. After installation, the program includes ‘AI’ (of course), an auto-updater, sprinkled ads for Acrobat online services everywhere, and 2 GUIs: ‘new’ and ‘old’.

For comparison, the size of SumatraPDF-3.5.2 installer is 8,246,744 bytes. It has no ‘AI’, no auto-updater (though it can check for new versions, which I find unnecessary, for anyone sane would install it via scoop anyway), and no ads for ‘cloud storage’.

The installed size of the latest version of Acrobat is, on my Mac, 2.18 GB — or, to spell it out as Gromnitsky did, 2,176,053,007 bytes. Of course, over 435 MB of that is because it includes a copy of the Chromium web browser engine. I primarily use this application to view, edit, and add form fields to text-based documents, and to dismiss ads for A.I. features and Adobe services. Gromnitsky is describing only Reader, which is far more limited than Acrobat, and even more limited than Apple’s own Preview software; you cannot even split a PDF into multiple files with Reader.

If I ever give the impression of being personally attacked when I find a Preview feature no longer works as well as it once did, this is why. Acrobat and Reader are perfect examples of software made without respect for users.

(Via Michael Tsai.)

Rajpreet Sahota, CBC News:

U.S. Customs and Border Protection (CBP) has released new data showing a sharp rise in electronic device searches at border crossings.

From April to June alone, CBP conducted 14,899 electronic device searches, up more than 21 per cent from the previous quarter (23 per cent over the same period last year). Most of those were basic searches, but 1,075 were “advanced,” allowing officers to copy and analyze device contents.

U.S. border agents have conducted tens of thousands of searches every year for many years, along a generally increasing trajectory, so this is not necessarily specific to this administration. Unfortunately, as the Electronic Frontier Foundation reminds us, people have few rights at ports of entry, regardless of whether they are a U.S. citizen.

There are no great ways to avoid a civil rights violation, either. As a security expert told the CBC, people with burner devices would be subject to scrutiny because those are obviously not their main devices. It stands to reason that someone travelling without any electronic devices at all would also be seen as more suspicious. Encryption is your best bet, but then you may need to have a whole conversation about why all of your devices are encrypted.

The EFF has a pocket guide with your best options.