Month: March 2024

Jason Koebler and Samantha Cole, 404 Media:

Almost every platform has some sort of post “firehose,” API, or way of accessing huge amounts of user posts. Famously, Twitter and Reddit used to give these away for free. Now they do not, and charging access for these posts has become big business for those companies. This is just to say that the existence of Automattic’s firehose is not anomalous in an internet ecosystem that trades on data. But this firehose also means that the average user doesn’t and can’t know what companies are getting direct access to their posts, and what they’re being used for.

I am not particularly surprised to learn that public posts on WordPress.com blogs are part of a massive feed, but I am shocked at how much less obvious it is that self-hosted WordPress sites with Jetpack installed are automatically opted into it as well. Given how popular Jetpack is, with over five million users according to its WordPress.org installation page, I was surprised by how infrequently this has been mentioned: aside from privacy policies and official documentation, I found a 2013 article on the Next Web, a Reddit comment from a few years ago, and a handful of content marketing specialists suggesting it helps with search optimization.

After avoiding questions from 404, Automattic says it is “winding down” firehose access.

Samantha Cole, 404 Media:

Tumblr and WordPress.com are preparing to sell user data to Midjourney and OpenAI, according to a source with internal knowledge about the deals and internal documentation referring to the deals.

The exact types of data from each platform going to each company are not spelled out in documentation we’ve reviewed, but internal communications reviewed by 404 Media make clear that deals between Automattic, the platforms’ parent company, and OpenAI and Midjourney are imminent.

Automattic:

  • We currently block, by default, major AI platform crawlers — including ones from the biggest tech companies — and update our lists as new ones launch.

[…]

We are also working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control.

  • We will only share public content that’s hosted on WordPress.com and Tumblr, and only from sites that haven’t opted out.

  • We are not including content from sites hosted elsewhere even if they use Automattic plugins like Jetpack or WooCommerce.

I am not sure which crawlers are currently being blocked or how that is being accomplished, but it does not appear to be in WordPress blogs’ robots.txt files.

The New York Times comprehensively blocks known machine learning crawlers, which you can verify by viewing its robots.txt file; the crawlers we are interested in are listed near the bottom, just above all the sitemaps. That is also true for Tumblr. But when I checked a bunch of WordPress.com sites at random — by searching “site:wordpress.com inurl:2024” — I found much shorter automatically generated robots.txt files, similar to WordPress’ own. I am not sure why I could not find a single WordPress.com blog with the same opt-out signal.
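A rough sketch of how this kind of check could be scripted, using Python's standard `urllib.robotparser`. The crawler tokens listed are illustrative examples rather than a complete list, the two robots.txt bodies are hypothetical samples, not copies of any real site's file, and `blocked_agents` is a helper name invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative crawler tokens only; not an exhaustive or authoritative list.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]


def blocked_agents(robots_txt, agents=AI_CRAWLERS, url="https://example.com/"):
    """Return the agents that robots_txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that opts out of one crawler by name.
OPTED_OUT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

# Hypothetical short, generic file with no crawler-specific rules.
GENERIC = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

print(blocked_agents(OPTED_OUT))  # ['GPTBot']
print(blocked_agents(GENERIC))    # []
```

To test a live site instead of a pasted file, `RobotFileParser` can also load a robots.txt by URL with `set_url()` and `read()`. An empty result only shows that robots.txt carries no opt-out signal; blocking done elsewhere, such as at the server level, would not be visible this way.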

What is implied in Automattic’s disclosure is that it is preparing to switch Tumblr and WordPress blogs from the current opt-in model to an opt-out one. Both platforms have been popular among artists and I am not sure they would expect their contributions to become fodder for machines.

Then again, that is true for everybody who has ever posted anything on the web: it is all training data now, unless you can explicitly say otherwise.

Two weeks ago, Apple confirmed it would roll back the capabilities of Progressive Web Apps in the E.U. to the days of iPhone home screen bookmarks. It said it would need to do this to comply with the Digital Markets Act, implying it interpreted alternative browser requirements to apply equally to web apps:

[…] Addressing the complex security and privacy concerns associated with web apps using alternative browser engines would require building an entirely new integration architecture that does not currently exist in iOS and was not practical to undertake given the other demands of the DMA and the very low user adoption of Home Screen web apps. And so, to comply with the DMA’s requirements, we had to remove the Home Screen web apps feature in the EU.

However, in an update to that page, Apple is now undoing this regression, as noted by Chance Miller, 9to5Mac:

With today’s announcement, Apple has reversed course and said that Home Screen web apps will continue to exist as they did pre-iOS 17.4 in the European Union. “This support means Home Screen web apps continue to be built directly on WebKit and its security architecture, and align with the security and privacy model for native apps on iOS,” Apple explains today.

This means that all Home Screen web apps will still be powered by WebKit, regardless of whether the web app is added using Safari or not – exactly as it works today and has for years.

Apple is framing this as a decision it made because it is just so dang nice — “[w]e have received requests to continue to offer support for Home Screen web apps in iOS, therefore we will continue to offer the existing Home Screen web apps capability”. If this is true, that means its earlier statement must have been wrong — there was no legal rationale for web app regressions, only a preference.

A more likely explanation is that the DMA is complicated and Apple is still figuring out what changes it mandates in iOS. This is a big package of legislation that needs interpretation. Apple’s lawyers now seem to think PWAs can still be WebKit-only. Whether regulators agree, and whether Apple was correct to blame the law in the first place, is something we will find out when iOS 17.4 is released.

Update: Michael Acton, Financial Times:

The European Commission welcomed Apple’s announcement, saying that it had “directly or indirectly” received more than 500 complaints about the company’s original plan.

“Contrary to Apple’s public representation, the removal of Home Screen Web Apps on iOS in the EU was neither required, nor justified, under the Digital Markets Act,” a commission spokesperson added.

A version of this entire debacle that is fair to Apple is that it misunderstood its obligations, and would never have degraded PWAs in the E.U. if not for its overly cautious interpretation of the law. But it does not get to take credit for undoing its mistake.