
User:Alaexis/Diffusor

From Wikimedia Commons, the free media repository
Diffusor showing a category suggestion

This tool helps with category diffusion: moving media from a higher-level category to the most specific category(ies) where it belongs. It suggests categories and facilitates the edits.

IMPORTANT: its suggestions are only suggestions, so don't accept them blindly! The user makes the final edit and remains fully responsible for it. In testing, about one third of recommendations were incorrect, and the rate varies by category. Try it first on categories you're familiar with to get a sense of how it works.
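In wikitext terms, diffusing a file means replacing its parent category link with links to one or more subcategories. The sketch below illustrates that edit on a plain string. It is not taken from Diffusor's source: the function names `escapeRegExp` and `diffuseCategory` are hypothetical, and the matching rules (optional sort key, spaces and underscores treated alike) are assumptions about typical category links.

```javascript
// Hypothetical helper, not part of Diffusor.
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical sketch of a diffusion edit on raw wikitext.
// Replaces the first link to the parent category with links to the chosen
// subcategories. Returns the wikitext unchanged if the parent is not found.
function diffuseCategory(wikitext, parent, children) {
    // Assumption: spaces and underscores are interchangeable in category links.
    const name = escapeRegExp(parent).replace(/[ _]+/g, '[ _]+');
    // Matches [[Category:Parent]] and [[Category:Parent|sort key]].
    const parentLink = new RegExp(
        '\\[\\[\\s*[Cc]ategory\\s*:\\s*' + name + '\\s*(?:\\|[^\\]]*)?\\]\\]'
    );
    if (!parentLink.test(wikitext)) {
        return wikitext;
    }
    const replacement = children
        .map(function (child) { return '[[Category:' + child + ']]'; })
        .join('\n');
    // A function replacement avoids special handling of "$" in category names.
    return wikitext.replace(parentLink, function () { return replacement; });
}

const before = 'Some description\n[[Category:Gagra]]\n[[Category:Black Sea]]';
const after = diffuseCategory(before, 'Gagra', ['Transport in Gagra']);
console.log(after);
```

Other categories on the file are left untouched, and a link such as `[[Category:Gagra district]]` is not mistaken for the parent `Gagra`.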

Installation


Add the following to your common.js on Commons (if this is your first script, you will need to create the common.js page first):

importScript('User:Alaexis/Diffusor.js');

Then just visit any category page and in the top right click "Tools"→"Diffusor"→"Analyse".
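Since the tool is used from category pages, one optional variant is to load the script only there. This is a sketch, not the documented installation: it assumes Diffusor does not need to run on other pages, and `isCategoryPage` is a hypothetical helper. `mw.config.get('wgNamespaceNumber')` is the standard MediaWiki way to read the current namespace, and 14 is the Category namespace.

```javascript
// Hypothetical helper: namespace 14 is "Category" on MediaWiki wikis.
function isCategoryPage(namespaceNumber) {
    return namespaceNumber === 14;
}

// The typeof guard keeps this snippet harmless outside a wiki page.
if (typeof mw !== 'undefined' &&
        isCategoryPage(mw.config.get('wgNamespaceNumber'))) {
    // Same call as in the installation instructions above.
    importScript('User:Alaexis/Diffusor.js');
}
```

If the "Diffusor" entry stops appearing under "Tools", fall back to the plain one-line installation.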

Observations from testing on Category:Gagra

No info in the description

The tool (code here) helped me categorise dozens of images much faster than I could have manually. Most of its suggestions were correct, though about one third were not. The reasons for the failures are listed below.

Domain familiarity helped a lot. For someone unfamiliar with the topic, it would be difficult to judge whether a suggestion is correct, even though I've added some hints like existing categories.

  1. Incomplete data or fuzzy categories
    1. No description. This image has no meaningful description (just "gagra abkhazia"), so the LLM has almost nothing to work with. Proper categorisation of this lovely church would require image recognition or using the geolocation embedded in the file metadata. The tool does neither.
    2. Similar categories with fuzzy definitions. Categories like "Nature in Gagra" and "Views of Gagra" have overlapping scopes. It's hard for the LLM (and often for humans) to decide which one is more appropriate. This leads to inconsistent suggestions at times.
  2. LLM limitations
    1. Failures due to name similarity.
      The second photo depicts Gagripsh railway station and should be in Category:Transport in Gagra (there is no category for this station). Instead, the LLM suggested Gagrypsh Restaurant. The LLM used by the tool is one of the open-source ones hosted by PublicAI and is not a frontier model. Hopefully we'll get access to more powerful models in future.
    2. The tool might not work perfectly for huge categories with thousands of descendants.
  3. Errors in Commons data
    1. File doesn't belong to any child category (wrongly categorised upstream).
      Stalin's pool table
      The pool table is in the Gagra district, not in Gagra city. The tool can't fix upstream miscategorisation; this is out of scope for diffusion.
    2. The existing categorisation sometimes contains errors. One of the categories I tested had two sub-categories pointing to the same entity. Proper categorisation would require deleting the duplicate, but the tool currently only adds categories and removes the parent. This happens rarely and can be handled manually.
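The name-similarity failure above (Gagripsh railway station matched to Gagrypsh Restaurant) suggests one possible safeguard: flag a suggestion for extra scrutiny when a word in the description nearly matches, but does not equal, a word in the suggested category. The sketch below is not part of Diffusor. The function names, the minimum word length of 4 and the distance threshold of 2 are all assumptions chosen for illustration.

```javascript
// Classic Levenshtein edit distance between two strings.
function levenshtein(a, b) {
    let prev = Array.from({ length: b.length + 1 }, function (_, j) { return j; });
    for (let i = 1; i <= a.length; i++) {
        const curr = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
        }
        prev = curr;
    }
    return prev[b.length];
}

// Lowercases the text and keeps words of at least 4 characters
// (an assumed cut-off that skips short words like "in" and "of").
function tokenize(text) {
    return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(function (t) {
        return t.length >= 4;
    });
}

// Hypothetical check: returns pairs of words that are close but not identical.
function lookalikeTokens(description, categoryName, maxDistance) {
    const limit = maxDistance === undefined ? 2 : maxDistance;
    const pairs = [];
    tokenize(description).forEach(function (d) {
        tokenize(categoryName).forEach(function (c) {
            const distance = levenshtein(d, c);
            if (distance > 0 && distance <= limit) {
                pairs.push([d, c]);
            }
        });
    });
    return pairs;
}

console.log(lookalikeTokens('Gagripsh railway station', 'Gagrypsh Restaurant'));
// → [ [ 'gagripsh', 'gagrypsh' ] ]
```

A non-empty result does not mean the suggestion is wrong, since spelling variants of the same place name are common. It only marks the suggestion as one the user should check more carefully.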

Feel free to let me know your impressions, both positive and less so. Alaexis (talk) 20:23, 9 February 2026 (UTC)

GitHub repo
