This tool helps with category diffusion: moving media from a higher-level category to the most specific category(ies) where it belongs. It suggests categories and facilitates the edits.
IMPORTANT: its suggestions are only suggestions, so don't accept them blindly! The user makes the final edit and remains fully responsible for it. Roughly one third of the recommendations are incorrect, though the rate varies by category. Try it first with categories you're familiar with to get a sense of how it works.
The tool (code here) helped me categorise dozens of images much faster than I could have done manually. The majority of its suggestions were correct, though about one third were not. The main reasons for failure are listed below.
Domain familiarity helped a lot. For someone unfamiliar with the topic, it would be difficult to judge whether a suggestion is correct, even though I've added some hints like existing categories.
Incomplete data or fuzzy categories
No description. This image has no meaningful description (just "gagra abkhazia"), so the LLM has almost nothing to work with. Proper categorisation of this lovely church would require image recognition or the geolocation embedded in the file metadata. The tool uses neither.
Similar categories with fuzzy definitions. Categories like "Nature in Gagra" and "Views of Gagra" have overlapping scopes. It's hard for the LLM (and often for humans) to decide which one is more appropriate. This leads to inconsistent suggestions at times.
LLM limitations
Failures due to name similarity. The second photo depicts Gagripsh railway station and should be in Category:Transport in Gagra (there is no category for this station). Instead, the LLM suggested Gagrypsh Restaurant. The LLM used by the tool is one of the open-source ones hosted by PublicAI and is not a frontier model. Hopefully we'll get access to more powerful models in future.
The tool might not work perfectly for huge categories with thousands of descendants.
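One way to keep very large category trees manageable is to cap how many descendants are collected. The sketch below is only an illustration and is not taken from the tool's code: `collect_descendants`, `get_subcategories` and `max_nodes` are hypothetical names, and the lookup function is assumed to return the direct subcategories of a category (for example via the MediaWiki API).

```python
from collections import deque


def collect_descendants(root, get_subcategories, max_nodes=500):
    """Breadth-first walk of a category tree, capped at max_nodes descendants.

    get_subcategories is a caller-supplied (hypothetical) function that
    returns the direct subcategories of a category name.
    Returns (descendants, truncated).
    """
    seen = {root}            # guards against loops in the category graph
    descendants = []         # kept in breadth-first order
    queue = deque([root])
    while queue:
        category = queue.popleft()
        for child in get_subcategories(category):
            if child in seen:
                continue
            if len(descendants) >= max_nodes:
                # Stop early and tell the caller the list is incomplete.
                return descendants, True
            seen.add(child)
            descendants.append(child)
            queue.append(child)
    return descendants, False


# Toy in-memory tree standing in for real lookups (includes a loop on purpose).
tree = {
    "Gagra": ["Nature in Gagra", "Views of Gagra", "Transport in Gagra"],
    "Views of Gagra": ["Gagra"],
}
found, truncated = collect_descendants("Gagra", lambda c: tree.get(c, []), max_nodes=2)
print(found, truncated)
```

Returning a `truncated` flag lets the caller warn the user that some candidate categories were never considered, instead of silently suggesting from a partial list.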
Errors in Commons data
File doesn't belong to any child category (wrongly categorised upstream). Stalin's pool table: the pool table is in the Gagra district, not in Gagra city. The tool can't fix upstream miscategorisation; this is out of scope for diffusion.
The existing categorisation sometimes contains errors. One of the categories I tested had two sub-categories pointing to the same entity. Proper categorisation would require deleting the duplicate, but the tool currently only adds categories and removes the parent. This happens rarely and can be handled manually.
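For readers who want to see what a diffusion edit amounts to at the wikitext level (add the specific categories, remove the parent), here is a minimal sketch. It is an illustration under simplifying assumptions, not the tool's actual code: `diffuse` is a hypothetical helper, and it matches category names exactly, ignoring MediaWiki's underscore and first-letter normalisation.

```python
import re


def diffuse(wikitext, parent, targets):
    """Remove [[Category:parent]] and append the more specific target categories."""
    # Matches [[Category:Parent]] or [[Category:Parent|sort key]] plus one trailing newline.
    parent_link = re.compile(
        r"\[\[\s*[Cc]ategory\s*:\s*" + re.escape(parent) + r"\s*(\|[^\]]*)?\]\]\n?"
    )
    stripped = parent_link.sub("", wikitext)
    if stripped == wikitext:
        raise ValueError(f"File is not in Category:{parent}")

    # Do not add a category the page already has.
    existing = {
        name.strip()
        for name in re.findall(r"\[\[\s*[Cc]ategory\s*:\s*([^\]|]+)", stripped)
    }
    new_links = [f"[[Category:{t}]]" for t in targets if t not in existing]
    if not new_links:
        return stripped
    return stripped.rstrip("\n") + "\n" + "\n".join(new_links) + "\n"


page = "Gagripsh railway station\n[[Category:Gagra]]\n"
print(diffuse(page, "Gagra", ["Transport in Gagra"]))
```

Note that the sketch only adds and removes category links on the file page. It leaves the categories themselves untouched, which is why duplicates like the one described above still need manual cleanup.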
Feel free to let me know your impressions, both positive and less so. Alaexis (talk) 20:23, 9 February 2026 (UTC)