Overview: Kenneth Wehr, the Greenlandic Wikipedia, and AI-generated content
Kenneth Wehr is a volunteer who helped manage the Greenlandic Wikipedia. In recent years he and other community members found themselves deleting large numbers of low-quality pages. The content had been produced automatically or with AI assistance, and the small pool of native Greenlandic speakers on the project could not keep up.
This is not only a Greenland story. Small-language Wikipedias across the world face similar pressure. Automated article creation, AI-assisted writing tools, and apps that pull data from central databases are flooding these projects, and too few native speakers are available to moderate the surge. The result is a cycle that damages quality and reduces the incentive for local communities to contribute.
What happened on Greenlandic Wikipedia
Volunteers discovered thousands of short articles that looked factual on the surface but were often shallow, wrong, or misleading. Many of those pages were generated by bots or AI systems drawing on structured data sources such as Wikidata. Faced with an unsustainable volume of low-value pages, volunteers like Wehr opted to delete much of the mass-created content to preserve the encyclopedia’s credibility.
Those deletions were a hard decision. Deleting content can erase traces of language use, and it can alienate contributors who did not understand the problem. But leaving poor material online risks making the site less useful for native speakers, learners, and researchers.
How AI and automation flood small-language Wikipedias
Several technical and social mechanisms combine to create the flood of automated content.
- Bot-driven article creation. Scripts can turn structured database entries into short encyclopedia pages quickly; a minimal sketch of this pattern follows the list. For large languages this speeds content growth, but for small languages it can overwhelm human reviewers.
- AI-assisted writing. Language models can draft text from templates, producing many superficially coherent entries that lack depth or local context.
- Third-party apps and services. Some apps pull content from Wikidata or small Wikipedias to feed dictionaries, travel guides, or AI systems. Those apps may automatically create or resurface content with limited quality checks.
- Limited moderation capacity. Small-language projects often have only a handful of native speakers with editing skills; they cannot keep up with high-volume automated additions.
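To make the first mechanism concrete, here is a minimal sketch of template-based stub generation. The records, field names, and sentence pattern are placeholders invented for illustration, not any real bot's data or code; the point is that producing thousands of stubs requires no command of the target language beyond one fixed sentence.

```python
# Minimal sketch of template-based stub generation from structured data.
# All records and the sentence pattern below are placeholders; a real bot
# would read entries from a database such as Wikidata and post each result
# as a new page. No knowledge of the target language is needed beyond the
# single fixed sentence in TEMPLATE.

RECORDS = [
    {"name": "Example Island", "type": "island", "population": 100},
    {"name": "Example Town", "type": "town", "population": 250},
]

TEMPLATE = "{name} is a {type} in Greenland with a population of {population}."

def make_stub(record: dict) -> str:
    """Fill the fixed sentence pattern with fields from one database record."""
    return TEMPLATE.format(**record)

for record in RECORDS:
    print(make_stub(record))  # a real script would create one page per record
```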
Why this matters to ordinary readers
Content in a native language shapes who can find information and who can participate online. When encyclopedias in small languages fill up with low-quality pages, the consequences include:
- Less reliable knowledge for native speakers and learners.
- Lower motivation for community members to contribute authentic, culturally informed content.
- Potential amplification of errors by search engines, chatbots, and apps that reuse that content.
Those effects matter to everyday life. People use online reference material for learning, teaching, public policy, and cultural projects. If the available material is poor, those uses suffer.
Feedback loops that accelerate decline
Small-language Wikipedias are vulnerable to a downward spiral, where one problem amplifies others.
- Poor content discourages readers and contributors. Fewer active editors mean slower correction and less creation of high-quality material.
- When AI models or apps train on or scrape those pages, they can reproduce mistakes and pass them to other systems.
- The presence of many automated or low-quality articles may make the project look inactive to potential contributors, reducing recruitment.
- Platform algorithms that favor fresh or numerous pages can surface low quality content, increasing its reach and the incentive to mass-create more.
Together these feedbacks can reduce both the visibility and the trustworthiness of a language on the open web.
Cultural and linguistic consequences
Language communities rely on digital content for preservation and transmission. The risks include:
- Loss of accurate digital documentation about local people, places, and practices.
- Visibility problems, where search results show generic, low-quality pages instead of rich local sources.
- Misrepresentation, when automated text applies inappropriate templates or translations that misstate cultural facts.
For languages with few speakers, losing a reliable online record can make it harder to teach the language to new generations and to support research or cultural projects.
What platforms and AI developers can do
The problem sits at the intersection of open knowledge platforms, AI systems, and app ecosystems. Several interventions can reduce harm while keeping projects open.
- Quality detection and labeling. Platforms could add automated quality flags, so readers and apps can tell whether an article was generated or lacks local review.
- Creation quotas and throttles. Rate limits on automated article creation can give communities time to review new content; a throttling sketch appears at the end of this subsection.
- Funding and grants. Targeted support for small-language communities helps hire moderators and build local capacity.
- Model training practices. AI developers can use curated, annotated corpora for low-resource languages, and avoid training on clearly poor material.
- App store and third-party rules. Stores and platform marketplaces can require disclosure when apps use mass-created content, or when they publish generated text for small languages.
These steps distribute responsibility across stakeholders, while keeping space for helpful automation that assists human editors.
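As one illustration of the quota idea, the sketch below shows a rolling-window throttle on automated page creation. The limit of ten creations per hour is an assumption chosen for the example, not any platform's actual policy, and a real wiki would enforce such a rule server-side.

```python
# Minimal sketch of a per-account throttle on automated page creation.
# The window length and limit are illustrative assumptions only.

import time
from collections import defaultdict, deque

class CreationThrottle:
    """Rolling-window limit on automated page creations per account."""

    def __init__(self, max_creations=10, window_seconds=3600):
        self.max_creations = max_creations
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # account name -> recent creation timestamps

    def allow(self, account):
        """Return True if the account may create another page right now."""
        now = time.time()
        events = self._events[account]
        # Discard creation events that have aged out of the rolling window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        if len(events) >= self.max_creations:
            return False  # over the limit: queue the page for human review instead
        events.append(now)
        return True

# Example: a bot submitting pages in a tight loop is cut off after the limit.
throttle = CreationThrottle(max_creations=10, window_seconds=3600)
allowed = sum(throttle.allow("example-bot") for _ in range(25))
print(f"{allowed} of 25 rapid submissions allowed")  # prints: 10 of 25 ...
```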
Practical strategies to protect and restore small-language Wikipedias
Communities, funders, and engineers can take concrete measures to improve outcomes.
- Community funding and training. Offer paid fellowships and training for native speakers to learn moderation and editing skills.
- Curated corpora for AI. Create verified text collections that AI models can use to produce higher quality, culturally accurate output.
- Tooling that supports editors. Build interfaces that surface likely AI-generated pages, suggest corrections, and enable batch review; a scoring sketch follows this list.
- AI as assistant, not replacement. Use models to draft suggestions for local editors, who then review and publish final text.
- Documentation and guidelines. Provide clear norms about when to accept or reject bot-created content, and how to document automated contributions.
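The tooling item above could start from something as simple as the scoring sketch below, which ranks new pages for human review. The signals, weights, and threshold are illustrative assumptions; a usable tool would be tuned with local editors and would draw on edit-history metadata rather than hard-coded numbers.

```python
# Minimal sketch of a review queue that flags likely mass-created pages.
# The signals, weights, and threshold are illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class Page:
    title: str
    text: str
    creator_edit_count: int  # total edits by the account that created the page

def suspicion_score(page: Page, template_phrases: list[str]) -> float:
    """Return a 0..1 score; higher means more likely templated or machine-generated."""
    score = 0.0
    if len(page.text.split()) < 40:                               # very short stub
        score += 0.4
    if any(phrase in page.text for phrase in template_phrases):   # boilerplate phrasing
        score += 0.4
    if page.creator_edit_count > 1000:                            # high-volume account
        score += 0.2
    return min(score, 1.0)

def review_queue(pages: list[Page], template_phrases: list[str], threshold: float = 0.6) -> list[str]:
    """List page titles at or above the threshold, most suspicious first."""
    scored = [(suspicion_score(p, template_phrases), p.title) for p in pages]
    return [title for score, title in sorted(scored, reverse=True) if score >= threshold]

# Example usage with two placeholder pages:
pages = [
    Page("Stub A", "Stub A is a settlement in Greenland.", creator_edit_count=5000),
    Page("Local essay", " ".join(["word"] * 80), creator_edit_count=12),
]
print(review_queue(pages, template_phrases=["is a settlement in Greenland"]))  # -> ['Stub A']
```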
Checklist for local editors
- Identify recent mass-created pages and mark them with clear tags for review (see the API sketch after this checklist).
- Prioritize pages that affect cultural or factual accuracy, such as biographies, place names, and language resources.
- Create short training modules for volunteers on spotting AI artifacts and improving local phrasing.
- Request microgrants to fund paid review sessions and language expertise.
- Keep archives of deleted pages, with reasons and metadata, so content is not lost without record.
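For the first checklist item, the public MediaWiki recentchanges API already exposes enough to spot bursts of page creation. The sketch below groups recent new pages by author; the kl.wikipedia.org endpoint and the 500-item limit are example choices, and a real review workflow would page through more history and feed the results into tagging.

```python
# Minimal sketch of the first checklist step: list recent page creations and
# group them by author so bursts of mass creation stand out. Uses the public
# MediaWiki recentchanges API; the endpoint and limit are example choices.

from collections import Counter
import requests

API = "https://kl.wikipedia.org/w/api.php"  # Greenlandic Wikipedia, as an example

params = {
    "action": "query",
    "list": "recentchanges",
    "rctype": "new",                      # only page creations
    "rcprop": "title|user|timestamp",
    "rclimit": 500,
    "format": "json",
}

resp = requests.get(API, params=params, timeout=30)
resp.raise_for_status()
changes = resp.json()["query"]["recentchanges"]

# Accounts that created many pages in this window are candidates for review tags.
by_user = Counter(change["user"] for change in changes)
for user, count in by_user.most_common(10):
    print(f"{user}: {count} new pages")
```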
Tradeoffs and limits
There is no single fix. Limiting automated creation slows the coverage growth that some projects want, while unrestricted automation can erode trust. Each community will need to weigh openness against protection and choose rules that match local capacity and goals.
Technical detection is imperfect. False positives and negatives will occur, which is why human oversight and clear governance rules are important. Funding decisions involve tradeoffs too, since resources are finite and must be prioritized where they will do the most good.
Key takeaways
- Small-language Wikipedias such as Greenlandic have been overwhelmed by mass-created, AI-assisted content; volunteers like Kenneth Wehr have deleted many low-quality pages to protect credibility.
- The structural vulnerability arises from few native speakers, low moderation capacity, and automated tools that can create volume faster than humans can review.
- Feedback loops can reduce participation and spread errors into other systems that scrape or train on the content.
- Solutions include detection and labeling, funding for local moderation, curated training corpora for AI, quotas on automated creation, and tools that make AI an assistant rather than a replacement.
FAQ
Is the language lost when pages are deleted? Not necessarily. Deletions can remove low quality text, but careful archiving and transparent reasons help preserve documentation. The goal is to protect the quality of the public record.
Can AI help preserve endangered languages? Yes. AI can assist with transcription, teaching tools, and draft generation, if models are trained on curated, community-approved data and local speakers retain editorial control.
How can ordinary readers help? Support nonprofit groups that fund language work, contribute time to small-language projects if you have skills, and treat low quality content in minority languages with skepticism until reviewed by local experts.
Conclusion
The case of Greenlandic Wikipedia, and the work of volunteers like Kenneth Wehr, shows a broader problem at the intersection of AI and cultural preservation. Automation can expand access to information quickly, but without local oversight it can also degrade trust and visibility for vulnerable languages.
Protecting linguistic diversity online will take coordinated action from Wikimedia projects, AI developers, app platforms, funders, and local communities. Practical steps exist, including targeted funding, better tooling, curated corpora, and policies that limit mass creation until human review is possible. Those measures can keep AI useful while ensuring that small languages are represented accurately and respectfully on the open web.