ourdream’s content moderation is built as defense in depth: multiple independent layers stacked together. Each layer specialises in a different class of signal, and the layers reinforce each other.
The layers
1. Input
Every prompt is classified before any model runs against it. Chat messages, image prompts, and character submissions pass through a set of classifiers covering content, behaviour, and metadata signals. Inputs that match the strictest signals are blocked before the model runs at all; other flagged inputs are routed to the next layer.
2. Output
Generated images go through a separate set of classifiers that examines the output independently of the prompt. This checks what the model actually produces, covering the categories that map to Prohibited content. Outputs that fail are blocked from delivery and may trigger account-level review.
3. Metadata and behaviour
The third layer reads patterns rather than content. It looks at signals across many actions by the same user, or across many users producing similar content. Examples include accounts that produce content the other layers reject at unusually high rates, and prompts that share structural features with previously-blocked ones. Behaviour signals usually do not block content on their own. They raise the priority of human review.
4. Human reviewers
Trained reviewers see what the classifiers flag. Reviewers can:
- Approve content (release it from review).
- Reject it with a reason that maps to a specific policy.
- Escalate ambiguous cases to a senior reviewer or to the trust team.
- Flag patterns that the classifier layers should learn from; edge cases feed back into the rules and training data.
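The flow through the first four layers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ourdream's implementation: the thresholds, score ranges, class names, and the `prohibited-content` policy label are all hypothetical. It shows a classifier score mapped to pass, flag, or block; a review queue in which behaviour signals only raise priority; and a reviewer outcome that must cite a policy when it rejects.

```python
import heapq
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
BLOCK_THRESHOLD = 0.9
FLAG_THRESHOLD = 0.5


class Verdict(Enum):
    PASS = "pass"
    FLAG = "flag"    # routed onward for further review
    BLOCK = "block"  # stopped before the model runs


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"


def classify(score: float) -> Verdict:
    """Map a classifier score in [0, 1] to a verdict."""
    if score >= BLOCK_THRESHOLD:
        return Verdict.BLOCK
    if score >= FLAG_THRESHOLD:
        return Verdict.FLAG
    return Verdict.PASS


class ReviewQueue:
    """Flagged items awaiting human review, highest priority first."""

    def __init__(self) -> None:
        self._heap: list = []
        self._order = count()  # tie-breaker: earlier items come out first

    def add(self, item_id: str, content_score: float,
            behaviour_score: float = 0.0) -> None:
        # Behaviour signals never block here; they only raise priority.
        priority = content_score + behaviour_score
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._order), item_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


@dataclass(frozen=True)
class ReviewOutcome:
    item_id: str
    decision: Decision
    policy: Optional[str] = None  # hypothetical policy identifier

    def __post_init__(self) -> None:
        # A rejection has to map to a specific policy.
        if self.decision is Decision.REJECT and not self.policy:
            raise ValueError("a rejection must cite a policy")


queue = ReviewQueue()
queue.add("img-1", content_score=0.60)
queue.add("img-2", content_score=0.55, behaviour_score=0.30)
first = queue.pop()  # "img-2": the behaviour signal lifts it above "img-1"
outcome = ReviewOutcome(first, Decision.REJECT, policy="prohibited-content")
```

Adding the behaviour score to the content score is only one possible way to combine the two; the point of the sketch is that behaviour changes the order of review, not the verdict.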
5. Community reports
The last layer is everyone using the product. Users can report characters, images, videos, and scenarios. Reports trigger re-review against the current policy, which may be stricter than the one in force when the content was first generated. The most severe categories (underage content) are reviewed first.
Why this shape
Each layer specialises. Classifiers handle volume at speed. Metadata signals catch behavioural patterns. Human reviewers handle ambiguity and judgement. The community surfaces what’s actually playing out in practice. Together they cover more ground than any single layer could alone.
The moderation team
Reviewers are vetted ourdream staff and trusted long-term community contributors who have been onboarded into the moderation rota. They operate under written guidelines and undergo regular calibration with each other and with the trust team to keep judgement consistent. Moderation decisions are made independently of growth and revenue metrics. The trust team owns the policy; product owns the product. We invest in moderator wellbeing because the work is hard. The team also meets regularly to review edge cases and update the guidelines.
What gets rejected most often
In the public-submission queue, in rough order of frequency:
- Profile images with underage cues.
- Characters too close to existing IP.
- Scenarios that frame non-consent as desirable (with the AI character on the no-agency side).
- Names or descriptions matching the real-person blocklist.