How moderation works - ourdream Trust & Safety

ourdream’s content moderation is built as defense in depth: multiple independent layers stacked together, each specialising in a different class of signal. The layers reinforce each other.

Figure: Defense in depth. Five moderation layers, stacked: input, output, metadata and behaviour, human reviewers, community reports.

The layers

1. Input

Every prompt is classified before any model runs against it. Chat messages, image prompts, and character submissions pass through a set of classifiers covering content, behaviour, and metadata signals. Inputs that match the strictest signals are blocked before the model runs at all; other flagged inputs are routed to the next layer.
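The three-way routing described above can be sketched as follows. This is a minimal illustration, not ourdream's implementation: the `Route` names, the score format, and both threshold values are assumptions made for the example.

```python
from enum import Enum


class Route(Enum):
    ALLOW = "allow"            # nothing flagged
    NEXT_LAYER = "next_layer"  # flagged, passed on for further checks
    BLOCK = "block"            # strictest signals matched


# Hypothetical thresholds; real values are not published in this document.
BLOCK_THRESHOLD = 0.95
FLAG_THRESHOLD = 0.50


def route_input(scores: dict[str, float]) -> Route:
    """Route one input from hypothetical per-signal scores in [0, 1]."""
    top = max(scores.values(), default=0.0)
    if top >= BLOCK_THRESHOLD:
        return Route.BLOCK       # the model never runs on this input
    if top >= FLAG_THRESHOLD:
        return Route.NEXT_LAYER  # flagged inputs go to the next layer
    return Route.ALLOW
```

The point of the sketch is the ordering: the block decision is taken before any model call, and everything merely flagged is deferred rather than rejected.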

2. Output

Generated images go through a separate set of classifiers that examines the output independently of the prompt. This checks what the model actually produces, covering the categories that map to Prohibited content. Outputs that fail are blocked from delivery and may trigger account-level review.

3. Metadata and behaviour

The third layer reads patterns rather than content. It looks at signals across many actions by the same user, or across many users producing similar content. Examples include accounts that produce content the other layers reject at unusually high rates, and prompts that share structural features with previously blocked ones. Behaviour signals usually do not block content on their own. Instead, they raise the priority of human review.
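One way to picture "raise the priority, do not block" is a scoring function that only ever adjusts a queue priority. The sketch below is illustrative: the field names, the 20% rejection-rate cutoff, and the priority increments are all assumptions, not values from ourdream's systems.

```python
from dataclasses import dataclass


@dataclass
class BehaviourSignals:
    rejected: int             # actions by this account rejected by other layers
    total: int                # all actions by this account in the window
    similar_to_blocked: bool  # prompt shares structure with blocked prompts


def review_priority(base: int, s: BehaviourSignals, high_rate: float = 0.2) -> int:
    """Return a review priority (higher = reviewed sooner); never blocks."""
    priority = base
    rate = s.rejected / s.total if s.total else 0.0
    if rate >= high_rate:     # unusually high rejection rate (assumed cutoff)
        priority += 2
    if s.similar_to_blocked:  # structural match with earlier blocks
        priority += 1
    return priority
```

Note that the function has no "block" outcome at all, which mirrors the statement that behaviour signals feed human review rather than acting alone.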

4. Human reviewers

Trained reviewers see what the classifiers flag. Reviewers can:

Approve content (release it from review).
Reject it with a reason that maps to a specific policy.
Escalate ambiguous cases to a senior reviewer or to the trust team.
Flag patterns that the classifier layers should learn from; edge cases feed back into the rules and training data.

For public character submissions the review queue runs in the hundreds per day. For chat-generated content the volume is much higher; the classifier layers handle the bulk of it, and the cases that warrant a human reach a human.

5. Community reports

The last layer is everyone using the product. Users can report characters, images, videos, and scenarios. Reports trigger re-review against the current policy, which may be stricter than the one in force when the content was first generated. The most severe categories (underage content) are reviewed first.
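Severity-first ordering of reports can be modelled with a priority queue. This is a sketch under stated assumptions: the `ReportQueue` class, the category strings, and the numeric ranks are hypothetical. Only the rule that underage content is reviewed first comes from the text.

```python
import heapq
import itertools

# Hypothetical severity ranks; lower is reviewed sooner.
SEVERITY = {"underage": 0}
DEFAULT_SEVERITY = 1


class ReportQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        # Monotonic counter keeps first-in, first-out order within a rank.
        self._counter = itertools.count()

    def add(self, content_id: str, category: str) -> None:
        rank = SEVERITY.get(category, DEFAULT_SEVERITY)
        heapq.heappush(self._heap, (rank, next(self._counter), content_id))

    def next_for_review(self) -> str:
        """Pop the most severe, then oldest, reported item."""
        return heapq.heappop(self._heap)[2]
```

For example, if reports arrive for items `a` (IP), `b` (underage), and `c` (IP) in that order, the queue yields `b`, then `a`, then `c`.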

Why this shape

Each layer specialises. Classifiers handle volume at speed. Metadata signals catch behavioural patterns. Human reviewers handle ambiguity and judgement. The community surfaces what’s actually playing out in practice. Together they cover more ground than any single layer could alone.

The moderation team

Reviewers are vetted ourdream staff and trusted long-term community contributors who have been onboarded into the moderation rota. They operate under written guidelines and undergo regular calibration with each other and with the trust team to keep judgement consistent. Moderation decisions are made independently of growth and revenue metrics. The trust team owns the policy; product owns the product. We care about moderator wellbeing because the work is hard, and we do not tolerate abuse or harassment of our moderation team. The team also meets regularly to review edge cases and update the guidelines.

What gets rejected most often

In the public-submission queue, in rough order of frequency:

Profile images with underage cues.
Characters too close to existing IP.
Scenarios that frame non-consent as desirable — whether the user is cast as perpetrator or the framing otherwise presents non-consent as the appeal.
Names or descriptions matching the real-person blocklist.

Each of these is described on Prohibited content.

What classifiers see vs what humans see

Automated classifiers process all content on the platform, including private chat messages and private generations, because enforcing the policies above requires it. Classifier processing happens in-line and no human reads the content as part of it. Human reviewers only see content that has been (a) flagged by a classifier, (b) reported by a user, or (c) escalated as part of a trust-team investigation. They do not browse private chats. Account data (email, payment information, IP address) is gated behind trust-team access and is touched only when an investigation requires it.
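The visibility rule for human reviewers reduces to a simple predicate over three conditions. The sketch below is illustrative only; the `ContentState` type and its field names are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentState:
    flagged_by_classifier: bool = False  # condition (a)
    reported_by_user: bool = False       # condition (b)
    under_investigation: bool = False    # condition (c), trust-team escalation


def visible_to_reviewer(state: ContentState) -> bool:
    """A reviewer may see content only if at least one condition holds."""
    return (
        state.flagged_by_classifier
        or state.reported_by_user
        or state.under_investigation
    )
```

Content with none of the three flags set, such as an ordinary private chat, never satisfies the predicate.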

Appeals

If a creator believes a rejection or removal was applied incorrectly, the Appeals process is the way to raise it. A different reviewer handles the appeal, with the original objection in mind. Appeals also feed back into the rules and training data, so the effort creators put into making their case is rarely wasted.
