Proof Benchmark - v1.0

Evaluator Summary

This benchmark was run on a curated dataset designed to challenge the Proof system. Most images are outdoor nature scenes that were randomly transformed to simulate AI edits, Photoshop edits, and photos of printed photos. Additional real-world examples, such as screenshots and pictures of screens, were included to widen the variety.

These transformations were deliberately designed as difficult edge cases, since the system already performs extremely well on typical real-world edits.

The full dataset used in this evaluation is available here:
Google Drive – Benchmark Dataset

Conclusion:
Proof reliably identifies most altered and non-original images, including many of the toughest test cases. Overall, the system performs strongly and delivers on its intended purpose.

Dallin Munger, MS
Independent Security Researcher
Upwork Profile • LinkedIn Profile

Open Full Benchmark Report