- cross-posted to:
- [email protected]
PieFed uses PDQ hashing to generate a fingerprint of an image and can use that fingerprint to detect other posts that use the same or fairly similar images, for moderation purposes. Hashes are added to a block list which stops the image from being re-posted in future. Demo
PieFed does not generate PDQ hashes itself - it uses a separate service to do it. Several instances can share the same hashing service, which is more efficient than everyone running their own. When an image is being federated around, its URL will be sent to the hashing service by multiple different fedi instances, and only the first request will be slow, as all the subsequent requests will be served from a cache.
Get the code from https://github.com/rimu/pdqhash-python
By doing a GET request for https://yourdomain.tld/pdq-hash?image_url=url_to_image_to_hash you will receive JSON like this:
{
  "pdq_hash_binary": "100100100011…",
  "quality": 100
}
The quality score (0–100) indicates how well the image content supports a reliable perceptual hash.
Higher scores mean better contrast, edges, and texture in the image. PieFed accepts anything > 70.
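For anyone wanting to try the endpoint from a script, here is a minimal sketch of a client, not PieFed's actual code. The helper names (`build_request_url`, `parse_response`, `fetch_hash`) and the timeout are made up for illustration, and `yourdomain.tld` is the placeholder from above that you would swap for wherever your hashing service runs. It only relies on the `image_url` parameter and the two JSON fields shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: placeholder domain, replace with your own hashing service.
SERVICE_URL = "https://yourdomain.tld/pdq-hash"
# PieFed accepts anything with a quality score above 70.
MIN_QUALITY = 70


def build_request_url(image_url, service_url=SERVICE_URL):
    """Build the GET URL, percent-encoding the image URL as a query parameter."""
    return f"{service_url}?{urlencode({'image_url': image_url})}"


def parse_response(body):
    """Return the binary hash string, or None if the quality score is too low."""
    data = json.loads(body)
    if data["quality"] > MIN_QUALITY:
        return data["pdq_hash_binary"]
    return None


def fetch_hash(image_url, timeout=30):
    """Ask the hashing service for the PDQ hash of the image at image_url."""
    with urlopen(build_request_url(image_url), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

Calling `fetch_hash("https://example.com/cat.png")` would then return the hash string, or `None` when the image is too featureless to hash reliably.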
This sounds very interesting. What is the purpose of the hash?
Dansup talked about actually doing a centralized CDN to reduce storage: storing an image in one place so any instance can use it instead of everyone hosting their own. You could still host your own as well though.
This seems like something different. This is just the hash, right? So I'm not certain what hashes do in the Fedi.
Thank you though!
Rather than de-duplication, it’s more about blocking CSAM / spam and, when a large flood of bad images has already arrived, finding all the copies of them that there are (even if those copies are slightly different from each other). Demo of it at https://piefed.social/post/751901 .
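To give a rough idea of how "slightly different" copies can still match: PDQ hashes are 256 bits long, and two hashes are usually compared by counting the bits that differ (the Hamming distance). The sketch below assumes the hashes are the strings of 0s and 1s from `pdq_hash_binary`. The function names are hypothetical, and the threshold of 31 is an assumption based on the value commonly suggested for PDQ, not necessarily what PieFed uses.

```python
# Assumption: a commonly suggested PDQ match threshold, not PieFed's confirmed setting.
MATCH_THRESHOLD = 31


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length bit strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must be the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_similar(hash_a, hash_b, threshold=MATCH_THRESHOLD):
    """Treat two images as near-duplicates if few enough bits differ."""
    return hamming_distance(hash_a, hash_b) <= threshold
```

A resized or recompressed copy typically flips only a handful of bits, so it stays under the threshold, while an unrelated image differs in roughly half of them.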
It looks like we’ll need a less fuzzy hash for de-duplication.
That’s AMAZING! Thank you so much sir! :-)