Content Storage FAQ =================== **How Pulp determines uniqueness when storing units?** Uniqueness of each content type is determined by it's unit key. A unit key is a combination of content attributes. The following combinations of attributes represents the unit key for some of the content type supported by Pulp. **DRPM**: - epoch - version - release - filename - checksumtype - checksum **RPM, SRPM**: - name - epoch - version - release - arch - checksumtype - checksum **ISO**: - name - checksum - size **Python package**: - filename **Puppet module**: - author - name - version **OSTree branch**: - remote_id - branch - commit **Docker Blob, Manifest, ManifestList**: - digest **Docker Image**: - image_id **Docker Tag**: - name - repo_id - schema_version - manifest_type **Debian Package**: - name - version - architecture - checksumtype - checksum **What happens when a source repository changes the checksum type that is published in the repository?** Since the checksum is one of the attributes used to determine uniqueness, Pulp assumes that a package published with a sha256 checksum is different from a package published with a sha512 checksum. As a result, if a source repository switches the type of checksum it publishes, Pulp will treat all the packages in that repository as new. This can result in duplicate content being stored in Pulp. **How Pulp keeps track of units that belong to a particular repository?** Each repository is stored as a document in the ``repos`` MongoDB collection. Each content type is stored in a collection with a prefix of ``units_``. Relationships between content and repositories are stored in the ``repo_content_units`` collection. **How symlinks are generated during a repository publish?** Pulp deduplicate content when possible. As a result, all content units are stored in one place. Published content is actually a symlink to a content unit stored elsewhere on disk. When publishing a repository, Pulp uses the relationships stored in the ``repo_content_units`` for a particular repository to determine which symlinks need to be published.