Content Storage FAQ
How does Pulp determine uniqueness when storing units?
The uniqueness of each content type is determined by its unit key. A unit key is a combination of content attributes. The following combinations of attributes represent the unit keys for some of the content types supported by Pulp.
DRPM:
epoch
version
release
filename
checksumtype
checksum
RPM, SRPM:
name
epoch
version
release
arch
checksumtype
checksum
ISO:
name
checksum
size
Python package:
filename
Puppet module:
author
name
version
OSTree branch:
remote_id
branch
commit
Docker Blob, Manifest, ManifestList:
digest
Docker Image:
image_id
Docker Tag:
name
repo_id
schema_version
manifest_type
Debian Package:
name
version
architecture
checksumtype
checksum
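As a rough illustration of the lists above, a unit key can be thought of as a tuple of attribute values, and two units are the same when their tuples are equal. The sketch below copies the field names for three content types from this page; the dictionary layout and the function names `unit_key` and `is_same_unit` are hypothetical and are not Pulp's internal API.

```python
# Unit key fields copied from the lists above. The mapping itself and the
# helper names are illustrative assumptions, not Pulp code.
UNIT_KEY_FIELDS = {
    "rpm": ("name", "epoch", "version", "release", "arch",
            "checksumtype", "checksum"),
    "iso": ("name", "checksum", "size"),
    "puppet_module": ("author", "name", "version"),
}


def unit_key(type_id, attributes):
    """Return the tuple of unit key values for a unit of the given type."""
    return tuple(attributes[field] for field in UNIT_KEY_FIELDS[type_id])


def is_same_unit(type_id, first, second):
    """Two units are the same unit when every unit key attribute matches."""
    return unit_key(type_id, first) == unit_key(type_id, second)
```

With this model, two ISO units that differ only in an attribute outside the unit key compare as the same unit, while a difference in `size` makes them distinct.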
What happens when a source repository changes the checksum type that is published in the repository?
Since the checksum type and the checksum are both part of the unit key, Pulp assumes that a package published with a sha256 checksum is different from a package published with a sha512 checksum. As a result, if a source repository switches the type of checksum it publishes, Pulp will treat all the packages in that repository as new. This can result in duplicate content being stored in Pulp.
How does Pulp keep track of units that belong to a particular repository?
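The effect can be sketched with Python's standard `hashlib` module: the same bytes hashed with two algorithms give two different RPM unit keys. The package metadata values and the helper name `rpm_unit_key` are made up for this example, and the set stands in for Pulp's storage only as an assumption.

```python
import hashlib

# The same package bytes, as served by the source repository before and
# after it changes its published checksum type.
PAYLOAD = b"identical package bytes"


def rpm_unit_key(checksumtype):
    """Build an RPM unit key tuple for PAYLOAD (metadata values are made up)."""
    checksum = hashlib.new(checksumtype, PAYLOAD).hexdigest()
    # Order follows the RPM unit key: name, epoch, version, release, arch,
    # checksumtype, checksum.
    return ("demo", "0", "1.0", "1", "noarch", checksumtype, checksum)


# A set keyed by unit key stands in for the stored units.
stored = {rpm_unit_key("sha256")}
stored.add(rpm_unit_key("sha512"))

# Two entries now exist for one physical package.
duplicate_count = len(stored)
```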
Each repository is stored as a document in the repos MongoDB collection.
Each content type is stored in a collection with a prefix of units_.
Relationships between content and repositories are stored in the repo_content_units collection.
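The layout described above can be modelled with plain Python lists standing in for the MongoDB collections. This is only a sketch: the collection names come from this page, but the document field names (`repo_id`, `unit_type_id`, `unit_id`, `_id`), the sample data, and the helper `units_in_repo` are assumptions made for illustration.

```python
# In-memory stand-ins for the MongoDB collections named above.
repos = [{"repo_id": "zoo"}, {"repo_id": "zoo-copy"}]

# One collection per content type, named with the units_ prefix.
units_rpm = [
    {"_id": "u1", "name": "bear"},
    {"_id": "u2", "name": "wolf"},
]

# Each association links one repository to one unit.
repo_content_units = [
    {"repo_id": "zoo", "unit_type_id": "rpm", "unit_id": "u1"},
    {"repo_id": "zoo", "unit_type_id": "rpm", "unit_id": "u2"},
    {"repo_id": "zoo-copy", "unit_type_id": "rpm", "unit_id": "u1"},
]


def units_in_repo(repo_id, type_id, unit_collection):
    """Return the units of one type that are associated with a repository."""
    wanted = {
        assoc["unit_id"]
        for assoc in repo_content_units
        if assoc["repo_id"] == repo_id and assoc["unit_type_id"] == type_id
    }
    return [unit for unit in unit_collection if unit["_id"] in wanted]
```

In this model the unit `u1` is stored once in `units_rpm` and belongs to both repositories through two association documents.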
How are symlinks generated during a repository publish?
Pulp deduplicates content when possible. As a result, each content unit is
stored in one place. Published content is actually a symlink to a content unit
stored elsewhere on disk. When publishing a repository, Pulp uses the
relationships stored in the repo_content_units collection for that repository
to determine which symlinks need to be published.
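A minimal sketch of that publish step, assuming each association carries a `repo_id` and a `unit_id` and that a mapping from unit id to storage path is available. The function names, field names, and directory layout are hypothetical; Pulp's real publisher is more involved.

```python
import os
import tempfile


def publish_symlinks(repo_id, associations, storage_paths, publish_dir):
    """Create one symlink in publish_dir for each unit in the repository."""
    os.makedirs(publish_dir, exist_ok=True)
    links = []
    for assoc in associations:
        if assoc["repo_id"] != repo_id:
            continue
        target = storage_paths[assoc["unit_id"]]
        link = os.path.join(publish_dir, os.path.basename(target))
        # Replace a link left over from an earlier publish.
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(target, link)
        links.append(link)
    return links


def demo():
    """Publish two stored files for a repository inside a temp directory."""
    root = tempfile.mkdtemp()
    content_dir = os.path.join(root, "content")
    os.makedirs(content_dir)
    storage_paths = {}
    for unit_id, filename in [("u1", "bear.rpm"), ("u2", "wolf.rpm")]:
        path = os.path.join(content_dir, filename)
        with open(path, "w") as handle:
            handle.write(filename)
        storage_paths[unit_id] = path
    associations = [
        {"repo_id": "zoo", "unit_id": "u1"},
        {"repo_id": "zoo", "unit_id": "u2"},
        {"repo_id": "other", "unit_id": "u1"},
    ]
    publish_dir = os.path.join(root, "published", "zoo")
    return publish_symlinks("zoo", associations, storage_paths, publish_dir), storage_paths
```

Only the symlinks live in the published directory; the files themselves stay in the single storage location, so a unit shared by several repositories is not copied.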