//! A `Source` for registry-based packages.
//!
//! # What's a Registry?
//!
//! Registries are central locations where packages can be uploaded to,
//! discovered, and searched for. The purpose of a registry is to have a
//! location that serves as permanent storage for versions of a crate over time.
//!
//! Compared to git sources, a registry provides many packages as well as many
//! versions simultaneously. Git sources can also have commits deleted through
//! rebasing, whereas registries cannot have their versions deleted.
//!
//! # The Index of a Registry
//!
//! One of the major difficulties with a registry is that hosting so many
//! packages may quickly run into performance problems when dealing with
//! dependency graphs. It's infeasible for cargo to download the entire contents
//! of the registry just to resolve one package's dependencies, for example. As
//! a result, cargo needs some efficient method of querying what packages are
//! available on a registry, what versions are available, and what the
//! dependencies for each version are.
//!
//! One method of doing so would be having the registry expose an HTTP endpoint
//! which accepts a list of packages and returns their versions and
//! dependencies. This is somewhat inefficient, however, as we may have to hit
//! the endpoint many times, and we may already have queried much of the data
//! locally (for other packages, for example). This also involves inventing a
//! transport format between the registry and Cargo itself, so this route was
//! not taken.
//!
//! Instead, Cargo communicates with registries through a git repository
//! referred to as the Index. The Index of a registry is essentially an easily
//! queryable version of the registry's database for a list of versions of a
//! package as well as a list of dependencies for each version.
//!
//! Using git to host this index provides a number of benefits:
//!
//! * The entire index can be stored efficiently locally on disk. This means
//!   that all queries of a registry can happen locally and don't need to touch
//!   the network.
//!
//! * Updates of the index are quite efficient. Using git buys incremental
//!   updates, compressed transmission, etc., for free. The index must be
//!   updated each time we need fresh information from a registry, but this is
//!   one update of a git repository that probably hasn't changed a whole lot,
//!   so it shouldn't be too expensive.
//!
//! Additionally, each modification to the index is just appending a line at
//! the end of a file (the exact format is described later). This means that
//! the commits for an index are quite small and easily applied/compressible.
//!
//! ## The format of the Index
//!
//! The index is a store for the list of versions for all known packages, so its
//! format on disk is optimized slightly to ensure that `ls registry` doesn't
//! produce a list of all packages ever known. The index also wants to ensure
//! that there aren't a million files, which may actually end up hitting
//! filesystem limits at some point. To this end, a few decisions were made
//! about the format of the registry:
//!
//! 1. Each crate will have one file corresponding to it. Each version for a
//!    crate will just be a line in this file.
//! 2. There will be two tiers of directories for crate names, under which
//!    crates corresponding to those tiers will be located.
//!
//! For example, this is a sample hierarchy of an index:
//!
//! The root of the index contains a `config.json` file with a few entries
//! corresponding to the registry (see `RegistryConfig` below).
//!
//! Beyond that, there are three numbered directories (`1`, `2`, `3`) for
//! crates whose names are 1, 2, and 3 characters long. The `1` and `2`
//! directories simply have the crate files underneath them, while the `3`
//! directory is sharded by the first letter of the crate name.
//!
//! For crates with longer names, the top-level directory contains many
//! two-letter directories named after the first two letters of the crate name,
//! each of which has sub-folders named after the next two letters. At the end
//! of all these are the actual crate files themselves.
//!
//! The purpose of this layout is to keep `ls` output small while still
//! allowing efficient lookup based on the crate name itself.
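The directory rules above can be sketched as a helper that maps a crate name to its file path relative to the index root. This is a minimal illustration rather than Cargo's own lookup code: `index_file_path` is a hypothetical name, it assumes ASCII crate names, and lowercasing the name first is an assumption modeled on how the crates.io index names its files.

```rust
use std::path::PathBuf;

/// Hypothetical helper: maps a crate name to its index file, following the
/// 1/2/3-character directories and the two-tier sharding for longer names.
fn index_file_path(name: &str) -> Option<PathBuf> {
    // This sketch only handles non-empty ASCII names.
    if name.is_empty() || !name.is_ascii() {
        return None;
    }
    // Assumption: index files are stored under lowercased names.
    let name = name.to_ascii_lowercase();
    let mut path = PathBuf::new();
    match name.len() {
        // 1- and 2-character names sit directly under `1/` and `2/`.
        1 => path.push("1"),
        2 => path.push("2"),
        // 3-character names are sharded by their first letter.
        3 => {
            path.push("3");
            path.push(&name[..1]);
        }
        // Longer names: first two letters, then the next two letters.
        _ => {
            path.push(&name[..2]);
            path.push(&name[2..4]);
        }
    }
    path.push(&name);
    Some(path)
}
```

Under these assumptions, `serde` maps to `se/rd/serde` and `url` to `3/u/url`, so finding a crate's file never requires listing a large directory.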
//!
//! Each file in the index is the history of one crate over time. Each line in
//! the file corresponds to one version of a crate, stored in JSON format (see
//! the `RegistryPackage` structure below).
//!
//! As new versions are published, new lines are appended to this file. The only
//! modifications to this file that should happen over time are yanks of a
//! particular version.
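A simplified in-memory model may help show the append-only discipline. In this sketch the hypothetical `IndexFile` and `IndexLine` types reduce each JSON line to a version string and a `yanked` flag, and versions are compared as plain `major.minor.patch` triples rather than full semver; none of this is the real `RegistryPackage` handling.

```rust
/// Hypothetical, simplified stand-in for one JSON line of an index file.
#[derive(Debug, Clone, PartialEq)]
struct IndexLine {
    vers: String,
    yanked: bool,
}

/// Hypothetical model of one crate's file: one line per published version.
#[derive(Debug, Default)]
struct IndexFile {
    lines: Vec<IndexLine>,
}

impl IndexFile {
    /// Publishing only ever appends a new line at the end of the file.
    fn publish(&mut self, vers: &str) {
        self.lines.push(IndexLine { vers: vers.to_string(), yanked: false });
    }

    /// Yanking is the only in-place change: it flips a flag on an existing
    /// line instead of deleting it. Returns `false` if the version is unknown.
    fn yank(&mut self, vers: &str) -> bool {
        match self.lines.iter_mut().find(|l| l.vers == vers) {
            Some(line) => {
                line.yanked = true;
                true
            }
            None => false,
        }
    }

    /// Highest non-yanked version, compared numerically as a triple
    /// (pre-release and build metadata are not handled in this sketch).
    fn newest_available(&self) -> Option<&str> {
        self.lines
            .iter()
            .filter(|l| !l.yanked)
            .filter_map(|l| parse_triple(&l.vers).map(|v| (v, l.vers.as_str())))
            .max_by_key(|(v, _)| *v)
            .map(|(_, s)| s)
    }
}

/// Parses `"1.2.3"` into `(1, 2, 3)`; anything else yields `None`.
fn parse_triple(vers: &str) -> Option<(u64, u64, u64)> {
    let mut parts = vers.split('.').map(|p| p.parse::<u64>().ok());
    let triple = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None;
    }
    Some(triple)
}
```

Because yanking only flips a flag, the number of lines never shrinks, which is what keeps every change to the index a small, append-style edit.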
//!
//! # Downloading Packages
//!
//! The purpose of the Index was to provide an efficient method to resolve the
//! dependency graph for a package. So far we only required one network
//! interaction to update the registry's repository (yay!). After resolution has
//! been performed, however, we need to download the contents of packages so we
//! can read the full manifest and build the source code.
//!
//! To accomplish this, this source's `download` method will make one HTTP
//! request per requested package to download its tarball into a local cache.
//! These tarballs will then be unpacked into a destination folder.
//!
//! Because versions uploaded to the registry are frozen forever, the HTTP
//! download and unpacking can both be skipped if the version has already been
//! downloaded and unpacked. This caching allows us to only download a package
//! when absolutely necessary.
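The skip-if-cached rule can be sketched as follows. The helper `cached_or_download` is hypothetical, the `fetch` closure stands in for the HTTP request, and the `<pkg>-<version>.crate` file name follows the cache layout described below; checksum verification is deliberately left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the cached tarball for `name`-`version`,
/// calling `fetch` (a stand-in for the HTTP download) only on a cache miss.
fn cached_or_download(
    cache_dir: &Path,
    name: &str,
    version: &str,
    fetch: impl FnOnce() -> io::Result<Vec<u8>>,
) -> io::Result<PathBuf> {
    let path = cache_dir.join(format!("{}-{}.crate", name, version));
    // Published versions never change, so an existing file is reused as-is.
    if path.exists() {
        return Ok(path);
    }
    let bytes = fetch()?;
    fs::create_dir_all(cache_dir)?;
    fs::write(&path, bytes)?;
    Ok(path)
}
```

A production version would likely also verify the tarball against its checksum and write to a temporary file before renaming it into place, so that an interrupted download is never mistaken for a cached one.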
//!
//! # Filesystem Hierarchy
//!
//! Overall, the `$HOME/.cargo` directory looks like this when talking about the
//! registry:
//!
//! ```text
//! # A folder under which all registry metadata is hosted (similar to
//! # $HOME/.cargo/git)
//! $HOME/.cargo/registry/
//!
//!     # For each registry that cargo knows about (keyed by hostname + hash)
//!     # there is a folder which is the checked out version of the index for
//!     # the registry in this location. Note that this is done so cargo can
//!     # support multiple registries simultaneously.
//!     registry1-<hash>/
//!     registry2-<hash>/
//!
//!     # This folder is a cache for all downloaded tarballs from a registry.
//!     # Once downloaded and verified, a tarball never changes.
//!     registry1-<hash>/<pkg>-<version>.crate
//!
//!     # Location in which all tarballs are unpacked. Each tarball is known to
//!     # be frozen after downloading, so transitively this folder is also
//!     # frozen once it's unpacked (it's never unpacked again).
//!     registry1-<hash>/<pkg>-<version>/...
//! ```
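To make the layout concrete, here is a sketch of how the three locations might be derived for one registry directory such as `registry1-<hash>`. The `RegistryPaths` type is hypothetical, and the `index`, `cache`, and `src` subdirectory names are assumptions of this sketch, since the tree above only shows the per-registry entries.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper describing where registry data lives under
/// `$HOME/.cargo/registry`. The `index`/`cache`/`src` names are assumed.
struct RegistryPaths {
    root: PathBuf,
}

impl RegistryPaths {
    fn new(cargo_home: &Path) -> Self {
        RegistryPaths { root: cargo_home.join("registry") }
    }

    /// Checked-out index for one registry, e.g. `registry1-<hash>/`.
    fn index_dir(&self, registry: &str) -> PathBuf {
        self.root.join("index").join(registry)
    }

    /// Cached tarball `<pkg>-<version>.crate`, immutable once downloaded.
    fn tarball(&self, registry: &str, pkg: &str, version: &str) -> PathBuf {
        self.root
            .join("cache")
            .join(registry)
            .join(format!("{}-{}.crate", pkg, version))
    }

    /// Unpacked sources `<pkg>-<version>/`, frozen once unpacked.
    fn unpacked(&self, registry: &str, pkg: &str, version: &str) -> PathBuf {
        self.root
            .join("src")
            .join(registry)
            .join(format!("{}-{}", pkg, version))
    }
}
```

Keying every location by the same `registry1-<hash>` name is what lets several registries share one `$HOME/.cargo/registry` without colliding.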

use std::borrow::Cow;
use std::collections::BTreeMap;
use std::collections::HashSet;
use std::fs::{File, OpenOptions};
use std::path::{Path, PathBuf};

use flate2::read::GzDecoder;
use log::debug;
use semver::{Version, VersionReq};
use serde::Deserialize;
use tar::Archive;

use crate::core::dependency::{DepKind, Dependency};
use crate::core::source::MaybePackage;
use crate::core::{InternedString, Package, PackageId, Source, SourceId, Summary};
use crate::sources::PathSource;
use crate::util::errors::CargoResultExt;
use crate::util::hex;
use crate::util::into_url::IntoUrl;
use crate::util::{restricted_names, CargoResult, Config, Filesystem};

const PACKAGE_SOURCE_LOCK: &str = ".cargo-ok";
pub const CRATES_IO_INDEX: &str = "https://github.com/rust-lang/crates.io-index";
pub const CRATES_IO_REGISTRY: &str = "crates-io";
const CRATE_TEMPLATE: &str = "{crate}";
const VERSION_TEMPLATE: &str = "{version}";

pub struct RegistrySource<'cfg> {
    src_path: Filesystem,
    config: &'cfg Config,
    ops: Box<dyn RegistryData + 'cfg>,
    index: index::RegistryIndex<'cfg>,
    yanked_whitelist: HashSet<PackageId>,

#[derive(Deserialize)]
pub struct RegistryConfig {
    /// Download endpoint for all crates.
    ///
    /// The string is a template which will generate the download URL for the
    /// tarball of a specific version of a crate. The substrings `{crate}` and
    /// `{version}` will be replaced with the crate's name and version,
    /// respectively.
    ///
    /// For backwards compatibility, if the string does not contain `{crate}` or
    /// `{version}`, it will be extended with `/{crate}/{version}/download` to
    /// support registries like crates.io which were created before the
    /// templating setup was created.
    ///
    /// API endpoint for the registry. This is what's actually hit to perform
    /// operations like yanks, owner modifications, publishing new crates, etc.
    /// If this is `None`, the registry does not support API commands.
    pub api: Option<String>,
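The download-URL templating rule documented above can be sketched like this. `download_url` is a hypothetical helper; it reads "does not contain `{crate}` or `{version}`" as "contains neither marker", and the URLs used with it are illustrative only.

```rust
/// Hypothetical helper: expands a download template into a concrete URL.
fn download_url(dl: &str, krate: &str, version: &str) -> String {
    const CRATE_TEMPLATE: &str = "{crate}";
    const VERSION_TEMPLATE: &str = "{version}";

    let mut template = dl.to_string();
    // Legacy registries: a template with neither marker gets the
    // `/{crate}/{version}/download` suffix appended before substitution.
    if !template.contains(CRATE_TEMPLATE) && !template.contains(VERSION_TEMPLATE) {
        template.push_str("/{crate}/{version}/download");
    }
    template
        .replace(CRATE_TEMPLATE, krate)
        .replace(VERSION_TEMPLATE, version)
}
```

For instance, a legacy value such as `https://example.com/api/v1/crates` becomes `https://example.com/api/v1/crates/serde/1.0.0/download` for `serde` 1.0.0, while a templated value like `https://dl.example.com/{crate}-{version}.crate` is filled in directly.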

/// A single line in the index representing a single version of a package.
#[derive(Deserialize)]
pub struct RegistryPackage<'a> {
    name: InternedString,
    deps: Vec<RegistryDependency<'a>>,
    features: BTreeMap<InternedString, Vec<InternedString>>,
    /// If `true`, Cargo will skip this version when resolving.
    ///
    /// This was added in 2014. Everything in the crates.io index has this set
    /// now, so this probably doesn't need to be an option anymore.
    yanked: Option<bool>,
    /// Native library name this package links to.
    ///
    /// Added early 2018 (see https://github.com/rust-lang/cargo/pull/4978),
    /// can be `None` if published before then.
    links: Option<InternedString>,

fn escaped_char_in_json() {
    let _: RegistryPackage<'_> = serde_json::from_str(
        r#"{"name":"a","vers":"0.0.1","deps":[],"cksum":"bae3","features":{}}"#,

    let _: RegistryPackage<'_> = serde_json::from_str(
        r#"{"name":"a","vers":"0.0.1","deps":[],"cksum":"bae3","features":{"test":["k","q"]},"links":"a-sys"}"#

    // Now we add escaped chars in all the places they can go.
    // These are not valid, but they should error later than JSON parsing.
    let _: RegistryPackage<'_> = serde_json::from_str(
            "name":"This name has a escaped cher in it \n\t\" ",
                "features": [" \n\t\" "],
                "default_features": true,
                "target": " \n\t\" ",
                "registry": " \n\t\" "
            "features":{"test \n\t\" ":["k \n\t\" ","q \n\t\" "]},
            "links":" \n\t\" "}"#,

#[derive(Deserialize)]
#[serde(field_identifier, rename_all = "lowercase")]

#[derive(Deserialize)]
struct RegistryDependency<'a> {
    name: InternedString,
    features: Vec<InternedString>,
    default_features: bool,
    target: Option<Cow<'a, str>>,
    kind: Option<Cow<'a, str>>,
    registry: Option<Cow<'a, str>>,
    package: Option<InternedString>,
    public: Option<bool>,

impl<'a> RegistryDependency<'a> {
    /// Converts an encoded dependency in the registry to a cargo dependency.
    pub fn into_dep(self, default: SourceId) -> CargoResult<Dependency> {
        let RegistryDependency {
        let id = if let Some(registry) = &registry {
            SourceId::for_registry(&registry.into_url()?)?
        let mut dep = Dependency::parse_no_deprecated(package.unwrap_or(name), Some(&req), id)?;
        if package.is_some() {
            dep.set_explicit_name_in_toml(name);
        let kind = match kind.as_deref().unwrap_or("") {
            "dev" => DepKind::Development,
            "build" => DepKind::Build,
            _ => DepKind::Normal,
        let platform = match target {
            Some(target) => Some(target.parse()?),

        // All dependencies are private by default.
        let public = public.unwrap_or(false);

        // Unfortunately, older versions of cargo and/or the registry ended up
        // publishing lots of entries where the features array contained the
        // empty feature, "", inside. This confuses the resolution process much
        // later on, and these features aren't actually valid, so filter them
        // all out here.
        features.retain(|s| !s.is_empty());

        // In the index, "registry" is null if it is from the same index.
        // In Cargo.toml, "registry" is None if it is from the default registry.
        if !id.is_default_registry() {
            dep.set_registry_id(id);

        dep.set_optional(optional)
            .set_default_features(default_features)
            .set_features(features)
            .set_platform(platform)
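The normalization done in `into_dep` can be exercised in isolation. The sketch below is hypothetical: a local `DepKind` enum stands in for Cargo's type, and `parse_kind` and `clean_features` mirror only the `kind` mapping and the empty-feature filtering shown above.

```rust
/// Hypothetical stand-in for Cargo's dependency kinds.
#[derive(Debug, PartialEq)]
enum DepKind {
    Normal,
    Development,
    Build,
}

/// Maps the index's optional `kind` string. A missing field or any
/// unrecognized value is treated as a normal dependency.
fn parse_kind(kind: Option<&str>) -> DepKind {
    match kind.unwrap_or("") {
        "dev" => DepKind::Development,
        "build" => DepKind::Build,
        _ => DepKind::Normal,
    }
}

/// Drops the empty feature names that some older index entries contain.
fn clean_features(mut features: Vec<String>) -> Vec<String> {
    features.retain(|s| !s.is_empty());
    features
}
```

Treating unknown kinds as `Normal` keeps older or unexpected index entries loadable instead of failing the whole resolution.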

pub trait RegistryData {
    fn prepare(&self) -> CargoResult<()>;
    fn index_path(&self) -> &Filesystem;
        data: &mut dyn FnMut(&[u8]) -> CargoResult<()>,
    ) -> CargoResult<()>;
    fn config(&mut self) -> CargoResult<Option<RegistryConfig>>;
    fn update_index(&mut self) -> CargoResult<()>;
    fn download(&mut self, pkg: PackageId, checksum: &str) -> CargoResult<MaybeLock>;
    fn finish_download(&mut self, pkg: PackageId, checksum: &str, data: &[u8])
        -> CargoResult<File>;

    fn is_crate_downloaded(&self, _pkg: PackageId) -> bool {

    fn assert_index_locked<'a>(&self, path: &'a Filesystem) -> &'a Path;
    fn current_version(&self) -> Option<InternedString>;

    Download { url: String, descriptor: String },

fn short_name(id: SourceId) -> String {
    let hash = hex::short_hash(&id);
    let ident = id.url().host_str().unwrap_or("").to_string();
    format!("{}-{}", ident, hash)
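`short_name` combines the registry's host with a short hash so that several registries can coexist on disk. Below is a standalone sketch under stated assumptions: it hashes the URL string with `std`'s `DefaultHasher` instead of Cargo's `hex::short_hash` over the `SourceId`, and `host_of` is a deliberately naive, hypothetical host extractor. `DefaultHasher` output is not guaranteed to be stable across Rust releases, so a real cache directory name would need a stable hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical sketch of `short_name`: `<host>-<16 hex digits>`.
/// `DefaultHasher` is a stand-in and is not stable across Rust releases.
fn short_name(url: &str) -> String {
    let host = host_of(url).unwrap_or("");
    let mut hasher = DefaultHasher::new();
    url.hash(&mut hasher);
    format!("{}-{:016x}", host, hasher.finish())
}

/// Naive host extraction for `scheme://host[:port]/path` URLs; a real
/// implementation would use a URL parser (IPv6 hosts are not handled).
fn host_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    let authority = rest.split('/').next()?;
    authority.split(':').next()
}
```

The host keeps the directory name human-readable, while the hash disambiguates two registries served from the same host.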

impl<'cfg> RegistrySource<'cfg> {
        yanked_whitelist: &HashSet<PackageId>,
        config: &'cfg Config,
    ) -> RegistrySource<'cfg> {
        let name = short_name(source_id);
        let ops = remote::RemoteRegistry::new(source_id, config, &name);
        RegistrySource::new(source_id, config, &name, Box::new(ops), yanked_whitelist)

        yanked_whitelist: &HashSet<PackageId>,
        config: &'cfg Config,
    ) -> RegistrySource<'cfg> {
        let name = short_name(source_id);
        let ops = local::LocalRegistry::new(path, config, &name);
        RegistrySource::new(source_id, config, &name, Box::new(ops), yanked_whitelist)

        config: &'cfg Config,
        ops: Box<dyn RegistryData + 'cfg>,
        yanked_whitelist: &HashSet<PackageId>,
    ) -> RegistrySource<'cfg> {
            src_path: config.registry_source_path().join(name),
            index: index::RegistryIndex::new(source_id, ops.index_path(), config),
            yanked_whitelist: yanked_whitelist.clone(),

    /// Decode the configuration stored within the registry.
    ///
    /// This requires that the index has been at least checked out.
    pub fn config(&mut self) -> CargoResult<Option<RegistryConfig>> {

    /// Unpacks a downloaded package into a location where it's ready to be
    /// compiled.
    ///
    /// No action is taken if the source looks like it's already unpacked.
    fn unpack_package(&self, pkg: PackageId, tarball: &File) -> CargoResult<PathBuf> {
        // The `.cargo-ok` file is used to track if the source is already
        // unpacked.
        let package_dir = format!("{}-{}", pkg.name(), pkg.version());
        let dst = self.src_path.join(&package_dir);

        let path = dst.join(PACKAGE_SOURCE_LOCK);
        let path = self.config.assert_package_cache_locked(&path);
        let unpack_dir = path.parent().unwrap();
        if let Ok(meta) = path.metadata() {
                return Ok(unpack_dir.to_path_buf());

        let mut ok = OpenOptions::new()
            .chain_err(|| format!("failed to open `{}`", path.display()))?;

        let gz = GzDecoder::new(tarball);
        let mut tar = Archive::new(gz);
        let prefix = unpack_dir.file_name().unwrap();
        let parent = unpack_dir.parent().unwrap();
        for entry in tar.entries()? {
            let mut entry = entry.chain_err(|| "failed to iterate over archive")?;
            let entry_path = entry
                .chain_err(|| "failed to read entry path")?

            // We're going to unpack this tarball into the global source
            // directory, but we want to make sure that it doesn't accidentally
            // (or maliciously) overwrite source code from other crates. Cargo
            // itself should never generate a tarball that hits this error, and
            // crates.io should also block uploads with these sorts of tarballs,
            // but be extra sure by adding a check here as well.
            if !entry_path.starts_with(prefix) {
                    "invalid tarball downloaded, contains \
                     a file at {:?} which isn't under {:?}",

            let mut result = entry.unpack_in(parent).map_err(anyhow::Error::from);
            if cfg!(windows) && restricted_names::is_windows_reserved_path(&entry_path) {
                result = result.chain_err(|| {
                        "`{}` appears to contain a reserved Windows path, \
                         it cannot be extracted on Windows",
            result.chain_err(|| format!("failed to unpack entry at `{}`", entry_path.display()))?;

        // Write to the lock file to indicate that unpacking was successful.
        Ok(unpack_dir.to_path_buf())
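The two guards in `unpack_package` can be sketched without a tar library. `check_entry`, `is_windows_reserved`, and `is_windows_reserved_path` are hypothetical helpers, not Cargo's `restricted_names` module. Because `Path::starts_with` compares whole components, `foo-1.0.0/../x` would still pass the prefix test, so this sketch also rejects `..` components itself. The reserved-name list (`CON`, `PRN`, `AUX`, `NUL`, `COM1` through `COM9`, `LPT1` through `LPT9`, compared case-insensitively on the part before the first dot) is an assumption about Windows behavior.

```rust
use std::path::{Component, Path};

/// Hypothetical check: the entry must live under `<pkg>-<version>/` and must
/// not climb out of it with `..`.
fn check_entry(entry_path: &Path, prefix: &str) -> Result<(), String> {
    // Component-wise prefix test, like the `starts_with` guard above.
    if !entry_path.starts_with(prefix) {
        return Err(format!("file at {:?} isn't under {:?}", entry_path, prefix));
    }
    // `starts_with` alone accepts `prefix/../elsewhere`, so reject `..`.
    if entry_path.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err(format!("file at {:?} escapes {:?}", entry_path, prefix));
    }
    Ok(())
}

/// Hypothetical list of reserved Windows device names (case-insensitive).
fn is_windows_reserved(name: &str) -> bool {
    let lower = name.to_ascii_lowercase();
    if ["con", "prn", "aux", "nul"].contains(&lower.as_str()) {
        return true;
    }
    // COM1-COM9 and LPT1-LPT9.
    ["com", "lpt"].iter().any(|prefix| {
        lower
            .strip_prefix(*prefix)
            .map_or(false, |rest| matches!(rest.as_bytes(), [b'1'..=b'9']))
    })
}

/// True if any path component's stem (text before the first `.`) is reserved.
fn is_windows_reserved_path(path: &Path) -> bool {
    path.components().any(|c| match c {
        Component::Normal(os) => os
            .to_str()
            .and_then(|s| s.split('.').next())
            .map_or(false, is_windows_reserved),
        _ => false,
    })
}
```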

    fn do_update(&mut self) -> CargoResult<()> {
        self.ops.update_index()?;
        let path = self.ops.index_path();
        self.index = index::RegistryIndex::new(self.source_id, path, self.config);

    fn get_pkg(&mut self, package: PackageId, path: &File) -> CargoResult<Package> {
            .unpack_package(package, path)
            .chain_err(|| format!("failed to unpack package `{}`", package))?;
        let mut src = PathSource::new(&path, self.source_id, self.config);
        let mut pkg = match src.download(package)? {
            MaybePackage::Ready(pkg) => pkg,
            MaybePackage::Download { .. } => unreachable!(),

        // After we've loaded the package, configure its summary's `checksum`
        // field with the checksum we know for this `PackageId`.
        let req = VersionReq::exact(package.version());
        let summary_with_cksum = self
            .summaries(package.name(), &req, &mut *self.ops)?
            .map(|s| s.summary.clone())
            .expect("summary not found");
        if let Some(cksum) = summary_with_cksum.checksum() {
                .set_checksum(cksum.to_string());

impl<'cfg> Source for RegistrySource<'cfg> {
    fn query(&mut self, dep: &Dependency, f: &mut dyn FnMut(Summary)) -> CargoResult<()> {
        // If this is a precise dependency, then it came from a lock file and in
        // theory the registry is known to contain this version. If, however, we
        // come back with no summaries, then our registry may need to be
        // updated, so we fall back to performing a lazy update.
        if dep.source_id().precise().is_some() && !self.updated {
            debug!("attempting query without update");
            let mut called = false;
                .query_inner(dep, &mut *self.ops, &self.yanked_whitelist, &mut |s| {
            debug!("falling back to an update");
            .query_inner(dep, &mut *self.ops, &self.yanked_whitelist, &mut |s| {

    fn fuzzy_query(&mut self, dep: &Dependency, f: &mut dyn FnMut(Summary)) -> CargoResult<()> {
            .query_inner(dep, &mut *self.ops, &self.yanked_whitelist, f)

    fn supports_checksums(&self) -> bool {

    fn requires_precise(&self) -> bool {

    fn source_id(&self) -> SourceId {

    fn update(&mut self) -> CargoResult<()> {
        // If we have an imprecise version, then we don't know what we're going
        // to look for, so we always attempt to perform an update here.
        //
        // If we have a precise version, then we'll update lazily during the
        // querying phase. Note that precise in this case is only
        // `Some("locked")` as other `Some` values indicate a `cargo update
        // --precise` request.
        if self.source_id.precise() != Some("locked") {
            self.do_update()?;
        } else {
            debug!("skipping update due to locked registry");
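The interplay between `update` and `query` amounts to a lazy-update strategy: a locked source skips the eager update, and a precise query triggers an update only when the local index has no match. The `MockRegistry` below is a hypothetical model of that control flow, with version strings in place of summaries; it does not reproduce Cargo's types.

```rust
/// Hypothetical registry whose local index copy may be stale.
struct MockRegistry {
    local: Vec<&'static str>,  // versions in the local index checkout
    remote: Vec<&'static str>, // versions an index update would bring in
    updated: bool,
    update_count: u32,
}

impl MockRegistry {
    /// Stand-in for `do_update`: refresh the local copy from the remote.
    fn do_update(&mut self) {
        self.local = self.remote.clone();
        self.updated = true;
        self.update_count += 1;
    }

    /// Mirrors `update`: a locked source defers updating to query time.
    fn update_source(&mut self, locked: bool) {
        if !locked {
            self.do_update();
        }
    }

    /// Mirrors `query`: a precise request tries the local index first and
    /// falls back to an update only when nothing matched.
    fn query(&mut self, version: &str, precise: bool) -> Option<&'static str> {
        if precise && !self.updated {
            if let Some(v) = self.find(version) {
                return Some(v);
            }
            self.do_update();
        }
        self.find(version)
    }

    fn find(&self, version: &str) -> Option<&'static str> {
        self.local.iter().copied().find(|v| *v == version)
    }
}
```

With a lock file that is already satisfied by the local index, no network update happens at all; the update cost is paid only on a miss, and at most once.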

    fn download(&mut self, package: PackageId) -> CargoResult<MaybePackage> {
        let hash = self.index.hash(package, &mut *self.ops)?;
        match self.ops.download(package, hash)? {
            MaybeLock::Ready(file) => self.get_pkg(package, &file).map(MaybePackage::Ready),
            MaybeLock::Download { url, descriptor } => {
                Ok(MaybePackage::Download { url, descriptor })

    fn finish_download(&mut self, package: PackageId, data: Vec<u8>) -> CargoResult<Package> {
        let hash = self.index.hash(package, &mut *self.ops)?;
        let file = self.ops.finish_download(package, hash, &data)?;
        self.get_pkg(package, &file)
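`download` and `finish_download` split fetching into two phases: the source either hands back a ready file or describes what the caller should fetch, and the caller later returns the bytes. This hypothetical sketch mirrors that shape with a local `MaybeLock` enum, an in-memory `Cache`, and a `fetch` closure standing in for the caller's HTTP client; the `example.invalid` URL is a placeholder.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of `MaybeLock`: data is ready, or must be fetched.
#[derive(Debug, PartialEq)]
enum MaybeLock {
    Ready(Vec<u8>),
    Download { url: String, descriptor: String },
}

/// Hypothetical registry data backed only by an in-memory cache.
struct Cache {
    files: HashMap<String, Vec<u8>>,
}

impl Cache {
    /// Phase one: answer from the cache or say what needs downloading.
    fn download(&mut self, pkg: &str) -> MaybeLock {
        match self.files.get(pkg) {
            Some(bytes) => MaybeLock::Ready(bytes.clone()),
            None => MaybeLock::Download {
                url: format!("https://example.invalid/{}", pkg),
                descriptor: pkg.to_string(),
            },
        }
    }

    /// Phase two: store the bytes the caller fetched and return them.
    fn finish_download(&mut self, pkg: &str, data: Vec<u8>) -> Vec<u8> {
        self.files.insert(pkg.to_string(), data.clone());
        data
    }
}

/// Drives both phases; `fetch` stands in for the caller's HTTP client.
fn get(cache: &mut Cache, pkg: &str, fetch: impl FnOnce(&str) -> Vec<u8>) -> Vec<u8> {
    match cache.download(pkg) {
        MaybeLock::Ready(bytes) => bytes,
        MaybeLock::Download { url, .. } => {
            let data = fetch(&url);
            cache.finish_download(pkg, data)
        }
    }
}
```

One plausible benefit of this split is that the caller, not the source, decides how and when the network transfer happens.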

    fn fingerprint(&self, pkg: &Package) -> CargoResult<String> {
        Ok(pkg.package_id().version().to_string())

    fn describe(&self) -> String {
        self.source_id.display_index()

    fn add_to_yanked_whitelist(&mut self, pkgs: &[PackageId]) {
        self.yanked_whitelist.extend(pkgs);

    fn is_yanked(&mut self, pkg: PackageId) -> CargoResult<bool> {
        self.index.is_yanked(pkg, &mut *self.ops)
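Yanked versions are normally hidden from resolution, while `add_to_yanked_whitelist` records package IDs that should stay usable. Assuming, as the `query_inner` calls above suggest, that the whitelist lets otherwise-hidden yanked versions through, the filtering can be modeled like this; `Entry` and `visible` are hypothetical, with `(name, version)` pairs standing in for `PackageId`.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified version record.
struct Entry {
    name: &'static str,
    version: &'static str,
    yanked: bool,
}

/// Returns the versions resolution may see: every non-yanked version, plus
/// yanked ones that were explicitly whitelisted (e.g. pinned by a lock file).
fn visible(
    entries: &[Entry],
    whitelist: &HashSet<(&'static str, &'static str)>,
) -> Vec<&'static str> {
    entries
        .iter()
        .filter(|e| !e.yanked || whitelist.contains(&(e.name, e.version)))
        .map(|e| e.version)
        .collect()
}
```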