This post is unfinished; it may still change substantially or be abandoned.

No idea why this project has "intel" in its name. Let's just pretend Intel sponsored it.

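The mirror-intel workflow is roughly this: a request hits the matching proxy/cache policy; if the object is already cached, the cached copy is returned; otherwise a download task is submitted and the request is reverse-proxied to the upstream.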
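
The flow described above can be sketched in a few lines of plain Rust. This is only an illustration under assumptions: `Mirror`, `Response`, and the fields of `Task` are hypothetical names, a `HashMap` stands in for the S3 cache, and a `VecDeque` stands in for the task queue. It is not how mirror-intel itself is written.

```rust
use std::collections::{HashMap, VecDeque};

/// What the server decides to do with a request (hypothetical type).
#[derive(Debug, PartialEq)]
enum Response {
    /// Cache hit: serve the stored bytes.
    ServeCached(Vec<u8>),
    /// Cache miss: reverse-proxy the request to this upstream URL.
    ProxyTo(String),
}

/// A queued background download (hypothetical shape).
#[derive(Debug, PartialEq)]
struct Task {
    upstream_url: String,
    cache_key: String,
}

struct Mirror {
    /// Base URL of the origin server.
    upstream: String,
    /// Stand-in for the S3 cache.
    cache: HashMap<String, Vec<u8>>,
    /// Stand-in for the download task queue.
    queue: VecDeque<Task>,
}

impl Mirror {
    /// Return the cached object if present; otherwise enqueue a download
    /// task and tell the caller to proxy the request upstream.
    fn handle(&mut self, path: &str) -> Response {
        if let Some(body) = self.cache.get(path) {
            return Response::ServeCached(body.clone());
        }
        let upstream_url = format!("{}/{}", self.upstream, path);
        self.queue.push_back(Task {
            upstream_url: upstream_url.clone(),
            cache_key: path.to_string(),
        });
        Response::ProxyTo(upstream_url)
    }
}
```

The point of the design is that a miss never blocks the client on the cache fill: the client is served from the upstream right away, and the cache is populated in the background.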

Enough talk, let's read the code!

Repos

crate::repos hard-codes the route and handling for every endpoint. Adding an endpoint means editing this module.

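Logging

Logging comes first. This project uses tokio-rs/tracing for logging; the crate::setup_log function binds it to stdout and reads the log format configuration from the environment.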

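Config file

This project uses figment for its config files.

It reads configuration from two files, Rocket.toml and mirror-intel.toml, and the project also ships template config files.

The config structure is defined as follows: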

/// Global application config.
pub struct Config {
    /// Upstream endpoints for redirecting, reverse-proxying and caching.
    pub endpoints: Endpoints,
    /// Github release related configs.
    pub github_release: GithubReleaseConfig,
    
    /// S3 storage to store cached files.
    pub s3: S3Config,
    /// Path of temporary buffer directory.
    pub buffer_path: PathBuf,
    /// and other config items
    ...
}

/// Endpoints of origin servers.
pub struct Endpoints {
    pub rust_static: String,
    pub homebrew_bottles: String,
    pub pypi_packages: String,
    pub fedora_iot: String,
    pub fedora_ostree: String,
    pub flathub: String,
    pub crates_io: String,
    pub dart_pub: String,
    pub guix: String,
    pub pytorch_wheels: String,
    pub linuxbrew_bottles: String,
    pub sjtug_internal: String,
    pub flutter_infra: String,
    pub flutter_infra_release: String,
    pub github_release: String,
    pub nix_channels_store: String,
    pub pypi_simple: String,
    pub opam_cache: String,
    pub gradle_distribution: String,
    /// Upstream override rules.
    pub overrides: Vec<EndpointOverride>,
    /// Paths starts with any of these prefixes will be unconditionally redirected to S3 storage.
    pub s3_only: Vec<String>,
}

/// Configuration for S3 storage.
pub struct S3Config {
    /// Name of the S3 storage.
    pub name: String,
    /// S3 endpoint of the storage service.
    pub endpoint: String,
    /// Website endpoint of the S3 service.
    pub website_endpoint: String,
    /// Bucket name.
    pub bucket: String,
}

/// Configuration for Github Release endpoint.
pub struct GithubReleaseConfig {
    /// Repositories allowed to be cached.
    ///
    /// Accessing a repository that is not in this list will result in an unconditional redirect.
    pub allow: Vec<String>,
}

/// An upstream endpoint override rule.
pub struct EndpointOverride {
    /// Name of the rule.
    ///
    /// Currently this field is only used for a descriptive purpose.
    pub name: String,
    /// Pattern to match against the origin.
    ///
    /// Note that only plain strings are supported, i.e. no regex, and substring matching is allowed.
    pub pattern: String,
    /// Replacement for the matched pattern.
    pub replace: String,
}
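
As a rough illustration of the override rules documented above (plain substring matching, no regex), here is a minimal sketch. The function `apply_overrides` is a hypothetical name, and applying every rule in order is an assumption; the real code may, for example, stop at the first matching rule.

```rust
/// Same shape as the `EndpointOverride` struct shown above.
#[allow(dead_code)]
struct EndpointOverride {
    name: String,
    pattern: String,
    replace: String,
}

/// Rewrite `origin` by applying each rule as a plain substring replacement.
/// Assumption: rules are applied in order, and all of them are tried.
fn apply_overrides(origin: &str, overrides: &[EndpointOverride]) -> String {
    let mut result = origin.to_string();
    for rule in overrides {
        // `str::replace` leaves the string unchanged when the pattern is absent.
        result = result.replace(rule.pattern.as_str(), rule.replace.as_str());
    }
    result
}
```

With a single rule whose pattern is `https://a.example` and whose replacement is `https://b.example`, the origin `https://a.example/x` becomes `https://b.example/x`, while origins that do not contain the pattern pass through untouched.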

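A single config file handles everything! The only shortcoming is that the endpoint list is hard-coded in the source; some upstreams need special handling, so extensibility had to be sacrificed. (That said, simple_intel feels like it could be turned into a dynamic list.)

It's-not-like-it-doesn't-work.jpg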

Task

This project implements a proxy server with file caching on top of a queue of Task values, carried by a bounded tokio::sync::mpsc channel.
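
To make the bounded-queue idea concrete, here is a small sketch that uses the standard library's `std::sync::mpsc::sync_channel` as a stand-in for the tokio channel. The `Task` field, `submit`, and `spawn_worker` are hypothetical, and dropping a task when the queue is full is an assumption for illustration, not something this post verified in the project.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};
use std::thread::{self, JoinHandle};

/// A download job produced by a cache miss (hypothetical shape).
#[derive(Debug)]
struct Task {
    path: String,
}

/// Try to enqueue a task without blocking the request handler.
/// Returns false when the bounded queue is full or the worker is gone.
fn submit(tx: &SyncSender<Task>, path: &str) -> bool {
    match tx.try_send(Task { path: path.to_string() }) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Spawn a worker that drains the queue until every sender is dropped,
/// returning the paths it processed, in order.
fn spawn_worker(rx: Receiver<Task>) -> JoinHandle<Vec<String>> {
    thread::spawn(move || {
        let mut processed = Vec::new();
        for task in rx {
            // A real worker would download and upload here.
            processed.push(task.path);
        }
        processed
    })
}
```

A channel created with `std::sync::mpsc::sync_channel::<Task>(2)` accepts two queued tasks and rejects a third `submit` until the worker has taken one off the queue. The bound is what keeps a burst of cache misses from piling up unbounded work.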

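The core is crate::artifacts::download_artifacts: it keeps receiving Tasks (produced by cache misses) and hands each one to crate::artifacts::cache_task, which uploads to S3 while the download is in progress. Specifically, it picks one of three strategies depending on file size (see crate::artifacts::into_stream):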

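  • Very small (<1M) or unknown size: copy directly to S3
  • Not too large (< config.ignore_threshold_mb MB): download into memory first, then upload
  • Very large: download to a file first, then upload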
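
The three cases above amount to a small decision function. The sketch below is an assumption-laden illustration: the `Strategy` enum and `choose_strategy` are hypothetical names, and "1M" is taken to mean 1 MiB. Only the thresholds themselves come from the list above.

```rust
/// How a file travels from the upstream to S3 (hypothetical names).
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Copy the response body straight to S3.
    StreamDirectly,
    /// Download into memory first, then upload.
    BufferInMemory,
    /// Download to a temporary file first, then upload.
    BufferOnDisk,
}

/// Assumption: "1M" in the list above means 1 MiB.
const SMALL_LIMIT_BYTES: u64 = 1024 * 1024;

/// Pick a strategy from the (possibly unknown) content length.
fn choose_strategy(size: Option<u64>, ignore_threshold_mb: u64) -> Strategy {
    let threshold_bytes = ignore_threshold_mb.saturating_mul(1024 * 1024);
    match size {
        // Unknown size is treated like a very small file.
        None => Strategy::StreamDirectly,
        Some(n) if n < SMALL_LIMIT_BYTES => Strategy::StreamDirectly,
        Some(n) if n < threshold_bytes => Strategy::BufferInMemory,
        Some(_) => Strategy::BufferOnDisk,
    }
}
```

For instance, with a threshold of 4 MB, a 2 MiB file would be buffered in memory and a 10 MiB file would go through a temporary file.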

Presumably this is meant to shorten the time each connection to S3 stays open and so ease the load on S3.

Prometheus

This project uses Prometheus as its monitoring data source, injected into actix as IntelMission; the metrics_endpoint function serves as a route that returns the Prometheus-encoded metrics.

Intel Path

For S3 and filesystem safety, this project checks the encoding of the path taken from match_info: characters such as . * : > < \ must not appear.
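
A character check of this kind could look like the sketch below. It is deliberately generic: the text above lists the characters but not where in the path they are rejected (a real implementation may only forbid them in certain positions, such as a leading dot), so `first_forbidden` and `FORBIDDEN` are hypothetical names, and the rule set is passed in as a parameter rather than claimed to match the project.

```rust
/// The characters listed above; the exact rule set and positions in the
/// real project are not confirmed here.
const FORBIDDEN: [char; 6] = ['.', '*', ':', '>', '<', '\\'];

/// Return the first character of `segment` that appears in `forbidden`,
/// or `None` if the segment is clean under that rule set.
fn first_forbidden(segment: &str, forbidden: &[char]) -> Option<char> {
    segment.chars().find(|c| forbidden.contains(c))
}
```

A caller would reject the request whenever this returns `Some(_)` for any path segment, before the path is ever turned into an S3 key or a filesystem path.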

Browse

crate::browse provides a bare-bones file browser on top of the S3 API. There is no frontend/backend split; it simply returns an HTML page. How direct!

Postscript

The architecture is fairly clear and simple, even a bit plain. 迟门🙏

When I have time I'll probably hack together an axum version?