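This post is unfinished; it may still change significantly or be abandoned.
I don't know why this project has "intel" in its name. Let's just pretend Intel sponsored it.
mirror-intel's workflow is roughly this: a request hits the matching proxy/cache policy; if the object is cached, the cached copy is returned; otherwise a download task is submitted and the request is reverse-proxied to the upstream.
Enough talk, let's read the code!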
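The flow described above can be sketched as a single decision function. This is only an illustration under my own assumptions: `Action`, `decide`, and the `HashSet` standing in for the cache lookup are hypothetical names, not taken from the project.

```rust
use std::collections::HashSet;

/// What the server does with one incoming request (hypothetical type).
#[derive(Debug, PartialEq)]
enum Action {
    /// The object is already cached: serve it from the cache.
    ServeFromCache(String),
    /// Cache miss: queue a download task and reverse-proxy the upstream.
    QueueAndProxy { task: String, upstream_url: String },
}

/// Decide how to answer a request for `path` under the upstream `origin`.
/// `cached` stands in for a lookup against the cache storage.
fn decide(cached: &HashSet<String>, origin: &str, path: &str) -> Action {
    if cached.contains(path) {
        Action::ServeFromCache(path.to_string())
    } else {
        Action::QueueAndProxy {
            task: path.to_string(),
            // Join origin and path with exactly one slash.
            upstream_url: format!("{}/{}", origin.trim_end_matches('/'), path),
        }
    }
}
```

The point of the split is that a cache miss never makes the client wait for the cache to fill: the client is served through the proxy while the download task runs in the background.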
Repos
crate::repos hard-codes the route and the handling for every endpoint. Adding an endpoint means changing this module.
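Logging
Logging comes first. The project uses tokio-rs/tracing for logging; the crate::setup_log function binds it to stdout and reads the log format configuration from the environment.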
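Configuration file
The project uses figment for configuration.
It reads configuration from two files, Rocket.toml and mirror-intel.toml, and the project also ships template configuration files.
The configuration structure is defined as follows: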
/// Global application config.
pub struct Config {
    /// Upstream endpoints for redirecting, reverse-proxying and caching.
    pub endpoints: Endpoints,
    /// Github release related configs.
    pub github_release: GithubReleaseConfig,
    /// S3 storage to store cached files.
    pub s3: S3Config,
    /// Path of temporary buffer directory.
    pub buffer_path: PathBuf,
    /// and other config items
    ...
}

/// Endpoints of origin servers.
pub struct Endpoints {
    pub rust_static: String,
    pub homebrew_bottles: String,
    pub pypi_packages: String,
    pub fedora_iot: String,
    pub fedora_ostree: String,
    pub flathub: String,
    pub crates_io: String,
    pub dart_pub: String,
    pub guix: String,
    pub pytorch_wheels: String,
    pub linuxbrew_bottles: String,
    pub sjtug_internal: String,
    pub flutter_infra: String,
    pub flutter_infra_release: String,
    pub github_release: String,
    pub nix_channels_store: String,
    pub pypi_simple: String,
    pub opam_cache: String,
    pub gradle_distribution: String,
    /// Upstream override rules.
    pub overrides: Vec<EndpointOverride>,
    /// Paths starts with any of these prefixes will be unconditionally redirected to S3 storage.
    pub s3_only: Vec<String>,
}

/// Configuration for S3 storage.
pub struct S3Config {
    /// Name of the S3 storage.
    pub name: String,
    /// S3 endpoint of the storage service.
    pub endpoint: String,
    /// Website endpoint of the S3 service.
    pub website_endpoint: String,
    /// Bucket name.
    pub bucket: String,
}

/// Configuration for Github Release endpoint.
pub struct GithubReleaseConfig {
    /// Repositories allowed to be cached.
    ///
    /// Accessing a repository that is not in this list will result in an unconditional redirect.
    pub allow: Vec<String>,
}

/// An upstream endpoint override rule.
pub struct EndpointOverride {
    /// Name of the rule.
    ///
    /// Currently this field is only used for a descriptive purpose.
    pub name: String,
    /// Pattern to match against the origin.
    ///
    /// Note that only plain strings are supported, i.e. no regex, and substring matching is allowed.
    pub pattern: String,
    /// Replacement for the matched pattern.
    pub replace: String,
}
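A single config file takes care of everything! The only shortcoming is that the endpoint list is hard-coded in the source; some sources need special handling, so extensibility had to be sacrificed. (That said, simple_intel feels like it could be turned into a dynamic list.)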
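To make the `overrides` and `s3_only` fields more concrete, here is a minimal sketch of how such rules could be applied. It follows the doc comments (plain substring matching, prefix matching), but the function names and the "first matching rule wins" behaviour are my assumptions, not something I checked against the project.

```rust
/// Mirrors the `EndpointOverride` struct shown above.
struct EndpointOverride {
    /// Descriptive only; not used by the matching logic.
    name: String,
    pattern: String,
    replace: String,
}

/// Apply the first rule whose pattern occurs in `url`.
/// Matching is a plain substring test, no regex (assumption: first match wins).
fn apply_overrides(url: &str, overrides: &[EndpointOverride]) -> String {
    for rule in overrides {
        if url.contains(&rule.pattern) {
            return url.replace(&rule.pattern, &rule.replace);
        }
    }
    // No rule matched: keep the upstream URL unchanged.
    url.to_string()
}

/// True if `path` starts with any of the configured `s3_only` prefixes.
fn is_s3_only(path: &str, s3_only: &[String]) -> bool {
    s3_only.iter().any(|prefix| path.starts_with(prefix.as_str()))
}
```

With a rule whose `pattern` is `https://upstream.example` and whose `replace` is `https://mirror.example`, the URL `https://upstream.example/pkg/a.tar.gz` becomes `https://mirror.example/pkg/a.tar.gz`, while URLs that do not contain the pattern pass through untouched.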
Task
The project implements a proxy server with file caching on top of a queue of Tasks, namely a bounded tokio::sync::mpsc::channel.
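The bounded queue idea can be sketched with the standard library alone. Note the swap: this uses `std::sync::mpsc::sync_channel` as a stand-in for tokio's bounded channel, the `Task` fields are hypothetical, and dropping a task when the queue is full is my assumption about a reasonable policy, not a confirmed behaviour of the project.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};

/// A download job produced by a cache miss (hypothetical fields).
#[derive(Debug, PartialEq)]
struct Task {
    origin: String,
    path: String,
}

/// Try to enqueue a task without blocking the request handler.
/// Returns false when the queue is full or the worker is gone.
fn submit(tx: &SyncSender<Task>, task: Task) -> bool {
    match tx.try_send(task) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Take every task currently waiting in the queue, in FIFO order,
/// and return the paths a worker would go on to download.
fn drain(rx: &Receiver<Task>) -> Vec<String> {
    rx.try_iter().map(|task| task.path).collect()
}
```

A queue created with `std::sync::mpsc::sync_channel::<Task>(2)` accepts two pending tasks and rejects the third until a worker has drained it. The bound is what keeps a burst of cache misses from piling up unlimited work; the client is still served through the reverse proxy either way.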
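The core is crate::artifacts::download_artifacts: it keeps receiving Tasks (produced by cache misses) and hands each one to crate::artifacts::cache_task, which uploads the file to S3 while downloading it. Specifically, it picks one of three strategies depending on the file size (crate::artifacts::into_stream):
- Very small (<1M) or unknown size: copied straight to S3
- Not too large (< config.ignore_threshold_mb MB): downloaded into memory first, then uploaded
- Very large: downloaded to a file first, then uploaded
Presumably this is meant to shorten the time connections to S3 stay open and ease the load on S3.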
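The three cases above boil down to a small decision on the (possibly unknown) content length. The sketch below only encodes the thresholds as listed; `Strategy`, `choose_strategy`, and `SMALL_LIMIT` are names I made up, and treating "1M" as 1 MiB is an assumption.

```rust
/// How a file travels from the upstream to S3 (hypothetical type).
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Copy the upstream response straight to S3.
    Direct,
    /// Download the whole file into memory, then upload.
    InMemory,
    /// Download to a temporary file, then upload.
    TempFile,
}

/// Assumed meaning of "<1M": strictly less than 1 MiB.
const SMALL_LIMIT: u64 = 1024 * 1024;

/// Pick a strategy from the content length (None when the size is unknown)
/// and the `ignore_threshold_mb` config value.
fn choose_strategy(size: Option<u64>, ignore_threshold_mb: u64) -> Strategy {
    let threshold = ignore_threshold_mb.saturating_mul(1024 * 1024);
    match size {
        None => Strategy::Direct,
        Some(n) if n < SMALL_LIMIT => Strategy::Direct,
        Some(n) if n < threshold => Strategy::InMemory,
        Some(_) => Strategy::TempFile,
    }
}
```

For example, with `ignore_threshold_mb = 4`, a 2 MiB file would be buffered in memory and an 8 MiB file would go through a temporary file.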
Prometheus
The project uses Prometheus as its monitoring data source: IntelMission is injected into actix, and the metrics_endpoint function serves as a route that returns the Prometheus-encoded metrics.
Intel Path
For the safety of S3 and the file system, the project checks the encoding of the path match_info: characters such as . * : > < \ must not appear.
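A check of this kind might look like the sketch below. The text only lists the characters; where in a segment each one is rejected is my assumption, modelled on the positional rules Rocket applies to path segments, since a blanket ban on `.` would reject every file name with an extension. The function names are hypothetical.

```rust
/// Check one path segment (assumed, position-based rules):
/// it must not start with '.' or '*', must not end with ':', '>' or '<',
/// and must not contain a backslash.
fn is_safe_segment(segment: &str) -> bool {
    !(segment.starts_with('.')
        || segment.starts_with('*')
        || segment.ends_with(':')
        || segment.ends_with('>')
        || segment.ends_with('<')
        || segment.contains('\\'))
}

/// A path is accepted only if every non-empty, slash-separated segment is safe.
fn is_safe_path(path: &str) -> bool {
    path.split('/')
        .filter(|segment| !segment.is_empty())
        .all(is_safe_segment)
}
```

Under these rules `pkg/a-1.0.tar.gz` passes, while `../etc/passwd` and `a/.hidden` are rejected because a segment starts with a dot.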
Browse
crate::browse provides a plain file browser on top of the S3 API. There is no frontend/backend split; it simply returns an HTML page. How direct!
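"Just return an HTML page" can be as small as the following sketch, which turns a list of object keys into a listing. This is not the project's template: `render_listing` and `escape_html` are hypothetical, and the links are only HTML-escaped, not URL-encoded, to keep the example short.

```rust
/// Escape the characters that are special in HTML text and attributes.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

/// Render the keys found under `prefix` as a bare HTML listing.
fn render_listing(prefix: &str, keys: &[&str]) -> String {
    let mut html = format!(
        "<html><body><h1>Index of /{}</h1><ul>",
        escape_html(prefix)
    );
    for &key in keys {
        // Show each entry relative to the prefix being browsed.
        let name = escape_html(key.strip_prefix(prefix).unwrap_or(key));
        html.push_str(&format!("<li><a href=\"{0}\">{0}</a></li>", name));
    }
    html.push_str("</ul></body></html>");
    html
}
```

In a real handler the `keys` would come from an S3 list-objects call for the requested prefix; here they are passed in directly so the rendering can be read on its own.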
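Afterword
The architecture is reasonably clear and simple, even a bit plain. 迟门🙏
When I have time, I'll probably hack together an axum version?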