Zigbook – Learn the Zig Programming Language

Overview

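Our second project moves up from arithmetic to text processing: a miniature grep clone that takes a search pattern and a file path, then prints only the matching lines. The exercise reinforces the argument handling from the previous chapter while introducing file I/O and slicing utilities from the standard library (see #Command-line Flags and File.zig).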

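Rather than streaming byte by byte, we rely on Zig's memory-safe helpers to load the file, split it into lines, and surface matches with a plain substring check. Every failure path prints a user-friendly message before exiting, so the tool behaves predictably in shell scripts, a theme we will carry into the next project. For the relevant APIs see #Command-line Flags and File.zig; for error-handling patterns see #Error Handling.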

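Learning Goals

Implement a command-line parsing routine that supports --help, enforces the argument count, and terminates gracefully on misuse.
Use std.fs.File.readToEndAlloc and std.mem.splitScalar to load and iterate over file contents (see mem.zig).
Filter lines with std.mem.indexOf and report results on stdout while directing diagnostics to stderr (see debug.zig).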

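Building the Search Harness

We start by wiring up the CLI front end: allocate the arguments, honor --help, and confirm that exactly two positional arguments, the pattern and the path, are present. Running with no arguments or with --help prints the usage banner and returns normally. Any other argument count prints an error plus the usage banner and exits with code 1, which avoids a stack trace while still signaling failure to the caller.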

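Validating Arguments and Usage Paths

This skeleton mirrors the TempConv CLI from Chapter 5, but now we send diagnostics to stderr and exit explicitly when the input is wrong or the file cannot be opened. printUsage keeps the banner in one place, while std.process.exit guarantees that we stop immediately after the message is written.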
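The argument checks described above can be isolated into a small pure function, which makes them easy to unit test. The sketch below is illustrative only: Action and classifyArgs are hypothetical names that do not appear in the chapter's listing, and the function takes plain slices rather than the sentinel-terminated strings returned by std.process.argsAlloc.

```zig
const std = @import("std");

// Hypothetical helper type, not part of the chapter's listing.
const Action = enum { show_usage, usage_error, search };

// Mirrors the checks performed at the top of main in the full listing.
fn classifyArgs(args: []const []const u8) Action {
    // args[0] is the program name, so a bare invocation has len == 1.
    if (args.len == 1) return .show_usage;
    if (args.len == 2 and std.mem.eql(u8, args[1], "--help")) return .show_usage;
    // Anything other than <pattern> <path> is treated as misuse.
    if (args.len != 3) return .usage_error;
    return .search;
}

pub fn main() void {
    const action = classifyArgs(&.{ "grep-lite", "pattern", "notes.txt" });
    std.debug.print("action: {s}\n", .{@tagName(action)});
}
```

Keeping the decision separate from the side effects (printing, exiting) is a design choice, not something the chapter requires; main would still call printUsage and std.process.exit based on the returned value.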

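Loading and Splitting the File

Rather than handling partial reads, we load the file into memory with File.readToEndAlloc, capping the size at 8 MiB to guard against unexpectedly huge files. A single call to std.mem.splitScalar then yields an iterator over newline-delimited segments, which we trim to accommodate Windows-style carriage returns.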
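To see the split-and-trim step in isolation, here is a minimal sketch that works on an in-memory string instead of a file. The helper name stripCarriageReturn and the sample text are assumptions made for illustration; the logic follows the trimNewline helper in the full listing.

```zig
const std = @import("std");

// Drops a single trailing '\r' so CRLF and LF files produce the same lines.
fn stripCarriageReturn(line: []const u8) []const u8 {
    if (line.len > 0 and line[line.len - 1] == '\r') {
        return line[0 .. line.len - 1];
    }
    return line;
}

pub fn main() void {
    // Mixed line endings, plus a trailing newline at the end of the buffer.
    const contents = "alpha\r\nbeta\ngamma\n";

    // Each segment is a slice into `contents`; nothing is copied.
    var lines = std.mem.splitScalar(u8, contents, '\n');
    while (lines.next()) |raw| {
        const line = stripCarriageReturn(raw);
        std.debug.print("[{s}] len={d}\n", .{ line, line.len });
    }
}
```

Because the buffer ends with a newline, the iterator yields one final empty segment after gamma. That is harmless for Grep-Lite unless the pattern is empty, in which case the empty segment also counts as a match.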

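Understanding the std.fs Structure

Before diving into file operations, it helps to understand how Zig's filesystem API is organized. The std.fs module provides a layered structure that makes file access portable and composable: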

graph TB
    subgraph "Filesystem API hierarchy"
        CWD["std.fs.cwd() returns: Dir"]
        DIR["Dir type (fs/Dir.zig)"]
        FILE["File type (fs/File.zig)"]
    end
    subgraph "Directory operations"
        OPENFILE["openFile(path, flags) returns: File"]
        MAKEDIR["makeDir(path)"]
        OPENDIR["openDir(path) returns: Dir"]
        ITERATE["iterate() returns: Iterator"]
    end
    subgraph "File operations"
        READ["read(buffer) returns: bytes read"]
        READTOEND["readToEndAlloc(allocator, max_size) returns: []u8"]
        WRITE["write(bytes) returns: bytes written"]
        SEEK["seekTo(pos)"]
        CLOSE["close()"]
    end
    CWD --> DIR
    DIR --> OPENFILE
    DIR --> MAKEDIR
    DIR --> OPENDIR
    DIR --> ITERATE
    OPENFILE --> FILE
    OPENDIR --> DIR
    FILE --> READ
    FILE --> READTOEND
    FILE --> WRITE
    FILE --> SEEK
    FILE --> CLOSE

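Key concepts:

Entry point: std.fs.cwd() returns a Dir handle representing the current working directory
Dir type: provides directory-level operations such as opening files, creating subdirectories, and iterating over contents
File type: represents an open file with read/write operations
Chaining: you call cwd().openFile() because openFile() is a method on the Dir type

Why this structure matters for Grep-Lite: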

Zig

// This is why we write it this way:
const file = try std.fs.cwd().openFile(path, .{});
//                    ^        ^
//                    |        +-- method on Dir
//                    +----------- returns a Dir handle

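This two-step process (cwd() → openFile()) lets you control which directory the file is opened in. This example uses the current directory, but you could just as well use:

std.fs.openDirAbsolute() for absolute paths
dir.openFile() for files relative to any directory handle
std.fs.openFileAbsolute() to skip Dir entirely

This composable design makes filesystem code testable (using temporary directories) and portable (the same API works across platforms).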
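As a sketch of the testability point, the function below takes a Dir handle as a parameter instead of calling std.fs.cwd() itself, so a test can pass in a temporary directory. The helper name readAll, the file name sample.txt, and the 1 MiB limit are all assumptions for illustration, and the calls assume a recent standard library in which Dir.writeFile takes an options struct.

```zig
const std = @import("std");

// Hypothetical helper: reads a whole file relative to any directory handle.
// The caller owns the returned buffer and must free it.
fn readAll(dir: std.fs.Dir, allocator: std.mem.Allocator, sub_path: []const u8) ![]u8 {
    var file = try dir.openFile(sub_path, .{});
    defer file.close();
    return file.readToEndAlloc(allocator, 1024 * 1024);
}

test "readAll works against a temporary directory" {
    // tmpDir creates a scratch directory that cleanup() removes again.
    var tmp = std.testing.tmpDir(.{});
    defer tmp.cleanup();

    try tmp.dir.writeFile(.{ .sub_path = "sample.txt", .data = "one\ntwo\n" });

    const contents = try readAll(tmp.dir, std.testing.allocator, "sample.txt");
    defer std.testing.allocator.free(contents);

    try std.testing.expectEqualStrings("one\ntwo\n", contents);
}
```

Run it with zig test. In the real tool, main would pass std.fs.cwd() as the first argument, so the production path and the test path share the same code.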

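Scanning for Matches

Once we have a slice for each line, matching becomes a one-liner with std.mem.indexOf. We reuse the TempConv pattern of reserving stdout for successful output and stderr for diagnostics, which keeps the tool pipeline-friendly.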
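The matching step can be sketched on its own as follows. std.mem.indexOf returns an optional index, so comparing the result against null is enough to decide whether a line matches. The countMatches helper and the sample text are hypothetical and exist only to demonstrate the check.

```zig
const std = @import("std");

// Hypothetical helper: counts the lines of `contents` that contain `pattern`.
fn countMatches(contents: []const u8, pattern: []const u8) usize {
    var count: usize = 0;
    var lines = std.mem.splitScalar(u8, contents, '\n');
    while (lines.next()) |line| {
        // indexOf yields ?usize: the match position, or null when absent.
        if (std.mem.indexOf(u8, line, pattern) != null) count += 1;
    }
    return count;
}

pub fn main() void {
    const text = "const pattern = 1;\nno hit here\npattern again\n";
    std.debug.print("matches: {d}\n", .{countMatches(text, "pattern")});
}
```

The search is a literal, case-sensitive byte comparison, so "Pattern" would not match "pattern" here.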

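Complete Grep-Lite Listing

The full listing below highlights how the helper functions fit together. Note the comments that tie each code block back to the sections above.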

Zig

const std = @import("std");

// Chapter 6 – Grep-Lite: load a file, split it into lines, and echo only the
// matches to stdout while errors become clear diagnostics on stderr.

const CliError = error{MissingArgs};

fn printUsage() void {
    std.debug.print("usage: grep-lite <pattern> <path>\n", .{});
}

fn trimNewline(line: []const u8) []const u8 {
    if (line.len > 0 and line[line.len - 1] == '\r') {
        return line[0 .. line.len - 1];
    }
    return line;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const args = try std.process.argsAlloc(allocator);
    defer std.process.argsFree(allocator, args);

    if (args.len == 1 or (args.len == 2 and std.mem.eql(u8, args[1], "--help"))) {
        printUsage();
        return;
    }

    if (args.len != 3) {
        std.debug.print("error: expected a pattern and a path\n", .{});
        printUsage();
        std.process.exit(1);
    }

    const pattern = args[1];
    const path = args[2];

    var file = std.fs.cwd().openFile(path, .{ .mode = .read_only }) catch {
        std.debug.print("error: unable to open '{s}'\n", .{path});
        std.process.exit(1);
    };
    defer file.close();

    // Buffered stdout using modern Writer API
    var out_buf: [8 * 1024]u8 = undefined;
    var file_writer = std.fs.File.writer(std.fs.File.stdout(), &out_buf);
    const stdout = &file_writer.interface;

    // Section 1.2: load the complete file eagerly while enforcing a guard so
    // unexpected multi-megabyte inputs do not exhaust memory.
    const max_bytes = 8 * 1024 * 1024;
    const contents = file.readToEndAlloc(allocator, max_bytes) catch |err| switch (err) {
        error.FileTooBig => {
            std.debug.print("error: file exceeds {} bytes limit\n", .{max_bytes});
            std.process.exit(1);
        },
        else => return err,
    };
    defer allocator.free(contents);

    // Section 2.1: split the buffer on newlines; each slice references the
    // original allocation so we incur zero extra copies.
    var lines = std.mem.splitScalar(u8, contents, '\n');
    var matches: usize = 0;

    while (lines.next()) |raw_line| {
        const line = trimNewline(raw_line);

        // Section 2: reuse `std.mem.indexOf` so we highlight exact matches
        // without building temporary slices.
        if (std.mem.indexOf(u8, line, pattern) != null) {
            matches += 1;
            try stdout.print("{s}\n", .{line});
        }
    }

    if (matches == 0) {
        std.debug.print("no matches for '{s}' in {s}\n", .{ pattern, path });
    }

    // Flush buffered stdout and finalize file position
    try file_writer.end();
}

Run

Shell

$ zig run grep_lite.zig -- pattern grep_lite.zig

Output

Shell

    std.debug.print("usage: grep-lite <pattern> <path>\n", .{});
        std.debug.print("error: expected a pattern and a path\n", .{});
    const pattern = args[1];
        if (std.mem.indexOf(u8, line, pattern) != null) {
        std.debug.print("no matches for '{s}' in {s}\n", .{ pattern, path });

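The output shows every source line that contains the literal word pattern. Your list of matches will differ when you run the tool against other files.

Detecting Missing Files Gracefully

To keep shell scripts predictable, the tool emits a single-line diagnostic and exits with a nonzero status when the file path cannot be opened.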

Shell

$ zig run grep_lite.zig -- foo missing.txt

Output

Shell

error: unable to open 'missing.txt'

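Notes & Caveats

readToEndAlloc is simple, but it loads the entire file; if you need to handle very large inputs, you can add a streaming reader later.
The size cap prevents runaway memory allocation. Once you trust your deployment environment, raise it or make it configurable.
This example uses a buffered stdout writer for matches and std.debug.print for diagnostics on stderr; we flush on exit through the writer's end() (see Io.zig).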

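Exercises

Accept multiple files on the command line and print a path:line prefix for each match (see #for).
Add an --ignore-case flag by normalizing both the pattern and each line with std.ascii.toLower (see ascii.zig).
Support regular expressions by integrating a third-party matcher after the whole buffer is loaded.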
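As one possible starting point for the --ignore-case exercise, the sketch below compares each candidate window with std.ascii.eqlIgnoreCase instead of allocating lowercased copies. The name containsIgnoreCase is hypothetical, the comparison is ASCII-only, and wiring the flag into the argument parser is left to you.

```zig
const std = @import("std");

// Hypothetical helper: true when `needle` occurs in `haystack`, ignoring
// ASCII case. Non-ASCII bytes are compared exactly.
fn containsIgnoreCase(haystack: []const u8, needle: []const u8) bool {
    var start: usize = 0;
    // Slide a window of needle.len bytes across the haystack.
    while (start + needle.len <= haystack.len) : (start += 1) {
        const window = haystack[start .. start + needle.len];
        if (std.ascii.eqlIgnoreCase(window, needle)) return true;
    }
    return false;
}

pub fn main() void {
    const line = "Error: Unable To Open File";
    std.debug.print("found: {}\n", .{containsIgnoreCase(line, "unable to open")});
}
```

This naive scan is quadratic in the worst case, which is acceptable for short lines; lowercasing the pattern once and each line once, as the exercise suggests, trades an allocation for the ability to reuse std.mem.indexOf.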

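Alternatives & Edge Cases

Windows files typically end lines with \r\n; trimming the carriage return keeps the substring check clean.
An empty pattern currently matches every line. If you would rather treat an empty string as misuse, introduce an explicit guard.
To integrate with a larger build, replace zig run with a zig build-exe step and install the binary on your PATH.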
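The empty-pattern guard mentioned above could look roughly like this. PatternError and validatePattern are hypothetical names, and the error message is only a suggestion; the chapter's listing does not include such a check.

```zig
const std = @import("std");

// Hypothetical error set for the guard; not part of the chapter's listing.
const PatternError = error{EmptyPattern};

// Rejects an empty pattern, which would otherwise match every line.
fn validatePattern(pattern: []const u8) PatternError![]const u8 {
    if (pattern.len == 0) return error.EmptyPattern;
    return pattern;
}

pub fn main() void {
    // An empty string stands in for args[1] here, so this run takes the
    // failure path and exits with status 1.
    const pattern = validatePattern("") catch {
        std.debug.print("error: pattern must not be empty\n", .{});
        std.process.exit(1);
    };
    std.debug.print("searching for '{s}'\n", .{pattern});
}
```

In Grep-Lite the call would sit right after the argument-count check, so the tool fails fast before opening the file.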