Zigbook – Learn the Zig Programming Language

概述

有了标准库索引提供的地图，您现在深入探索Zig的集合类型——数据处理的主力。本章探讨动态数组（ArrayList）、哈希表（HashMap及其变体）、优先级结构（PriorityQueue）、链表、MultiArrayList和SegmentedList等专用容器，以及排序算法（参见array_list.zig和hash_map.zig）。每个集合都采用Zig的显式分配器模型，让您控制内存生命周期，并能够在测试期间检测泄漏。

与具有隐式垃圾收集的语言不同，Zig集合需要您显式调用deinit()或转移所有权。这种纪律性，结合标准库丰富的适配器套件（未管理变体、感知哨兵的切片、自定义上下文），使集合既强大又可预测。到本章结束时，您将自信地为用例选择正确的结构，并理解每种设计中固有的性能权衡（参见sort.zig）。

学习目标

使用ArrayList(T)处理动态数组：追加、插入、移除、迭代，并理解其重新分配策略。
使用HashMap和AutoHashMap通过自定义哈希和相等函数进行键值查找。
利用PriorityQueue进行最小/最大堆操作，并理解比较上下文（参见priority_queue.zig）。
应用std.sort进行原地排序，包括稳定和不稳定算法（pdqsort、块排序、插入排序）。
识别专用结构：用于数组结构布局的MultiArrayList、用于稳定指针的SegmentedList，以及用于侵入式设计的链表（参见multi_array_list.zig和segmented_list.zig）。
理解分配器的影响：集合增长如何触发重新分配，以及竞技场（arena）如何简化批量释放模式（参见第10章）。

ArrayList：动态数组

ArrayList(T)是Zig基础的可增长数组，类似于C++的std::vector或Rust的Vec<T>。它管理一个连续的T值切片，并根据需要扩展容量。您可以通过.items（当前切片）与之交互，并调用append、pop、insert和remove等方法。

基本操作

通过指定元素类型并传递一个分配器来创建一个ArrayList。完成后调用deinit()以释放后备内存。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list: std.ArrayList(i32) = .empty;
    defer list.deinit(allocator);

    try list.append(allocator, 10);
    try list.append(allocator, 20);
    try list.append(allocator, 30);

    for (list.items, 0..) |item, i| {
        std.debug.print("Item {d}: {d}\n", .{ i, item });
    }

    const popped = list.pop();
    std.debug.print("Popped: {d}\n", .{popped.?});
    std.debug.print("Remaining length: {d}\n", .{list.items.len});
}

构建

Shell

$ zig build-exe arraylist_basic.zig

运行

Shell

$ ./arraylist_basic

输出

Shell

Item 0: 10
Item 1: 20
Item 2: 30
Popped: 30
Remaining length: 2

ArrayList在满时会加倍容量（指数增长），以摊销重新分配的成本。如果您知道最终大小，可以使用try list.ensureTotalCapacity(allocator, n)进行预分配。

所有权和非托管变体

默认情况下，ArrayList(T)在内部存储其分配器（托管变体）。为了更明确地控制，可以通过直接访问.items和.capacity来使用非托管形式，或使用已弃用的Unmanaged API。现代模式是使用更简单的托管形式，除非您需要将分配与列表本身解耦。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // ArrayList with explicit allocator
    var list: std.ArrayList(u32) = .empty;
    defer list.deinit(allocator);

    try list.append(allocator, 1);
    try list.append(allocator, 2);
    try list.append(allocator, 3);

    std.debug.print("Managed list length: {d}\n", .{list.items.len});

    // Transfer ownership to a slice
    const owned_slice = try list.toOwnedSlice(allocator);
    defer allocator.free(owned_slice);

    std.debug.print("After transfer, original list length: {d}\n", .{list.items.len});
    std.debug.print("Owned slice length: {d}\n", .{owned_slice.len});
}

构建和运行

Shell

$ zig build-exe arraylist_ownership.zig && ./arraylist_ownership

输出

Shell

Managed list length: 3
After transfer, original list length: 0
Owned slice length: 3

toOwnedSlice()会清空列表并返回后备内存作为切片——您需要负责使用allocator.free(slice)来释放它。

插入和移除

除了append和pop之外，ArrayList还支持数组中间的操作。orderedRemove维护元素顺序（移动后续元素），而swapRemove是O(1)操作，但不保留顺序（与最后一个元素交换）。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list: std.ArrayList(i32) = .empty;
    defer list.deinit(allocator);

    try list.appendSlice(allocator, &.{ 1, 2, 3, 4 });

    // Insert 99 at index 1
    try list.insert(allocator, 1, 99);
    std.debug.print("After insert at 1: {any}\n", .{list.items});

    // Remove at index 2 (shifts elements)
    _ = list.orderedRemove(2);
    std.debug.print("After orderedRemove at 2: {any}\n", .{list.items});

    // Remove at index 1 (swaps with last, no shift)
    _ = list.swapRemove(1);
    std.debug.print("After swapRemove at 1: {any}\n", .{list.items});
}

构建和运行

Shell

$ zig build-exe arraylist_insert_remove.zig && ./arraylist_insert_remove

输出

Shell

After insert at 1: [1, 99, 2, 3, 4]
After orderedRemove at 2: [1, 99, 3, 4]
After swapRemove at 1: [1, 4, 3]

在最坏的情况下，orderedRemove是O(n)操作（移除第一个元素需要移动所有其他元素）；当顺序不重要时，使用swapRemove以获得O(1)的性能。

HashMap：键值查找

Zig的哈希映射系列通过开放寻址和线性探测提供O(1)的平均查找时间。HashMap(K, V, Context, max_load_percentage)需要一个包含hash和eql函数的上下文。为方便起见，AutoHashMap为可哈希类型自动生成这些函数，而StringHashMap则专门用于[]const u8键。

StringHashMap基础

对于字符串键（[]const u8），请使用StringHashMap(V)，它提供优化的字符串哈希。请注意，AutoHashMap不支持像[]const u8这样的切片类型以避免歧义——请改用StringHashMap。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var map = std.StringHashMap(i32).init(allocator);
    defer map.deinit();

    try map.put("foo", 42);
    try map.put("bar", 100);

    if (map.get("foo")) |value| {
        std.debug.print("Value for 'foo': {d}\n", .{value});
    }

    std.debug.print("Contains 'bar': {}\n", .{map.contains("bar")});
    std.debug.print("Contains 'baz': {}\n", .{map.contains("baz")});

    _ = map.remove("foo");
    std.debug.print("After removing 'foo', contains: {}\n", .{map.contains("foo")});
}

构建和运行

Shell

$ zig build-exe hashmap_basic.zig && ./hashmap_basic

输出

Shell

Value for 'foo': 42
Contains 'bar': true
Contains 'baz': false
After removing 'foo', contains: false

使用put插入或更新，get检索（返回?V），remove删除。使用contains检查存在性而不检索值。

用于字符串键的StringHashMap

当键是[]const u8时，使用StringHashMap(V)以获得优化的字符串哈希。请记住：映射不会复制键内存——您必须确保字符串的生命周期长于映射，或使用竞技场分配器。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var population = std.StringHashMap(u32).init(allocator);
    defer population.deinit();

    try population.put("Seattle", 750_000);
    try population.put("Austin", 950_000);
    try population.put("Boston", 690_000);

    var iter = population.iterator();
    while (iter.next()) |entry| {
        std.debug.print("City: {s}, Population: {d}\n", .{ entry.key_ptr.*, entry.value_ptr.* });
    }
}

构建和运行

Shell

$ zig build-exe hashmap_string.zig && ./hashmap_string

输出

Shell

City: Seattle, Population: 750000
City: Austin, Population: 950000
City: Boston, Population: 690000

字符串键不会被映射复制——如果您传递栈分配或临时字符串，它们必须保持有效。请使用竞技场分配器或dupe来管理键的生命周期。

自定义哈希和相等性

对于autoHash不支持的类型，请定义一个包含自定义hash和eql函数的上下文。

Zig

const std = @import("std");

const Point = struct {
    x: i32,
    y: i32,
};

const PointContext = struct {
    pub fn hash(self: @This(), p: Point) u64 {
        _ = self;
        var hasher = std.hash.Wyhash.init(0);
        std.hash.autoHash(&hasher, p.x);
        std.hash.autoHash(&hasher, p.y);
        return hasher.final();
    }

    pub fn eql(self: @This(), a: Point, b: Point) bool {
        _ = self;
        return a.x == b.x and a.y == b.y;
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var map = std.HashMap(Point, []const u8, PointContext, 80).init(allocator);
    defer map.deinit();

    try map.put(.{ .x = 10, .y = 20 }, "Alice");
    try map.put(.{ .x = 30, .y = 40 }, "Bob");

    var iter = map.iterator();
    while (iter.next()) |entry| {
        std.debug.print("Point({d}, {d}): {s}\n", .{ entry.key_ptr.x, entry.key_ptr.y, entry.value_ptr.* });
    }

    std.debug.print("Contains (10, 20): {}\n", .{map.contains(.{ .x = 10, .y = 20 })});
}

构建和运行

Shell

$ zig build-exe hashmap_custom.zig && ./hashmap_custom

输出

Shell

Point(10, 20): Alice
Point(30, 40): Bob
Contains (10, 20): true

HashMap(K, V, Context, max_load_percentage)中的上下文参数允许有状态的哈希（例如，加盐哈希）。对于无状态上下文，请传递void。

PriorityQueue：基于堆的优先级结构

PriorityQueue(T, Context, compareFn)根据您的比较函数实现一个二叉最小堆或最大堆。它支持add、peek、remove（弹出顶部元素）和removeIndex。

最小堆示例

最小堆首先弹出最小的元素。当第一个参数应排在第二个参数之前时，比较函数返回.lt。

Zig

const std = @import("std");

fn lessThan(context: void, a: i32, b: i32) std.math.Order {
    _ = context;
    return std.math.order(a, b);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var queue = std.PriorityQueue(i32, void, lessThan).init(allocator, {});
    defer queue.deinit();

    try queue.add(10);
    try queue.add(5);
    try queue.add(20);
    try queue.add(1);

    while (queue.removeOrNull()) |item| {
        std.debug.print("Popped: {d}\n", .{item});
    }
}

构建和运行

Shell

$ zig build-exe priorityqueue_min.zig && ./priorityqueue_min

输出

Shell

Popped: 1
Popped: 5
Popped: 10
Popped: 20

对于最大堆，请反转比较逻辑：当a < b时返回.gt。

用于任务调度的优先级队列

优先级队列在调度方面表现出色：添加带优先级的任务，然后始终首先处理最高优先级的任务。

Zig

const std = @import("std");

const Task = struct {
    name: []const u8,
    priority: u32,
};

fn compareTasks(context: void, a: Task, b: Task) std.math.Order {
    _ = context;
    // Higher priority comes first (max-heap behavior)
    return std.math.order(b.priority, a.priority);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var queue = std.PriorityQueue(Task, void, compareTasks).init(allocator, {});
    defer queue.deinit();

    try queue.add(.{ .name = "Documentation", .priority = 1 });
    try queue.add(.{ .name = "Feature request", .priority = 5 });
    try queue.add(.{ .name = "Critical bug", .priority = 10 });

    while (queue.removeOrNull()) |task| {
        std.debug.print("Processing: {s} (priority {d})\n", .{ task.name, task.priority });
    }
}

构建和运行

Shell

$ zig build-exe priorityqueue_tasks.zig && ./priorityqueue_tasks

输出

Shell

Processing: Critical bug (priority 10)
Processing: Feature request (priority 5)
Processing: Documentation (priority 1)

PriorityQueue内部使用堆，因此add是O(log n)，peek是O(1)，remove是O(log n)。

排序

Zig的std.sort模块提供了多种算法：insertion（稳定，O(n²)）、heap（不稳定，O(n log n)）、pdq（模式克星快速排序，最坏情况O(n log n)）和block（稳定，O(n log n)但需要额外内存）。对于大多数用例，默认推荐使用pdq。

基本排序

使用切片、上下文和lessThan函数调用std.sort.pdq。

Zig

const std = @import("std");

fn lessThan(context: void, a: i32, b: i32) bool {
    _ = context;
    return a < b;
}

fn greaterThan(context: void, a: i32, b: i32) bool {
    _ = context;
    return a > b;
}

pub fn main() !void {
    var numbers = [_]i32{ 5, 2, 8, 1, 10 };

    std.sort.pdq(i32, &numbers, {}, lessThan);
    std.debug.print("Sorted ascending: {any}\n", .{numbers});

    std.sort.pdq(i32, &numbers, {}, greaterThan);
    std.debug.print("Sorted descending: {any}\n", .{numbers});
}

构建和运行

Shell

$ zig build-exe sort_basic.zig && ./sort_basic

输出

Shell

Sorted ascending: [1, 2, 5, 8, 10]
Sorted descending: [10, 8, 5, 2, 1]

pdq速度快但不稳定。如果需要稳定性（相等元素保持其原始顺序），请使用block。

结构体排序

通过提供自定义比较函数按结构体字段排序。

Zig

const std = @import("std");

const Person = struct {
    name: []const u8,
    age: u32,
};

fn byAge(context: void, a: Person, b: Person) bool {
    _ = context;
    return a.age < b.age;
}

pub fn main() !void {
    var people = [_]Person{
        .{ .name = "Alice", .age = 30 },
        .{ .name = "Bob", .age = 25 },
        .{ .name = "Charlie", .age = 35 },
    };

    std.sort.pdq(Person, &people, {}, byAge);

    std.debug.print("Sorted by age:\n", .{});
    for (people) |person| {
        std.debug.print("{s}, age {d}\n", .{ person.name, person.age });
    }
}

构建和运行

Shell

$ zig build-exe sort_structs.zig && ./sort_structs

输出

Shell

Sorted by age:
Alice, age 30
Bob, age 25
Charlie, age 35

排序函数中的上下文参数可以保存状态（例如，排序方向标志或比较修饰符）。使用anytype以获得灵活性。

MultiArrayList：数组结构布局

MultiArrayList(T)以数组结构（SoA）格式存储结构体：每个字段都存储在自己的连续数组中，从而在跨多个元素访问单个字段时提高缓存局部性。

Zig

const std = @import("std");

const Entity = struct {
    id: u32,
    x: f32,
    y: f32,
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var entities = std.MultiArrayList(Entity){};
    defer entities.deinit(allocator);

    try entities.append(allocator, .{ .id = 1, .x = 10.5, .y = 20.3 });
    try entities.append(allocator, .{ .id = 2, .x = 30.1, .y = 40.7 });

    for (0..entities.len) |i| {
        const entity = entities.get(i);
        std.debug.print("Entity {d}: id={d}, x={d}, y={d}\n", .{ i, entity.id, entity.x, entity.y });
    }

    // Access a single field slice for efficient iteration
    const x_coords = entities.items(.x);
    var sum: f32 = 0;
    for (x_coords) |x| {
        sum += x;
    }
    std.debug.print("Sum of x coordinates: {d}\n", .{sum});
}

构建和运行

Shell

$ zig build-exe multiarraylist.zig && ./multiarraylist

输出

Shell

Entity 0: id=1, x=10.5, y=20.3
Entity 1: id=2, x=30.1, y=40.7
Sum of x coordinates: 40.6

当您频繁迭代单个字段（例如，游戏引擎中的位置）但很少需要整个结构体时，请使用MultiArrayList。这种布局可最大化CPU缓存效率。

SegmentedList：稳定指针

SegmentedList(T, prealloc_item_count)通过分配固定大小的段来增长，而不是重新分配单个连续数组。这确保了指向元素的指针在插入操作中保持有效。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list = std.SegmentedList(i32, 4){};
    defer list.deinit(allocator);

    try list.append(allocator, 10);
    try list.append(allocator, 20);

    // Take a pointer to the first element
    const first_ptr = list.at(0);
    std.debug.print("First item: {d}\n", .{first_ptr.*});

    // Append more items - pointer remains valid!
    try list.append(allocator, 30);

    std.debug.print("First item (after append): {d}\n", .{first_ptr.*});
    std.debug.print("List length: {d}\n", .{list.len});
}

构建和运行

Shell

$ zig build-exe segmentedlist.zig && ./segmentedlist

输出

Shell

First item: 10
First item (after append): 10
List length: 3

与ArrayList不同，指向SegmentedList元素的指针即使在您添加更多项目时也保持有效。当您需要稳定寻址时（例如，在其他数据结构中存储指针），请使用此功能。

链表

Zig提供DoublyLinkedList(T)和SinglyLinkedList(T)作为侵入式链表：节点直接嵌入链接指针（参见DoublyLinkedList.zig和SinglyLinkedList.zig）。这避免了每个节点的分配器开销，并能自然地与现有结构体集成。

Zig

const std = @import("std");

const Node = struct {
    data: i32,
    link: std.DoublyLinkedList.Node = .{},
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list = std.DoublyLinkedList{};

    var node1 = try allocator.create(Node);
    node1.* = .{ .data = 10 };
    list.append(&node1.link);

    var node2 = try allocator.create(Node);
    node2.* = .{ .data = 20 };
    list.append(&node2.link);

    var node3 = try allocator.create(Node);
    node3.* = .{ .data = 30 };
    list.append(&node3.link);

    var it = list.first;
    while (it) |node| : (it = node.next) {
        const data_node: *Node = @fieldParentPtr("link", node);
        std.debug.print("Node: {d}\n", .{data_node.data});
    }

    // Clean up
    allocator.destroy(node1);
    allocator.destroy(node2);
    allocator.destroy(node3);
}

构建和运行

Shell

$ zig build-exe linkedlist.zig && ./linkedlist

输出

Shell

Node: 10
Node: 20
Node: 30

侵入式列表不拥有节点内存——您自己分配和管理节点。这功能强大，但需要遵守纪律以避免悬空指针错误。

专用映射

ArrayHashMap

ArrayHashMap将键和值存储在单独的数组中，保留插入顺序并允许按索引迭代（参见array_hash_map.zig）。

StaticStringMap

StaticStringMap(V)是用于字符串键的编译时完美哈希映射——查找速度快，运行时零分配或哈希开销（参见static_string_map.zig）。

Zig

const std = @import("std");

const status_codes = std.StaticStringMap(u32).initComptime(.{
    .{ "ok", 200 },
    .{ "created", 201 },
    .{ "not_found", 404 },
    .{ "server_error", 500 },
});

pub fn main() !void {
    std.debug.print("Status code for 'ok': {d}\n", .{status_codes.get("ok").?});
    std.debug.print("Status code for 'not_found': {d}\n", .{status_codes.get("not_found").?});
    std.debug.print("Status code for 'server_error': {d}\n", .{status_codes.get("server_error").?});
}

构建和运行

Shell

$ zig build-exe static_string_map.zig && ./static_string_map

输出

Shell

Status code for 'ok': 200
Status code for 'not_found': 404
Status code for 'server_error': 500

使用StaticStringMap进行编译时常量映射（例如，关键字表、命令解析器）。它会编译为最优的switch语句或查找表。

分配器对集合的影响

每个集合都需要一个分配器，在初始化时（ArrayList(T).init(allocator)）或每次操作时（非托管变体）传递。增长策略会触发重新分配，失败时返回error.OutOfMemory（参见第10章）。

用于批量释放的竞技场模式

当构建仅在单个作用域内存在的临时集合时，请使用竞技场分配器一次性释放所有内容。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var arena = std.heap.ArenaAllocator.init(allocator);
    defer arena.deinit();
    const arena_allocator = arena.allocator();

    var list: std.ArrayList(i32) = .empty;
    for (0..1000) |i| {
        try list.append(arena_allocator, @intCast(i));
    }
    std.debug.print("List has {d} items\n", .{list.items.len});

    var map = std.AutoHashMap(u32, []const u8).init(arena_allocator);
    for (0..500) |i| {
        try map.put(@intCast(i), "value");
    }
    std.debug.print("Map has {d} entries\n", .{map.count()});

    // No need to call list.deinit() or map.deinit()
    // arena.deinit() frees everything at once
    std.debug.print("All freed at once via arena.deinit()\n", .{});
}

构建和运行

Shell

$ zig build-exe collections_arena.zig && ./collections_arena

输出

Shell

List has 1000 items
Map has 500 entries
All freed at once via arena.deinit()

竞技场不会调用单个集合的deinit()方法。它一次性释放所有内存。当您知道集合的生命周期不会超过竞技场的作用域时，请使用此模式（参见第10章）。

性能注意事项

ArrayList增长：加倍容量可摊销重新分配成本，但大额分配可能会失败。如果大小已知，请进行预分配。
HashMap加载因子：默认的max_load_percentage是80%。更高的值可以节省内存，但会增加冲突链。
排序稳定性：pdq速度最快但不稳定。当相等元素的顺序很重要时，请使用block或insertion。
MultiArrayList缓存：当迭代单个字段时，SoA布局表现出色，但对于全结构体访问会增加间接开销。
SegmentedList段：较小的prealloc_item_count意味着更多的段（更多的分配）；如果列表保持较小，较大的值会浪费内存。

练习

使用StringHashMap(u32)实现一个FrequencyMap，用于计算文本文件中单词的出现次数，然后使用PriorityQueue打印前10个最频繁的单词。
比较ArrayList和SegmentedList的性能：创建10,000个项目，获取前100个项目的指针，然后追加10,000个。验证在使用SegmentedList时指针保持有效，但在使用ArrayList时可能会失效。
使用HashMap进行查找，使用DoublyLinkedList进行驱逐顺序，编写一个LRU缓存。当达到容量时，移除最近最少使用的项目。
使用自定义比较器和std.sort.pdq按多个键（例如，按age排序，然后按name排序以解决平局）对ArrayList中的结构体进行排序。

警告、替代方案、边缘情况

非托管变体：大多数集合都有非托管对应物（例如，ArrayListUnmanaged(T)），用于手动分配器线程，在通用代码或在结构体中嵌入集合时很有用。
HashMap键的生命周期：映射不会复制键。请确保键内存的生命周期长于映射，或使用竞技场分配器来统一管理键存储。
迭代器失效：与C++类似，修改集合（追加、移除）可能会使迭代器或指向元素的指针失效。请务必检查每个操作的文档。
稳定与不稳定排序：如果您的数据具有必须保持相对顺序的相等元素（例如，按列对表进行排序，但为平局保留行顺序），请使用std.sort.block或insertion，而不是pdq。
树堆：Zig还提供std.Treap，这是一种树堆混合体，用于具有概率平衡的有序映射，当您既需要排序迭代又需要O(log n)操作时很有用（参见treap.zig）。