Named tensor notation with funsors (Part 1)

Introduction

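The mathematical notation with named axes introduced in Named Tensor Notation (Chiang, Rush, Barak 2021) improves the readability of formulas involving multidimensional arrays. It covers tensor operations such as elementwise operations, reductions, contractions, renaming, indexing, and broadcasting. In this tutorial we translate examples from Named Tensor Notation into funsors, both to demonstrate how these operations are implemented in the funsor library and to familiarize readers with funsor syntax. Part 1 covers examples from Section 2 (Informal Overview), Section 3.4.2 (Advanced Indexing), and Section 5 (Formal Definitions).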

First, let's import some dependencies.

[ ]:
!pip install funsor[torch]@git+https://github.com/pyro-ppl/funsor
[1]:
from torch import tensor

import funsor
import funsor.ops as ops
from funsor import Number, Tensor, Variable
from funsor.domains import Bint

funsor.set_backend("torch")

Named Tensors

Each tensor axis is given a name:

\[\begin{split}\begin{aligned} A &\in \mathbb{R}^{\mathsf{\vphantom{fg}height}[3] \times \mathsf{\vphantom{fg}width}[3]} = \mathbb{R}^{\mathsf{\vphantom{fg}width}[3] \times \mathsf{\vphantom{fg}height}[3]} \\ A &= \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \end{bmatrix}\end{array} = \mathsf{\vphantom{fg}width} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}height}\\\begin{bmatrix} 3 & 1 & 2 \\ 1 & 5 & 6 \\ 4 & 9 & 5 \end{bmatrix}\end{array}. \end{aligned}\end{split}\]
[2]:
A = Tensor(tensor([[3, 1, 4], [1, 5, 9], [2, 6, 5]]))["height", "width"]

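Elements of \(A\) are accessed using named indices. The notation counts from 1, whereas funsor indices are 0-based, so \(\mathsf{height}(1), \mathsf{width}(3)\) corresponds to height=0, width=2 in the code.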

\[A_{\mathsf{\vphantom{fg}height}(1), \mathsf{\vphantom{fg}width}(3)} = A_{\mathsf{\vphantom{fg}width}(3), \mathsf{\vphantom{fg}height}(1)} = 4\]
[3]:
# A(height=0, width=2) =
A(width=2, height=0)
[3]:
Tensor(tensor(4))

Partial indexing

\[\begin{split}\begin{aligned} A_{\mathsf{\vphantom{fg}height}(1)} &= \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\ \begin{bmatrix} 3 & 1 & 4 \end{bmatrix}\end{array} & A_{\mathsf{\vphantom{fg}width}(3)} &= \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}height}\\ \begin{bmatrix} 4 & 9 & 5 \end{bmatrix}\end{array}. \end{aligned}\end{split}\]
[4]:
A(height=0)
[4]:
Tensor(tensor([3, 1, 4]), {'width': Bint[3]})
[5]:
A(width=2)
[5]:
Tensor(tensor([4, 9, 5]), {'height': Bint[3]})

Named tensor operations

Elementwise operations and broadcasting

Elementwise operations

\[\begin{split}\frac1{1+\exp(-A)} = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\ \begin{bmatrix} \frac 1{1+\exp(-3)} & \frac 1{1+\exp(-1)} & \frac 1{1+\exp(-4)} \\[1ex] \frac 1{1+\exp(-1)} & \frac 1{1+\exp(-5)} & \frac 1{1+\exp(-9)} \\[1ex] \frac 1{1+\exp(-2)} & \frac 1{1+\exp(-6)} & \frac 1{1+\exp(-5)} \end{bmatrix}\end{array}.\end{split}\]
[6]:
# A.sigmoid() =
# ops.sigmoid(A) =
# 1 / (1 + ops.exp(-A)) =
1 / (1 + (-A).exp())
[6]:
Tensor(tensor([[0.9526, 0.7311, 0.9820],
               [0.7311, 0.9933, 0.9999],
               [0.8808, 0.9975, 0.9933]]), {'height': Bint[3], 'width': Bint[3]})

Tensors with different shapes are automatically broadcast against each other before an operation is applied. Let

\[\begin{split}\begin{aligned} x &\in \mathbb{R}^{\mathsf{\vphantom{fg}height}[3]} & y &\in \mathbb{R}^{\mathsf{\vphantom{fg}width}[3]} \\ x &= \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\\ \begin{bmatrix} 2 \\ 7 \\ 1 \end{bmatrix}\end{array} & y &= \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} 1 & 4 & 1 \end{bmatrix}\end{array}. \end{aligned}\end{split}\]
[7]:
x = Tensor(tensor([2, 7, 1]))["height"]

y = Tensor(tensor([1, 4, 1]))["width"]

Binary addition

\[\begin{split}\begin{aligned} A + x &= \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} 3+2 & 1+2 & 4+2 \\ 1+7 & 5+7 & 9+7 \\ 2+1 & 6+1 & 5+1 \end{bmatrix}\end{array} & A + y &= \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} 3+1 & 1+4 & 4+1 \\ 1+1 & 5+4 & 9+1 \\ 2+1 & 6+4 & 5+1 \end{bmatrix}\end{array}. \end{aligned}\end{split}\]
[8]:
# ops.add(A, x) =
A + x
[8]:
Tensor(tensor([[ 5,  3,  6],
               [ 8, 12, 16],
               [ 3,  7,  6]]), {'height': Bint[3], 'width': Bint[3]})
[9]:
# ops.add(A, y) =
A + y
[9]:
Tensor(tensor([[ 4,  5,  5],
               [ 2,  9, 10],
               [ 3, 10,  6]]), {'height': Bint[3], 'width': Bint[3]})
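
As a cross-check of the two sums above, broadcasting by name can be spelled out in plain Python with nested lists. This is only an illustrative sketch, not how funsor implements it; the variable names `A_plus_x` and `A_plus_y` are introduced here for the example.

```python
# Plain-Python sketch of broadcasting by axis name (no funsor required).
A = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]  # A[height][width]
x = [2, 7, 1]  # indexed by height only
y = [1, 4, 1]  # indexed by width only

# x has no "width" axis, so x[h] is reused for every width index.
A_plus_x = [[A[h][w] + x[h] for w in range(3)] for h in range(3)]

# y has no "height" axis, so y[w] is reused for every height index.
A_plus_y = [[A[h][w] + y[w] for w in range(3)] for h in range(3)]

print(A_plus_x)  # [[5, 3, 6], [8, 12, 16], [3, 7, 6]]
print(A_plus_y)  # [[4, 5, 5], [2, 9, 10], [3, 10, 6]]
```

With positional arrays the two cases would need different reshapes; with named axes the alignment follows from the names alone.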

Binary multiplication

\[\begin{split}A \odot x = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} 3\cdot2 & 1\cdot2 & 4\cdot2 \\ 1\cdot7 & 5\cdot7 & 9\cdot7 \\ 2\cdot1 & 6\cdot1 & 5\cdot1 \end{bmatrix}\end{array}\end{split}\]
[10]:
# ops.mul(A, x) =
A * x
[10]:
Tensor(tensor([[ 6,  2,  8],
               [ 7, 35, 63],
               [ 2,  6,  5]]), {'height': Bint[3], 'width': Bint[3]})

Binary maximum

\[\begin{split}\max(A, y) = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} \max(3, 1) & \max(1, 4) & \max(4, 1) \\ \max(1, 1) & \max(5, 4) & \max(9, 1) \\ \max(2, 1) & \max(6, 4) & \max(5, 1) \end{bmatrix}\end{array}.\end{split}\]
[11]:
ops.max(A, y)
[11]:
Tensor(tensor([[3, 4, 4],
               [1, 5, 9],
               [2, 6, 5]]), {'height': Bint[3], 'width': Bint[3]})

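Reductions

Named axes can be reduced by calling the .reduce method and specifying the reduction operator and the names of the axes to reduce. Note that reduction is defined only for operators that are associative and commutative.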

\[\begin{split}\sum\limits_{\substack{\mathsf{\vphantom{fg}height}}} A = \sum_i A_{\mathsf{\vphantom{fg}height}(i)} = \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\ \begin{bmatrix} 3+1+2 & 1+5+6 & 4+9+5 \end{bmatrix}\end{array}.\end{split}\]
[12]:
A.reduce(ops.add, "height")
[12]:
Tensor(tensor([ 6, 12, 18]), {'width': Bint[3]})
\[\begin{split}\sum\limits_{\substack{\mathsf{\vphantom{fg}width}}} A = \sum_j A_{\mathsf{\vphantom{fg}width}(j)} = \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}height}\\ \begin{bmatrix} 3+1+4 & 1+5+9 & 2+6+5 \end{bmatrix}\end{array}.\end{split}\]
[13]:
A.reduce(ops.add, "width")
[13]:
Tensor(tensor([ 8, 15, 13]), {'height': Bint[3]})

Reduction over multiple axes

\[\begin{split}\sum\limits_{\substack{\mathsf{\vphantom{fg}height}\\ \mathsf{\vphantom{fg}width}}} A = \sum_i \sum_j A_{\mathsf{\vphantom{fg}height}(i),\mathsf{\vphantom{fg}width}(j)} = 3+1+4+1+5+9+2+6+5.\end{split}\]
[14]:
A.reduce(ops.add, {"height", "width"})
[14]:
Tensor(tensor(36))
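
The requirement that the operator be associative and commutative can be illustrated with a small plain-Python sketch: reducing height first or width first gives the same total. The helpers `reduce_height` and `reduce_width` are hypothetical names used only for this example.

```python
from functools import reduce
import operator

A = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]  # A[height][width]


def reduce_height(op, rows):
    """Fold `op` over the height axis, leaving one value per width index."""
    return [reduce(op, column) for column in zip(*rows)]


def reduce_width(op, rows):
    """Fold `op` over the width axis, leaving one value per height index."""
    return [reduce(op, row) for row in rows]


sum_over_height = reduce_height(operator.add, A)  # [6, 12, 18]
sum_over_width = reduce_width(operator.add, A)  # [8, 15, 13]

# Because addition is associative and commutative, the order in which
# the two axes are reduced does not affect the result.
total_height_first = reduce(operator.add, sum_over_height)
total_width_first = reduce(operator.add, sum_over_width)
print(total_height_first, total_width_first)  # 36 36
```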

Product reduction

\[\begin{split}\prod\limits_{\substack{\mathsf{\vphantom{fg}height}}} A = \prod_i A_{\mathsf{\vphantom{fg}height}(i)} = \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\ \begin{bmatrix} 3\cdot1\cdot2 & 1\cdot5\cdot6 & 4\cdot9\cdot5 \end{bmatrix}\end{array}.\end{split}\]
[15]:
A.reduce(ops.mul, "height")
[15]:
Tensor(tensor([  6,  30, 180]), {'width': Bint[3]})

Max reduction

\[\begin{split}\max\limits_{\substack{\mathsf{\vphantom{fg}height}}} A = \max \{A_{\mathsf{\vphantom{fg}height}(i)} \mid 1 \leq i \leq n\} = \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\ \begin{bmatrix} \max(3, 1, 2) & \max(1, 5, 6) & \max(4, 9, 5) \end{bmatrix}\end{array}.\end{split}\]
[16]:
A.reduce(ops.max, "height")
[16]:
Tensor(tensor([3, 6, 9]), {'width': Bint[3]})

Contraction

A contraction can be written as an elementwise multiplication followed by a summation over an axis:

\[\begin{split}A \mathbin{\underset{\substack{\mathsf{\vphantom{fg}width}}}{\vphantom{fg}\odot}} y = \sum_j A_{\mathsf{\vphantom{fg}width}(j)} \, y_{\mathsf{\vphantom{fg}width}(j)} = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\\\begin{bmatrix} 3\cdot 1 + 1\cdot 4 + 4\cdot 1 \\ 1\cdot 1 + 5\cdot 4 + 9\cdot 1 \\ 2\cdot 1 + 6\cdot 4 + 5\cdot 1 \end{bmatrix}\end{array}.\end{split}\]
[17]:
(A * y).reduce(ops.add, "width")
[17]:
Tensor(tensor([11, 30, 31]), {'height': Bint[3]})

Some other operations from linear algebra

\[x \mathbin{\underset{\substack{\mathsf{\vphantom{fg}height}}}{\vphantom{fg}\odot}} x = \sum_i x_{\mathsf{\vphantom{fg}height}(i)} \, x_{\mathsf{\vphantom{fg}height}(i)} \qquad \text{inner product}\]
[18]:
(x * x).reduce(ops.add, "height")
[18]:
Tensor(tensor(54))
\[[x \odot y]_{\mathsf{\vphantom{fg}height}(i), \mathsf{\vphantom{fg}width}(j)} = x_{\mathsf{\vphantom{fg}height}(i)} \, y_{\mathsf{\vphantom{fg}width}(j)} \qquad \text{outer product}\]
[19]:
x * y
[19]:
Tensor(tensor([[ 2,  8,  2],
               [ 7, 28,  7],
               [ 1,  4,  1]]), {'height': Bint[3], 'width': Bint[3]})
\[A \mathbin{\underset{\substack{\mathsf{\vphantom{fg}width}}}{\vphantom{fg}\odot}} y = \sum_i A_{\mathsf{\vphantom{fg}width}(i)} \, y_{\mathsf{\vphantom{fg}width}(i)} \qquad \text{matrix-vector product}\]
[20]:
(A * y).reduce(ops.add, "width")
[20]:
Tensor(tensor([11, 30, 31]), {'height': Bint[3]})
\[\begin{split}x \mathbin{\underset{\substack{\mathsf{\vphantom{fg}height}}}{\vphantom{fg}\odot}} A = \sum_i x_{\mathsf{\vphantom{fg}height}(i)} \, A_{\mathsf{\vphantom{fg}height}(i)} \qquad \text{vector-matrix product} \\\end{split}\]
[21]:
(x * A).reduce(ops.add, "height")
[21]:
Tensor(tensor([15, 43, 76]), {'width': Bint[3]})
\[A \mathbin{\underset{\substack{\mathsf{\vphantom{fg}width}}}{\vphantom{fg}\odot}} B = \sum_i A_{\mathsf{\vphantom{fg}width}(i)} \odot B_{\mathsf{\vphantom{fg}width}(i)} \qquad \text{matrix-matrix product}~(B \in \mathbb{R}^{\mathsf{\vphantom{fg}width}\times \mathsf{\vphantom{fg}width2}})\]
[22]:
B = Tensor(
    tensor([[3, 2, 5], [5, 4, 0], [8, 3, 6]]),
)["width", "width2"]

(A * B).reduce(ops.add, "width")
[22]:
Tensor(tensor([[ 46,  22,  39],
               [100,  49,  59],
               [ 76,  43,  40]]), {'height': Bint[3], 'width2': Bint[3]})
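
The matrix-matrix product above can be verified by hand with a plain-Python sketch that follows the same two steps: an elementwise product over all three names, then a sum over the shared "width" axis. The intermediate name `product` is introduced here for illustration.

```python
A = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]  # A[height][width]
B = [[3, 2, 5], [5, 4, 0], [8, 3, 6]]  # B[width][width2]

# Step 1: elementwise product, broadcast over height, width, and width2.
product = [
    [[A[h][w] * B[w][w2] for w2 in range(3)] for w in range(3)]
    for h in range(3)
]

# Step 2: sum out the shared "width" axis.
AB = [
    [sum(product[h][w][w2] for w in range(3)) for w2 in range(3)]
    for h in range(3)
]

print(AB)  # [[46, 22, 39], [100, 49, 59], [76, 43, 40]]
```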

Contraction can be generalized to other binary operations and reduction operations:

\[\begin{split}\max_{\mathsf{\vphantom{fg}width}} (A + y) = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\\\begin{bmatrix} \max(3+1, 1+4, 4+1) \\ \max(1+1, 5+4, 9+1) \\ \max(2+1, 6+4, 5+1) \end{bmatrix}\end{array}.\end{split}\]
[23]:
(A + y).reduce(ops.max, "width")
[23]:
Tensor(tensor([ 5, 10, 10]), {'height': Bint[3]})
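
One way to see the generalization is to parameterize the contraction by its two operators. The sketch below is plain Python and the helper `contract_width` is a hypothetical name, not part of funsor; it recovers both the ordinary sum-product contraction and the max-plus variant shown above.

```python
from functools import reduce
import operator

A = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]  # A[height][width]
y = [1, 4, 1]  # indexed by width


def contract_width(combine, reduce_op, matrix, vector):
    """Combine entries elementwise along width, then reduce over width."""
    return [
        reduce(reduce_op, (combine(a, v) for a, v in zip(row, vector)))
        for row in matrix
    ]


sum_product = contract_width(operator.mul, operator.add, A, y)
max_plus = contract_width(operator.add, max, A, y)

print(sum_product)  # [11, 30, 31]
print(max_plus)  # [5, 10, 10]
```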

Renaming and reshaping

Renaming the axes of a funsor is simple:

\[\begin{split}A_{\mathsf{\vphantom{fg}height}\rightarrow\mathsf{\vphantom{fg}height2}} = \mathsf{\vphantom{fg}height2} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width} \\\begin{bmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \\ \end{bmatrix}\end{array}.\end{split}\]
[24]:
# A(height=Variable("height2", Bint[3]))
A(height="height2")
[24]:
Tensor(tensor([[3, 1, 4],
               [1, 5, 9],
               [2, 6, 5]]), {'height2': Bint[3], 'width': Bint[3]})
\[\begin{split}A_{(\mathsf{\vphantom{fg}height},\mathsf{\vphantom{fg}width})\rightarrow\mathsf{\vphantom{fg}layer}} = \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}layer}\\ \begin{bmatrix} 3 & 1 & 4 & 1 & 5 & 9 & 2 & 6 & 5 \end{bmatrix}\end{array}\end{split}\]
[25]:
layer = Variable("layer", Bint[9])

# Number(3, 4) is the integer constant 3 with bounded-integer dtype Bint[4]
A_layer = A(height=layer // Number(3, 4), width=layer % Number(3, 4))
A_layer
[25]:
Tensor(tensor([3, 1, 4, 1, 5, 9, 2, 6, 5]), {'layer': Bint[9]})
\[\begin{split}A_{\mathsf{\vphantom{fg}layer}\rightarrow(\mathsf{\vphantom{fg}height},\mathsf{\vphantom{fg}width})} = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width} \\\begin{bmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \\ \end{bmatrix}\end{array}.\end{split}\]
[26]:
height = Variable("height", Bint[3])
width = Variable("width", Bint[3])

A_layer(layer=height * Number(3, 4) + width % Number(3, 4))
[26]:
Tensor(tensor([[3, 1, 4],
               [1, 5, 9],
               [2, 6, 5]]), {'height': Bint[3], 'width': Bint[3]})
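
The index arithmetic behind the two reshapes can be checked in plain Python. This sketch assumes row-major order, which matches the flattened output above; `n_height`, `n_width`, `flat`, and `restored` are names introduced for the example.

```python
A = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]  # A[height][width]
n_height, n_width = 3, 3

# Flatten: layer -> (height, width) = (layer // n_width, layer % n_width).
pairs = [divmod(layer, n_width) for layer in range(n_height * n_width)]
flat = [A[h][w] for h, w in pairs]
print(flat)  # [3, 1, 4, 1, 5, 9, 2, 6, 5]

# Unflatten: (height, width) -> layer = height * n_width + width.
restored = [
    [flat[h * n_width + w] for w in range(n_width)] for h in range(n_height)
]
print(restored == A)  # True
```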

Advanced indexing

All forms of advanced indexing can be expressed as name substitutions in funsors.

\[\begin{split}\mathop{\underset{\substack{\mathsf{\vphantom{fg}ax}}}{\vphantom{fg}\mathrm{index}}} \colon \mathbb{R}^{\mathsf{\vphantom{fg}ax}[n]} \times [n] \rightarrow \mathbb{R}\\ \mathop{\underset{\substack{\mathsf{\vphantom{fg}ax}}}{\vphantom{fg}\mathrm{index}}}(A, i) = A_{\mathsf{\vphantom{fg}ax}(i)}.\end{split}\]
\[\begin{split}\begin{aligned} E &\in \mathbb{R}^{\mathsf{\vphantom{fg}vocab}[n] \times \mathsf{\vphantom{fg}emb}} \\ i &\in [n] \\ I &\in [n]^{\mathsf{\vphantom{fg}seq}} \\ P &\in \mathbb{R}^{\mathsf{\vphantom{fg}seq}\times \mathsf{\vphantom{fg}vocab}[n]} \end{aligned}\end{split}\]

Partial indexing \(\mathop{\underset{\substack{\mathsf{\vphantom{fg}vocab}}}{\vphantom{fg}\mathrm{index}}}(E,i)\)

[27]:
E = Tensor(
    tensor([[2, 1, 5], [3, 4, 2], [1, 3, 7], [1, 4, 3], [5, 9, 2]]),
)["vocab", "emb"]

E(vocab=2)
[27]:
Tensor(tensor([1, 3, 7]), {'emb': Bint[3]})

Integer array indexing \(\mathop{\underset{\substack{\mathsf{\vphantom{fg}vocab}}}{\vphantom{fg}\mathrm{index}}}(E,I)\)

[28]:
# dtype=5 marks the entries of I as bounded integers in range(5), matching the size of "vocab"
I = Tensor(tensor([3, 2, 4, 0]), dtype=5)["seq"]

E(vocab=I)
[28]:
Tensor(tensor([[1, 4, 3],
               [1, 3, 7],
               [5, 9, 2],
               [2, 1, 5]]), {'seq': Bint[4], 'emb': Bint[3]})

Gather operation \(\mathop{\underset{\substack{\mathsf{\vphantom{fg}vocab}}}{\vphantom{fg}\mathrm{index}}}(P,I)\)

[29]:
P = Tensor(
    tensor([[6, 2, 4, 2], [8, 2, 1, 3], [5, 5, 7, 0], [1, 3, 8, 2], [5, 9, 2, 3]]),
)["vocab", "seq"]

P(vocab=I)
[29]:
Tensor(tensor([1, 5, 2, 2]), {'seq': Bint[4]})
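
The difference between integer array indexing and gather comes down to whether the indexed tensor already has the "seq" axis. A plain-Python sketch with nested lists (names `E_at_I` and `P_at_I` are introduced here) makes this explicit:

```python
E = [[2, 1, 5], [3, 4, 2], [1, 3, 7], [1, 4, 3], [5, 9, 2]]  # E[vocab][emb]
P = [[6, 2, 4, 2], [8, 2, 1, 3], [5, 5, 7, 0], [1, 3, 8, 2], [5, 9, 2, 3]]  # P[vocab][seq]
I = [3, 2, 4, 0]  # I[seq], values index the vocab axis

# E has no "seq" axis: each position selects a whole embedding row.
E_at_I = [E[I[s]] for s in range(4)]

# P shares the "seq" axis with I: each position selects a single element.
P_at_I = [P[I[s]][s] for s in range(4)]

print(E_at_I)  # [[1, 4, 3], [1, 3, 7], [5, 9, 2], [2, 1, 5]]
print(P_at_I)  # [1, 5, 2, 2]
```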

Indexing with two integer arrays

\[\begin{split}\begin{aligned} |\mathsf{\vphantom{fg}seq}| &= m \\ I_1 &\in [m]^{\mathsf{\vphantom{fg}subseq}}\\ I_2 &\in [n]^{\mathsf{\vphantom{fg}subseq}}\\ S &= \mathop{\underset{\substack{\mathsf{\vphantom{fg}vocab}}}{\vphantom{fg}\mathrm{index}}}(\mathop{\underset{\substack{\mathsf{\vphantom{fg}seq}}}{\vphantom{fg}\mathrm{index}}}(P, I_1), I_2) \in \mathbb{R}^{\mathsf{\vphantom{fg}subseq}} \\ S_{\mathsf{\vphantom{fg}subseq}(i)} &= P_{\mathsf{\vphantom{fg}seq}((I_1)_{\mathsf{\vphantom{fg}subseq}(i)}), \mathsf{\vphantom{fg}vocab}((I_2)_{\mathsf{\vphantom{fg}subseq}(i)})}. \end{aligned}\end{split}\]
[30]:
I1 = Tensor(tensor([1, 2, 0]), dtype=4)["subseq"]
I2 = Tensor(tensor([3, 0, 4]), dtype=5)["subseq"]

P(seq=I1, vocab=I2)
[30]:
Tensor(tensor([3, 4, 5]), {'subseq': Bint[3]})
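
As a cross-check, the same lookup in plain Python pairs the two index arrays position by position along "subseq". This is a sketch for verification only; `S` follows the formula above.

```python
P = [[6, 2, 4, 2], [8, 2, 1, 3], [5, 5, 7, 0], [1, 3, 8, 2], [5, 9, 2, 3]]  # P[vocab][seq]
I1 = [1, 2, 0]  # seq indices, one per subseq position
I2 = [3, 0, 4]  # vocab indices, one per subseq position

# S[i] = P[vocab = I2[i]][seq = I1[i]]
S = [P[I2[i]][I1[i]] for i in range(3)]
print(S)  # [3, 4, 5]
```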