
Do Transformers Parse while Predicting the Masked Word?

Pre-trained language models have been shown to encode linguistic structures,
e.g. dependency and constituency parse trees, in their embeddings while being
trained on unsupervised loss functions like masked language modeling. Some
doubts have been raised about whether these models actually do parsing or only
some computation weakly correlated with it. We study the following questions: (a) Is it
possible to explicitly describe transformers with realistic embedding
dimension, number of heads, etc. that are capable of doing parsing — or even
approximate parsing? (b) Why do pre-trained models capture parsing structure?
This paper takes a step toward answering these questions in the context of
generative modeling with PCFGs. We show that masked language models like BERT
or RoBERTa of moderate sizes can approximately execute the Inside-Outside
algorithm for the English PCFG [Marcus et al., 1993]. We also show that the
Inside-Outside algorithm is optimal for masked language modeling loss on the
PCFG-generated data. We also give a construction of a transformer with $50$
layers, $15$ attention heads, and $1275$-dimensional embeddings on average such
that its embeddings can be used to do constituency parsing with a $>70\%$ F1
score on the PTB dataset. We conduct probing experiments on models
pre-trained on PCFG-generated data to show that this not only allows recovery
of approximate parse trees, but also recovers the marginal span probabilities
computed by the Inside-Outside algorithm, which suggests an implicit bias of
masked language modeling towards this algorithm.
Paper link: http://arxiv.org/pdf/2303.08117v1

