[논문 리뷰] Taking Notes on the Fly Helps Language Pre-Training (TNF)


Information

Task: Language Modeling
Publisher: ICLR
Year: 2021

Abstract

  1. Creating a note dictionary for rare words in the training corpus makes pre-training faster and more stable
  2. Introducing a note embedding to update and maintain the note dictionary

Method

  1. Construction of the note dictionary
    Figure: The left box shows the forward pass with the help of the note dictionary. In the input word sequence, $w_2$ is a rare word. For tokens 4 and 5, which originate from $w_2$, we query the value of $w_2$ in the note dictionary and take a weighted average of it with the token/position embeddings. The right box demonstrates how the note dictionary is maintained. After the forward pass of the model, we obtain the contextual representations of the words near $w_2$ and use mean pooling over those representations as the note of $w_2$ for the current sentence. Then we update $w_2$'s value in the note dictionary by a weighted average of the current note and its previous value.
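The forward-pass side of the figure can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the mixing weight `lam` and the plain-dict layout mapping rare words to vectors are assumptions.

```python
import numpy as np

def mix_with_note(token_emb, note_dict, word, lam=0.5):
    """Weighted-average a rare word's stored note with its input embedding.

    `lam` is an assumed mixing hyperparameter; frequent words (no dict
    entry) keep their token/position embedding unchanged.
    """
    note = note_dict.get(word)
    if note is None:                       # not a rare word: no note to mix in
        return token_emb
    return (1 - lam) * token_emb + lam * note

# Example: a rare word "w2" with an all-ones note, mixed into a zero embedding
note_dict = {"w2": np.ones(4)}
mixed = mix_with_note(np.zeros(4), note_dict, "w2", lam=0.5)
```

For tokens 4 and 5 in the figure, the same `w2` note would be mixed into both tokens' embeddings before the encoder runs.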

  2. Maintenance of the note dictionary
    For a rare word $w$ that appears both in the input token sequence $x = (x_1, \ldots, x_i, \ldots, x_n)$ and in the note dictionary, denote the span boundary of $w$ in $x$ as $(s, t)$, where $s$ and $t$ are the starting and ending positions. The note of $w$ for $x$ is defined as $\mathrm{Note}(w, x) = \frac{1}{2k + t - s} \sum_{j=s-k}^{t+k} c_j$, where $c_j \in \mathbb{R}^d$ is the output of the encoder at position $j$ and serves as the contextual representation of $x_j$, and $k$ is half of the window size, controlling how many surrounding tokens we take as notes to save their semantics.
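The note-taking and dictionary-update steps above can be sketched as below. This is a hedged illustration: the EMA weight `gamma` and the clamping of the window at sequence borders are assumptions, and the paper's exact discount schedule may differ.

```python
import numpy as np

def update_note(note_dict, word, hidden, s, t, k=2, gamma=0.1):
    """Mean-pool encoder outputs c_j in a window of half-size k around the
    span [s, t], then fold the result into the word's stored note by a
    weighted average of the current note and the previous value.
    """
    lo, hi = max(0, s - k), min(len(hidden), t + k + 1)
    note = hidden[lo:hi].mean(axis=0)        # Note(w, x): mean pooling
    if word in note_dict:                    # weighted average with old value
        note_dict[word] = (1 - gamma) * note_dict[word] + gamma * note
    else:                                    # first occurrence of w
        note_dict[word] = note
    return note_dict[word]

# Example: 8 positions with 1-d "contextual representations" 0..7,
# rare word spanning positions [3, 4], window half-size k=2
hidden = np.arange(8, dtype=float).reshape(8, 1)
note_dict = {}
note = update_note(note_dict, "w2", hidden, s=3, t=4, k=2)
```

Keeping the update as a moving average means a single noisy sentence cannot overwrite a rare word's accumulated semantics.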
