[Paper Review] Taking Notes on the Fly Helps Language Pre-Training (TNF)
Information
Task: Language Modeling
Publisher: ICLR
Year: 2021
Abstract
- Creating a note dictionary for rare words in the training corpus makes pre-training faster and more stable
- Introducing note embeddings to update and maintain the note dictionary
Method
- Construction of note dictionary
The left box shows the forward pass with the help of the note dictionary. In the input word sequence, $w_2$ is a rare word. For tokens 4 and 5, which originate from $w_2$, we query the value of $w_2$ in the note dictionary and take a weighted average of it with the token/position embeddings. The right box demonstrates how we maintain the note dictionary. After the forward pass of the model, we get the contextual representations of the words near $w_2$ and use mean pooling over those representations as the note of $w_2$ in the current sentence. Then, we update $w_2$'s value in the note dictionary by a weighted average of the current note and its previous value.
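The forward pass described above can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the mixing weight `lam` and the function and variable names are assumptions for illustration.

```python
import numpy as np

def forward_with_notes(embeddings, note_dict, rare_spans, lam=0.5):
    """Mix saved notes into the input embeddings of rare-word spans.

    embeddings: (n, d) token/position embeddings for the input sequence.
    note_dict:  {word: (d,) note vector} for rare words.
    rare_spans: {word: (s, t)} inclusive span boundaries of rare words.
    lam:        assumed mixing weight; lam = 0 recovers plain embeddings.
    """
    out = embeddings.copy()
    for word, (s, t) in rare_spans.items():
        if word in note_dict:
            # weighted average of the note with each token's embedding
            out[s:t + 1] = (1.0 - lam) * out[s:t + 1] + lam * note_dict[word]
    return out
```

Tokens outside any rare-word span keep their original embeddings, so common words are unaffected by the dictionary.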
- Maintaining the note dictionary
For a rare word $w$ that appears both in the input token sequence $x = (x_1, \dots, x_i, \dots, x_n)$ and in the note dictionary, denote the span boundary of $w$ in $x$ as $(s, t)$, where $s$ and $t$ are the starting and ending positions. The note of $w$ for $x$ is

$$\mathrm{Note}(w, x) = \frac{1}{2k + t - s} \sum_{j=s-k}^{t+k} c_j$$

where $c_j \in \mathbb{R}^d$ is the output of the encoder at position $j$ and serves as the contextual representation of $x_j$, and $k$ is half of the window size, which controls how many surrounding tokens we take as notes to save their semantics.