Thanks to the expressive power of neural networks, NLP researchers have begun to study neural multi-task learning. Most methods capture the associations between tasks through shared network parameters, improving the performance of each individual task.
This article introduces a joint many-task model that handles increasingly complex tasks with successively deeper layers.
Unlike traditional parallel multi-task learning, the model is organized around the hierarchical relationship among tasks (POS -> CHUNK -> DEP -> Relatedness -> Entailment). Each task has its own objective function, and the model achieves strong results on all of them.
NLP tasks are often correlated, and researchers use multi-task learning to encourage interactions between tasks and improve the performance of each one. Most existing mainstream multi-task frameworks perform multi-task learning in parallel at the same model depth through parameter sharing, as shown in the figure below.
In NLP there are often hierarchical relationships between tasks, progressing from lexical analysis to syntactic analysis and then on to higher-level application tasks (concretely: part-of-speech tagging POS -> chunking CHUNK -> dependency parsing DEP -> semantic relatedness Relatedness -> textual entailment Entailment).
Most existing multi-task learning models ignore this linguistic hierarchy among NLP tasks. To address this, the paper proposes a hierarchically growing neural network model that takes the linguistic hierarchy between tasks into account.
Paper method
The overall framework of the model is shown below. Compared with traditional parallel multi-task learning models, this framework stacks the different tasks according to their linguistic hierarchy, so that higher-level tasks sit on deeper layers of the network. Each task at a given level takes the output of the task at the level below as part of its input.
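A minimal sketch of this stacking idea is below, assuming PyTorch; the layer sizes, module names, and the restriction to just the POS and CHUNK layers are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class JointManyTaskSketch(nn.Module):
    """Stack task layers so each higher-level task consumes the layer below."""
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=100, n_pos=45, n_chunk=23):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Level 1: POS tagging on top of word embeddings.
        self.pos_lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.pos_out = nn.Linear(2 * hidden_dim, n_pos)
        # Level 2: chunking, fed the POS layer's hidden states (the paper also feeds
        # shortcut word embeddings and POS label embeddings; omitted here for brevity).
        self.chunk_lstm = nn.LSTM(2 * hidden_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.chunk_out = nn.Linear(2 * hidden_dim, n_chunk)

    def forward(self, token_ids):
        x = self.embed(token_ids)            # (batch, seq, emb_dim)
        h_pos, _ = self.pos_lstm(x)          # lower-level representation
        pos_logits = self.pos_out(h_pos)
        h_chunk, _ = self.chunk_lstm(h_pos)  # higher level reuses lower-level output
        chunk_logits = self.chunk_out(h_chunk)
        return pos_logits, chunk_logits
```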
At the word and syntactic levels, each task is modeled with a bidirectional LSTM. At the semantic level, classification is done with a softmax over the representations learned by the lower-level tasks. During training, each task has its own objective function; using all of the tasks' training data, the tasks are jointly trained in turn, following the hierarchical order of the model from bottom to top.
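The bottom-up joint training schedule can be sketched roughly as follows; the optimizer choice, data-loader names, and loss functions are assumptions for illustration, not taken from the paper's released code, and the model is the two-level sketch above.

```python
import torch.nn.functional as F
import torch.optim as optim

def train_epoch(model, pos_loader, chunk_loader, lr=1e-3):
    opt = optim.Adagrad(model.parameters(), lr=lr)
    # 1) Lowest level first: POS tagging, with its own objective.
    for tokens, pos_tags in pos_loader:
        opt.zero_grad()
        pos_logits, _ = model(tokens)
        loss = F.cross_entropy(pos_logits.transpose(1, 2), pos_tags)
        loss.backward()
        opt.step()
    # 2) Next level: chunking, reusing the (now updated) lower layers.
    for tokens, chunk_tags in chunk_loader:
        opt.zero_grad()
        _, chunk_logits = model(tokens)
        loss = F.cross_entropy(chunk_logits.transpose(1, 2), chunk_tags)
        loss.backward()
        opt.step()
    # DEP, Relatedness, and Entailment would follow in the same bottom-up order.
```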
In addition, in the concrete implementation, every bidirectional LSTM layer also receives the word vectors (shortcut connection) and the label predicted by the previous task (label embedding). A successive regularization term is added to each task's objective function so that the model does not forget the information learned for earlier tasks.
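The successive regularization idea can be expressed as a penalty on how far lower-level parameters drift from their values after the previous task's training round; a hedged sketch is below, where the coefficient `delta` and the snapshot mechanism are illustrative assumptions.

```python
import torch

def successive_regularization(model, saved_params, delta=1e-2):
    """Penalize drift from parameter values snapshotted after the previous task.

    saved_params: dict mapping parameter names to tensors saved earlier.
    """
    reg = 0.0
    for name, param in model.named_parameters():
        if name in saved_params:
            reg = reg + torch.sum((param - saved_params[name]) ** 2)
    return delta * reg

# Usage sketch: loss = task_loss + successive_regularization(model, pos_snapshot)
```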
Paper experiments
Datasets for each task: POS (WSJ), CHUNK (WSJ), DEP (WSJ), Relatedness (SICK), Entailment (SICK).
Multi-task vs. single task (on the test set)
The paper reports results for multi-task and single-task training (entries marked n/a are missing because some task datasets overlap), as well as results for training with all tasks versus with arbitrary subsets of tasks. Multi-task training improves over single-task training on every task.
Comparison with mainstream methods (on the test set)
Comparing each task against current mainstream methods (including parallel multi-task learning methods), the paper's results essentially match the state of the art on every task.
Model structure analysis (on the development set)
(1) Effect of shortcut connections, output label embeddings, and successive regularization