NVIDIA’s AI advance: Natural language processing gets faster and better all the time

When NVIDIA announced breakthroughs in language understanding to enable real-time conversational AI, we were caught off guard. We were still trying to digest the proceedings of ACL, one of the biggest research events for computational linguistics worldwide, at which Facebook, Salesforce, Microsoft, and Amazon were all present.

While these represent two different sets of achievements, they are still closely connected. Here is what NVIDIA’s breakthrough is about, and what it means for the world at large. 

NVIDIA does BERT

As ZDNet reported yesterday, NVIDIA says its AI platform now holds the fastest training time, the fastest inference time, and the largest training model of its kind to date. NVIDIA has managed to train a large BERT model in 53 minutes, and to have other BERT models produce inference results in 2.2 milliseconds. But we need to put that into context to understand its significance.

BERT (Bidirectional Encoder Representations from Transformers) is research (paper, open source code and datasets) published by researchers at Google AI Language in late 2018. BERT has been among a number of recent breakthroughs in natural language processing, and has caused a stir in the AI community by presenting state-of-the-art results across a wide variety of language tasks.

What NVIDIA did was to work with the models Google released (in two flavors, BERT-Large and BERT-Base) and its own GPUs to slash the time needed to train the BERT machine learning model and then use it in applications. This is how machine learning works: first there is a training phase, in which the model learns by being shown lots of data, and then an inference phase, in which the model processes new data.
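
To make those two phases concrete, here is a minimal, illustrative sketch of fine-tuning and then querying a BERT model. It uses the open-source Hugging Face transformers library as a stand-in rather than NVIDIA's own BERT code, and the model name, example sentences, labels, and hyperparameters are assumptions for illustration only.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load a pretrained BERT-Base model with a small classification head on top.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Training phase: show the model labelled examples and update its weights.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
batch = tokenizer(["great product", "terrible service"], return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])
model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()

# Inference phase: run the fine-tuned model on new, unseen text.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("works as advertised", return_tensors="pt")).logits
print(logits.argmax(dim=-1))  # predicted class index

In practice the training step above runs over millions of examples for many passes, which is why the hardware NVIDIA describes below matters so much.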

NVIDIA used several different configurations, each producing different results. It took an NVIDIA DGX SuperPOD using 92 NVIDIA DGX-2H systems running 1,472 NVIDIA V100 GPUs 53 minutes to train BERT-Large, while the same task took a single NVIDIA DGX-2 system 2.8 days. The 2.2 millisecond inference result comes from a different setup and model: BERT-Base running on NVIDIA T4 GPUs with NVIDIA TensorRT.

The bottom line is that NVIDIA has cut BERT training time by several days compared to what used to be the norm. But the magic here was a combination of hardware and software, which is why NVIDIA is releasing its own tweaks to BERT, and that may be the biggest win for the community at large.
