
Fast Semantic Search Using Sentence BERT



Are BERT and RoBERTa computationally efficient enough for Semantic Textual Similarity?

Google's BERT and other transformer-based models have shown state-of-the-art performance on numerous problems and opened new frontiers in natural language processing. Later, RoBERTa (short for Robustly Optimized BERT Pretraining Approach) presented a replication study of BERT pre-training that measured the impact of key hyperparameters and training data size. The authors ran experiments on the GLUE, RACE, and SQuAD benchmarks and achieved state-of-the-art results. Below are the hyperparameters for pre-training and fine-tuning RoBERTa.

Source: RoBERTa: A Robustly Optimized BERT Pretraining Approach (Yinhan Liu et al.)

BERT and RoBERTa have set state-of-the-art performance on sentence-pair regression tasks such as semantic textual similarity (STS). However, they require both sentences to be fed into the network together, which causes massive computational overhead. Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations with BERT. This construction makes BERT unsuitable for semantic search as well as for clustering.

To address this, Nils Reimers and Iryna Gurevych presented Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks at EMNLP 2019. Like RoBERTa, Sentence-BERT (SBERT) builds on a pre-trained BERT: it fine-tunes the network using siamese and triplet network structures and adds a pooling operation to the output of BERT to derive fixed-size sentence embeddings. These embeddings live in a vector space where semantic similarity can be compared with the cosine similarity function.
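To make the 50-million figure concrete: a cross-encoder such as plain BERT must run once for every unordered pair of sentences, i.e. n(n-1)/2 forward passes, while a bi-encoder such as SBERT runs once per sentence and then compares the resulting vectors. The short Python sketch below only does this arithmetic; the two function names are hypothetical helpers, not part of any library.

```python
from math import comb


def cross_encoder_inferences(n_sentences: int) -> int:
    """Forward passes a cross-encoder (plain BERT) needs to score every pair.

    Both sentences go through the network together, so each unordered pair
    costs one inference: n * (n - 1) / 2.
    """
    return comb(n_sentences, 2)


def bi_encoder_encodings(n_sentences: int) -> int:
    """Forward passes a bi-encoder (SBERT) needs: one per sentence.

    The pairwise comparisons afterwards are cheap vector similarities.
    """
    return n_sentences


n = 10_000
print(f"cross-encoder inferences: {cross_encoder_inferences(n):,}")
print(f"bi-encoder encodings:     {bi_encoder_encodings(n):,}")
# cross-encoder inferences: 49,995,000
# bi-encoder encodings:     10,000
```

For 10,000 sentences this gives 49,995,000 BERT inferences (the "about 50 million" above) versus 10,000 SBERT encodings; the pairwise cosine comparisons that remain are inexpensive vector operations.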

Setup and Semantic Search

The source code is available on GitHub, and a PyPI package lets you install it and import it directly as a module. With a few simple steps you can start experimenting with Sentence-BERT. Here is the setup to build your own semantic search.
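As a rough illustration of what such a setup does, here is a minimal, dependency-free sketch of the search step; it is not the library's implementation. It assumes hypothetical 3-dimensional token vectors in place of real BERT outputs, applies mean pooling (SBERT's default pooling strategy) to turn them into sentence embeddings, and ranks a corpus by cosine similarity to a query. With the actual `sentence-transformers` package, a pre-trained model's `encode` method returns pooled sentence embeddings directly, so only the ranking step would remain.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(token_embeddings: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average the token vectors whose mask is 1 (padding tokens are ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    return [x / max(count, 1) for x in total]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query: Vector, corpus: Sequence[Vector], top_k: int = 2) -> List[Tuple[int, float]]:
    """Return (corpus_index, cosine_score) pairs for the top_k most similar entries."""
    scores = [(idx, cosine(query, emb)) for idx, emb in enumerate(corpus)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical token embeddings and masks standing in for real BERT outputs.
corpus_tokens = [
    ([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [9.9, 9.9, 9.9]], [1, 1, 0]),  # last token is padding
    ([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]], [1, 1]),
    ([[0.0, 0.0, 1.0]], [1]),
]
corpus_embeddings = [mean_pool(tokens, mask) for tokens, mask in corpus_tokens]
query_embedding = mean_pool([[0.9, 0.1, 0.0]], [1])

for idx, score in semantic_search(query_embedding, corpus_embeddings, top_k=2):
    print(idx, round(score, 3))
```

The corpus is encoded once up front; each new query then costs one encoding plus one cosine score per corpus entry, which is what makes SBERT practical for search.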

Below is the Colab link for a basic semantic search implementation using Sentence-BERT.

Google Colaboratory

Evaluation

The evaluation was done on common semantic textual similarity (STS) tasks. A regression function, by contrast, operates on sentence pairs, so once the collection of sentences grows large enough it stops scaling because of the combinatorial explosion of pairs. Using cosine similarity to compare two sentence embeddings is therefore preferable. The authors also experimented with negative Manhattan and negative Euclidean distances as similarity measures, but the results for all approaches remained roughly the same.
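The three similarity measures mentioned above can be written in a few lines. This is a minimal pure-Python sketch, not the authors' evaluation code, and the example embeddings are hypothetical 3-dimensional vectors chosen only to show that each measure produces a score where higher means more similar.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embeddings (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def neg_manhattan(a: Vector, b: Vector) -> float:
    """Negative L1 distance: values closer to zero mean more similar."""
    return -sum(abs(x - y) for x, y in zip(a, b))


def neg_euclidean(a: Vector, b: Vector) -> float:
    """Negative L2 distance: values closer to zero mean more similar."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


MEASURES: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine": cosine_similarity,
    "-manhattan": neg_manhattan,
    "-euclidean": neg_euclidean,
}

# Hypothetical sentence embeddings (real SBERT vectors have hundreds of dimensions).
query = [0.9, 0.1, 0.0]
candidates: Dict[str, List[float]] = {
    "paraphrase": [0.8, 0.2, 0.1],
    "unrelated": [0.0, 0.1, 0.9],
}

for name, measure in MEASURES.items():
    # Rank candidates from most to least similar under this measure.
    ranking = sorted(candidates, key=lambda c: measure(query, candidates[c]), reverse=True)
    print(name, ranking)
```

On these toy vectors all three measures rank the paraphrase first, a tiny-scale illustration (not evidence) of the observation that the choice of measure made little difference. Cosine similarity has the added convenience of ignoring vector length, since scaling an embedding by a positive constant leaves its score unchanged.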

Results and Analysis

The following are the results and a detailed analysis of Sentence-BERT on semantic similarity tasks, including comparisons and an ablation study along with model performance and efficiency.

Results and Model Performance (Trained on V100 GPU)

We intend to share our experiences with AI developers who address a range of problems in designing and building complex algorithms to perform certain tasks. vision.ml is an open-source platform for sharing the exciting, rarely discussed content we encounter while designing AI models, whether the problem is a simple one or a very complicated one. We cover topics that many developers and researchers are actively searching for, from machine learning, mathematics, and NLP to scripting languages such as Python. We write about what we addressed and, more importantly, how we approached each problem. For more exciting content, please visit our platform, where you can also share your experiences.



Fast Semantic Search Using Sentence BERT was originally published in Chatbots Life on Medium.
