Abstract
This paper describes the system submitted to "SemEval-2019 Task 5" (Task B) for the English language, where we had to detect hate speech and then detect aggressive behaviour and its target audience on Twitter. There were two specific target audiences, immigrants and women. We were required to first detect whether a tweet contains hate speech, then whether the tweet shows aggressive behaviour, and finally whether the targeted audience is an individual or a group of people.
Link to full paper
Abstract
This paper describes the system submitted to "SemEval-2019 Task 6", where we had to detect offensive language in English tweets. We were required to first detect whether a tweet contains offensive content, then to find out whether the tweet was targeted against some individual, group or other entity, and finally to classify the targeted audience.
Link to full paper
Abstract
In the current work, we present a description of the system submitted to the WMT 2019 News Translation shared task. The system was created to translate news text from Lithuanian to English. To accomplish the given task, our system used a word-embedding-based Neural Machine Translation model to post-edit the outputs generated by a Statistical Machine Translation model. The current paper documents the architecture of our model, describes its various modules, and reports the results it produced. Our system garnered a BLEU score of 17.6.
Link to full paper
Abstract
This paper describes the system submitted to the "Sentiment Analysis at SEPLN (TASS)-2019" shared task. The task involves sentiment analysis of Spanish tweets, where the tweets are in different dialects spoken in Spain, Peru, Costa Rica, Uruguay and Mexico. The tweets are short (up to 240 characters) and the language is informal, i.e., it contains misspellings, emojis, onomatopoeias etc. Sentiment analysis includes classification of the tweets into 4 classes, viz., Positive, Negative, Neutral and None. For preparing the proposed system, we use Deep Learning networks like LSTMs.
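A minimal sketch of the kind of LSTM classifier described above, assuming a Keras backend; the layer sizes, vocabulary limits, and toy data are illustrative assumptions, not the submitted system's exact configuration.

```python
# Minimal sketch of an LSTM-based 4-class tweet sentiment classifier.
# Hyperparameters and preprocessing are illustrative assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS, MAX_LEN, N_CLASSES = 20000, 60, 4  # Positive, Negative, Neutral, None

model = Sequential([
    Embedding(MAX_WORDS, 128),              # learned word vectors
    LSTM(64),                               # sequence encoder
    Dense(N_CLASSES, activation="softmax")  # 4-way sentiment head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Usage: tokenize raw tweets, pad to a fixed length, then fit.
tweets = ["me encanta este lugar", "no me gusta nada"]  # toy examples
labels = np.array([0, 1])                               # toy labels
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(tweets)
X = pad_sequences(tokenizer.texts_to_sequences(tweets), maxlen=MAX_LEN)
model.fit(X, labels, epochs=1, verbose=0)
```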
Link to full paper
Abstract
This paper describes the system submitted to the "Humor Analysis based on Human Annotation (HAHA)-2019" shared task. The task is divided into two sub-tasks, which include detection of humour in Spanish tweets and prediction of a humour score for the same. The tweets are short (up to 240 characters) and the language is informal, i.e., it contains spelling mistakes, emojis, emoticons, onomatopoeias etc. Humour detection includes classification of the tweets into 2 classes, viz., Humorous and Not Humorous. For preparing the proposed system, I use Deep Learning networks like LSTMs.
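Since the two sub-tasks share one input, a natural layout is a shared LSTM encoder with two heads, one binary and one regression. This is a hedged sketch of that idea; the shared-encoder design and all sizes are assumptions for illustration.

```python
# Sketch: one shared LSTM encoder with two heads, matching the two HAHA
# sub-tasks (binary humour detection + humour-score regression).
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, LSTM, Dense

tokens = Input(shape=(60,), name="tokens")              # padded token ids
h = Embedding(20000, 128)(tokens)
h = LSTM(64)(h)
is_humor = Dense(1, activation="sigmoid", name="is_humor")(h)  # sub-task 1
score = Dense(1, activation="linear", name="score")(h)         # sub-task 2

model = Model(tokens, [is_humor, score])
model.compile(optimizer="adam",
              loss={"is_humor": "binary_crossentropy", "score": "mse"})
```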
Link to full paper
Abstract
Most text-simplification systems require an indicator of the complexity of words. The prevalent approaches to word difficulty prediction are based on manual feature engineering, while deep learning based models have largely been left unexplored due to their comparatively poor performance. In this paper we explore the use of one such model for predicting the difficulty of words. We treat the problem as a binary classification problem. One of our primary aims was to remove the dependency on the frequency of previously acquired words for measuring difficulty. We first train traditional machine learning models and evaluate their performance on the task, and then analyze a convolutional neural network based prediction model which operates at the character level, evaluating its efficiency against the others.
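A hedged sketch of what a character-level CNN for this binary task can look like; the character encoding, filter sizes, and word-length cap are assumptions, not the paper's exact model.

```python
# Sketch of a character-level CNN for binary word-difficulty classification.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

MAX_WORD_LEN, N_CHARS = 20, 128   # pad words to 20 chars; byte-level vocabulary

model = Sequential([
    Embedding(N_CHARS, 16),             # one vector per character
    Conv1D(64, 3, activation="relu"),   # character-trigram detectors
    GlobalMaxPooling1D(),               # word-level feature vector
    Dense(1, activation="sigmoid"),     # difficult vs. not difficult
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

def encode(word: str) -> list[int]:
    """Map a word to a fixed-length sequence of character codes."""
    codes = [min(ord(c), N_CHARS - 1) for c in word[:MAX_WORD_LEN]]
    return codes + [0] * (MAX_WORD_LEN - len(codes))
```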
Link to full paper
Abstract
Text simplification is one of the domains in Natural Language Processing which offers great promise for exploration. Simplified sentences also offer better results in many language processing applications, as compared to complex/compound sentences. Recently, Neural Networks have been used for simplifying texts, be it by state-of-the-art LSTM and GRU cells or by Reinforcement Learning models. In contrast, in this work, we present a classical approach consisting of two separate algorithms for simplifying complex and compound sentences into their corresponding simple forms.
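To make the idea concrete, here is a toy illustration of splitting a compound sentence at a coordinating conjunction; the paper's actual algorithms are more involved (clause detection, subject copying, etc.), so treat this only as an intuition sketch.

```python
# Toy heuristic: split a compound sentence at its first coordinating
# conjunction into two simple sentences. Not the paper's algorithm.
import re

COORDINATORS = r"\b(and|but|or|so|yet)\b"

def split_compound(sentence: str) -> list[str]:
    """Split on a coordinating conjunction that joins two clauses."""
    parts = re.split(COORDINATORS, sentence, maxsplit=1)
    if len(parts) == 3:                       # [left, conjunction, right]
        left, _, right = parts
        return [left.strip().rstrip(",") + ".", right.strip().capitalize()]
    return [sentence]

print(split_compound("The sun was setting, and the birds flew home."))
# -> ['The sun was setting.', 'The birds flew home.']
```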
Link to full paper
Abstract
The identification of Hate Speech in Social Media has received much attention in research recently, with an ever-growing demand particularly for research in languages other than English. The Hate Speech and Offensive Content (HASOC) track has created resources for Hate Speech Identification in three different languages, namely Hindi, German, and English. We have participated in both Sub-tasks A and B of the 2020 shared task on hate speech and offensive content identification in Indo-European languages. Our approach relies on a combined model: a multilingual RoBERTa (Robustly Optimized BERT Pretraining Approach) model with pre-trained vectors, and a Random Forest model using Word2Vec, TF-IDF, and other textual features as input. Our system achieved a maximum Macro F1-score of 50.28% on English Sub-task A, which is quite satisfactory relative to the performance of other systems, and secured 8th position among the participating teams.
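A hedged sketch of the Random Forest half of the pipeline: TF-IDF features concatenated with averaged Word2Vec vectors. Dimensions, training settings, and the toy corpus are illustrative assumptions.

```python
# Sketch: TF-IDF + averaged Word2Vec features feeding a Random Forest.
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["you are awful", "have a nice day"]   # toy corpus
labels = [1, 0]                                # 1 = hateful/offensive

tfidf = TfidfVectorizer(max_features=5000)
X_tfidf = tfidf.fit_transform(texts).toarray()

w2v = Word2Vec([t.split() for t in texts], vector_size=100, min_count=1)

def avg_vector(text):
    """Average the Word2Vec vectors of a text's tokens."""
    vecs = [w2v.wv[w] for w in text.split() if w in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(100)

X = np.hstack([X_tfidf, np.vstack([avg_vector(t) for t in texts])])
clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
```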
Link to full paper
Abstract
Clustering is an unsupervised learning problem in the domain of machine learning and data science, where information about data instances may or may not be given. The K-Means algorithm is one such clustering algorithm whose use is widespread. At the same time, K-Means suffers from a few disadvantages such as low accuracy and a high number of iterations. In order to rectify these problems, a modified K-Means algorithm, named the K-RMS clustering algorithm, is demonstrated in the present work. The modifications have been made so that accuracy increases with fewer iterations, and the algorithm performs especially well on decimal data compared to K-Means. The modified algorithm has been tested on 12 datasets obtained from the UCI web archive, and the results gathered are very promising.
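The abstract does not spell out the modification, so the sketch below is only an assumed reading of the name "K-RMS": K-Means with the centroid update replaced by a per-cluster root mean square. The paper's exact changes may differ.

```python
# Sketch of a K-Means variant with a root-mean-square centroid update.
# NOTE: "RMS update" is an assumption about K-RMS; it is only sensible
# for non-negative feature values.
import numpy as np

def k_rms(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # random init
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # RMS centroid update (assumption): sqrt(mean(x^2)) per cluster.
        new = np.array([np.sqrt((X[labels == j] ** 2).mean(axis=0))
                        if (labels == j).any() else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```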
Link to full paper
Abstract
This paper presents a method that applies Natural Language Processing to normalize numeronyms and make them understandable to humans. We deal with the problem using two approaches, viz., a semi-supervised approach and a supervised approach. For the semi-supervised approach, we make use of the Damerau-Levenshtein distance between words and then apply Cosine Similarity to select the normalized text, reaching greater accuracy in solving the problem. For the supervised approach, we use a deep learning architecture. Our approaches garner accuracy figures of 71% and 72% for Bengali and English respectively (for the semi-supervised approach), and 89% for the supervised approach.
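A hedged sketch of the semi-supervised idea: rank lexicon candidates by Damerau-Levenshtein distance, then pick the winner by cosine similarity over character n-grams. The digit-expansion table, lexicon, and candidate cutoff are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: Damerau-Levenshtein candidate generation + cosine selection.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

DIGIT_WORDS = {"2": "to", "4": "for", "8": "eight"}   # partial, illustrative

def expand_digits(token: str) -> str:
    """Replace digits with an assumed pronunciation, e.g. 'gr8' -> 'greight'."""
    return "".join(DIGIT_WORDS.get(c, c) for c in token)

def dl_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein (optimal string alignment) distance."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def normalize(numeronym: str, lexicon: list[str]) -> str:
    candidates = sorted(lexicon, key=lambda w: dl_distance(numeronym, w))[:5]
    vec = CountVectorizer(analyzer="char", ngram_range=(2, 3))
    m = vec.fit_transform([numeronym] + candidates)
    sims = cosine_similarity(m[0], m[1:]).ravel()
    return candidates[int(sims.argmax())]

print(normalize(expand_digits("gr8"), ["great", "grate", "grade"]))
# -> 'great' (with this toy lexicon)
```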
Link to full paper
Abstract
For any topic, its factuality can be defined as the category that determines the status of events with respect to the certainty with which they are presented. The first edition of the FACT task mainly focused on determining the factuality of verb-based events. The present edition is aimed at identifying noun-based events and determining the factuality of all events, be they verbs or nouns. We have participated in Subtask-1 of the FACT 2020 task, which is to automatically propose a factual tag for each event in the text. In this paper we present a method which extracts various features like BERT embeddings, Word2Vec embeddings and TF-IDF (Term Frequency-Inverse Document Frequency) scores of commonly recurring words, along with other manually extracted features, and passes them through an SVM (Support Vector Machine) classifier for classification. Our system has achieved an F1-score of 36.6% and an accuracy of 59.9%, which is quite satisfactory relative to the performance of other systems.
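A hedged sketch of the classification stage: concatenate sentence-level BERT embeddings with TF-IDF features and classify with an SVM. The sentence-transformers library and the "bert-base-nli-mean-tokens" checkpoint are assumed stand-ins for the paper's BERT embedding step, and the toy data is illustrative.

```python
# Sketch: BERT embeddings + TF-IDF features -> SVM factuality classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sentence_transformers import SentenceTransformer

events = ["The meeting took place.", "He may attend the ceremony."]
tags = ["Fact", "Undefined"]                    # toy factuality tags

bert = SentenceTransformer("bert-base-nli-mean-tokens")
X_bert = bert.encode(events)                    # (n, 768) sentence vectors
X_tfidf = TfidfVectorizer().fit_transform(events).toarray()

X = np.hstack([X_bert, X_tfidf])                # combined feature matrix
clf = SVC(kernel="rbf").fit(X, tags)
```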
Link to full paper
Abstract
An Optical Character Recognition (OCR) system is used to convert document images, either printed or handwritten, into their electronic counterparts. Dealing with handwritten texts is much more challenging than printed ones due to the erratic writing styles of individuals. The problem becomes more severe when the input image is a doctor's prescription. Before feeding such an image to the OCR engine, classifying the printed and handwritten texts is a necessity, as a doctor's prescription contains both handwritten and printed texts which are to be processed separately. Much work has been done in the domain of handwritten and printed text separation, albeit little of it related to doctors' handwriting. In this paper, a method is proposed which first localizes the positions of texts in a doctor's prescription, and then separates the printed texts from the handwritten ones. Due to the unavailability of a large database, we have used some standard data (image) augmentation techniques to evaluate as well as to prove the robustness of our method. Besides, we have also designed a Graphical User Interface (GUI) so that anybody can visualize the output by providing a prescription image as input.
Link to full paper
Abstract
Finding a suitable hotel based on a user's needs and affordability is a complex decision-making process. Nowadays, the availability of an ample amount of online reviews made by customers helps us in this regard. This very fact gives us a promising research direction in the field of tourism, the hotel recommendation system, which also helps in improving the information processing of consumers. Real-world reviews may showcase different sentiments of the customers towards a hotel, and each review can be categorized based on different aspects such as cleanliness, value, service, etc. Keeping these facts in mind, in the present work, we have proposed a hotel recommendation system using Sentiment Analysis of the hotel reviews and aspect-based review categorization, which works on the queries given by a user. Furthermore, we have provided a new rich and diverse dataset of online hotel reviews crawled from Tripadvisor.com. We have followed a systematic approach which first uses an ensemble of three binary Bidirectional Encoder Representations from Transformers (BERT) classifiers, one each for positive–negative, neutral–negative, and neutral–positive sentiments, merged using a weight assigning protocol. We have then fed the pre-trained word embeddings generated by the BERT models, along with other textual features such as word vectors generated by Word2vec, TF–IDF of frequent words, subjectivity score, etc., to a Random Forest classifier. After that, we have also grouped the reviews into different categories using an approach that involves fuzzy logic and cosine similarity. Finally, we have created a recommender system from the aforementioned frameworks. Our model has achieved a Macro F1-score of 84% and a test accuracy of 92.36% in the classification of sentiment polarities. Also, the results of the categorized reviews have formed compact clusters. The results are quite promising and much better compared to state-of-the-art models.
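A hedged sketch of the three-phase binary merge described above: each binary model emits a probability pair, and per-class scores are combined with weights before the argmax. The weight values and combination rule here are placeholders, since the abstract does not specify the weight assigning protocol.

```python
# Sketch: merging three binary sentiment classifiers into a 3-way decision.
def merge(p_pos_neg, p_neu_neg, p_neu_pos, w=(1.0, 1.0, 1.0)):
    """Each argument is (P(first_label), P(second_label)) from one binary
    model, in the order the pair's name suggests; w holds the weight
    assigning protocol's weights (placeholder values)."""
    scores = {
        "positive": w[0] * p_pos_neg[0] + w[2] * p_neu_pos[1],
        "negative": w[0] * p_pos_neg[1] + w[1] * p_neu_neg[1],
        "neutral":  w[1] * p_neu_neg[0] + w[2] * p_neu_pos[0],
    }
    return max(scores, key=scores.get)

print(merge((0.9, 0.1), (0.4, 0.6), (0.2, 0.8)))  # -> 'positive'
```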
Link to full paper
Abstract
SemEval-2020 Task 12 was OffensEval: Multilingual Offensive Language Identification in Social Media (Zampieri et al., 2020). The task was subdivided into multiple languages and datasets were provided for each one. The task was further divided into three sub-tasks: offensive language identification, automatic categorization of offense types, and offense target identification. I participated in Sub-task C, that is, offense target identification. For preparing the proposed system, I made use of Deep Learning networks like LSTMs and frameworks like Keras, which combine the bag of words model with automatically generated sequence-based features and manually extracted features from the given dataset. My system, trained on 25% of the whole dataset, achieves a macro-averaged F1 score of 47.763%.
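A hedged sketch of combining a bag-of-words vector and hand-crafted features with a learned sequence representation, as the abstract describes; all sizes are illustrative, and the two-input fusion layout is my assumption about how the combination can be wired.

```python
# Sketch: fuse an LSTM sequence view with BoW + manual features (Keras).
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, LSTM, Dense, Concatenate

seq_in = Input(shape=(60,), name="token_ids")          # padded token ids
bow_in = Input(shape=(5000,), name="bow_and_manual")   # BoW + manual features

h_seq = LSTM(64)(Embedding(20000, 128)(seq_in))        # learned sequence features
h = Concatenate()([h_seq, bow_in])                     # fuse both views
out = Dense(3, activation="softmax")(h)                # IND / GRP / OTH targets

model = Model([seq_in, bow_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```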
Link to full paper
Abstract
Code-mixing is a phenomenon which arises mainly in multilingual societies. Multilingual people who are well versed in their native languages and are also English speakers tend to code-mix using English-based phonetic typing and the insertion of anglicisms in their main language. This linguistic phenomenon poses a great challenge to conventional NLP domains such as Sentiment Analysis, Machine Translation, and Text Summarization, to name a few. In this work, we focus on working out a plausible solution to the domain of Code-Mixed Sentiment Analysis. This work was done as participation in the SemEval-2020 Sentimix Task, where we focused on the sentiment analysis of English-Hindi code-mixed sentences. Our username for the submission was "sainik.mahata" and our team name was "JUNLP". We used feature extraction algorithms in conjunction with traditional machine learning algorithms such as SVR, tuned using Grid Search, in an attempt to solve the task. Our approach garnered an F1-score of 66.2% when tested using metrics prepared by the organizers of the task.
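A hedged sketch of the SVR + Grid Search idea: map sentiment labels to numbers, fit an SVR on TF-IDF features with hyperparameters tuned by GridSearchCV, and round predictions back to classes. The label mapping, feature choice, parameter grid, and toy data are all assumptions.

```python
# Sketch: TF-IDF -> SVR pipeline tuned with GridSearchCV.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

texts = ["bahut achhi movie", "worst movie ever", "theek thaak hai",
         "kamaal ka gaana", "bilkul bakwas", "average story hai"]
y = [1, -1, 0, 1, -1, 0]        # positive / negative / neutral as numbers

pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), SVR())
grid = GridSearchCV(pipe, {"svr__C": [0.1, 1, 10]}, cv=2)
grid.fit(texts, y)

pred = grid.predict(["bahut achhi hai"])
label = int(round(float(pred[0])))  # snap the regression output back to {-1, 0, 1}
```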
Link to full paper
Abstract
In this modern era, language has no geographic boundary. Therefore, for developing automated systems such as audio-based search engines, tele-medicine, or emergency services via phone, the first and foremost requirement is to identify the language. The fundamental difficulty of automatic speech recognition is that speech signals vary significantly across speakers, speech variations, languages, age- and sex-related voice modulations, contents, acoustic conditions and so on. In this paper, we propose a deep learning based ensemble architecture, called FuzzyGCP, for spoken language identification from speech signals. This architecture combines the classification principles of a Deep Dumb Multi Layer Perceptron (DDMLP), a Deep Convolutional Neural Network (DCNN) and a Semi-supervised Generative Adversarial Network (SSGAN) to maximize precision, and finally applies ensemble learning using the Choquet integral to predict the final output, i.e., the language class. We have evaluated our model on four standard benchmark datasets, comprising two Indic language datasets and two foreign language datasets. Irrespective of the languages, the F1-score of the proposed language identification model is as high as 98% on the MaSS dataset; the worst performance is 67% on the VoxForge dataset, which is still much better than the maximum of 44% achieved by state-of-the-art models on multi-class classification. The link to the source code of our model is available here.
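To make the ensemble step concrete, here is a sketch of the discrete Choquet integral used to fuse classifier confidences. The fuzzy measure below is a toy example; the paper defines or learns its own measure.

```python
# Sketch: Choquet-integral fusion of per-class classifier confidences.
def choquet(scores, mu):
    """scores: each classifier's confidence for one class.
    mu: fuzzy measure, mapping frozensets of classifier indices to [0, 1]."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending
    total, prev = 0.0, 0.0
    for k, i in enumerate(order):
        coalition = frozenset(order[k:])   # classifiers scoring >= scores[i]
        total += (scores[i] - prev) * mu[coalition]
        prev = scores[i]
    return total

# Toy measure over 3 classifiers (DDMLP, DCNN, SSGAN).
mu = {frozenset({0, 1, 2}): 1.0, frozenset({0, 1}): 0.7,
      frozenset({0, 2}): 0.8, frozenset({1, 2}): 0.6,
      frozenset({0}): 0.4, frozenset({1}): 0.3, frozenset({2}): 0.5}
print(choquet([0.9, 0.6, 0.7], mu))   # -> 0.76
```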
Link to full paper
Abstract
The outbreak of a global pandemic caused by the coronavirus has created unprecedented circumstances, resulting in a large number of deaths and a risk of community spreading throughout the world. Desperate times have called for desperate measures to detect the disease at an early stage via various medically proven methods like chest computed tomography (CT) scan, chest X-Ray, etc., in order to prevent the virus from spreading across the community. Developing deep learning models for analysing these kinds of radiological images is a well-known methodology in the domain of computer based medical image analysis. However, doing the same by mimicking biological models and leveraging the newly developed neuromorphic computing chips might be more economical. These chips have been shown to be more powerful and more efficient than conventional central and graphics processing units. Additionally, these chips facilitate the implementation of spiking neural networks (SNNs) in real-world scenarios. To this end, in this work, we have tried to simulate SNNs using various deep learning libraries. We have applied them to the classification of chest CT scan images into COVID and non-COVID classes. Our approach has achieved a very high F1-score of 0.99 for the potential-based model and outperforms many state-of-the-art models. The working code associated with our present work can be found here.
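As an illustration of the potential-based units underlying an SNN, here is a sketch of a leaky integrate-and-fire neuron that accumulates membrane potential and spikes on crossing a threshold. The constants are illustrative, and this is a standard LIF unit rather than necessarily the paper's exact neuron model.

```python
# Sketch: a leaky integrate-and-fire (LIF) spiking neuron.
import numpy as np

def lif(inputs, tau=10.0, v_thresh=1.0, v_reset=0.0):
    """Simulate one LIF neuron over a sequence of input currents."""
    v, spikes = 0.0, []
    for i in inputs:
        v += (-v + i) / tau          # leaky integration of the potential
        if v >= v_thresh:            # spike when the threshold is crossed
            spikes.append(1)
            v = v_reset              # reset after the spike
        else:
            spikes.append(0)
    return np.array(spikes)

rng = np.random.default_rng(0)
print(lif(rng.uniform(0, 20, size=50)).sum(), "spikes out of 50 steps")
```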
Link to full paper
Abstract
Offensive language identification has been an active area of research in natural language processing. With the emergence of multiple social media platforms, offensive language identification has become a need of the hour. Traditional offensive language identification models fail to deliver acceptable results, as social media content is largely multilingual and code-mixed in nature. This paper tries to resolve this problem by using IndicBERT and BERT architectures to facilitate the identification of offensive language in Kannada-English, Malayalam-English, and Tamil-English code-mixed language pairs extracted from social media. When evaluated on the test corpus, the presented approach yielded precision, recall, and F1 scores of 0.62, 0.71, and 0.66 respectively for the Kannada-English pair, 0.77, 0.43, and 0.53 respectively for the Malayalam-English pair, and 0.71, 0.74, and 0.72 respectively for the Tamil-English pair.
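A hedged sketch of the fine-tuning setup with Hugging Face Transformers; "ai4bharat/indic-bert" is the public IndicBERT checkpoint, but the training configuration and toy example below are assumptions, not the paper's exact setup.

```python
# Sketch: IndicBERT sequence classification with Hugging Face Transformers.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("ai4bharat/indic-bert")
model = AutoModelForSequenceClassification.from_pretrained(
    "ai4bharat/indic-bert", num_labels=2)      # offensive vs. not offensive

batch = tok(["intha padam romba mosam da"],    # toy Tamil-English code-mix
            padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1])

out = model(**batch, labels=labels)            # returns loss + logits
out.loss.backward()                            # one training step (optimizer omitted)
```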
Link to full paper
Abstract
The problem of gender and age identification has been addressed by many researchers; however, the attention given to it is far less compared to other face recognition problems, and the success achieved in this domain has not seen as much improvement. Every language in the world has a separate set of words and grammatical rules for addressing people of different ages, and the decision associated with their usage relies on our ability to discern individual characteristics like gender and age from facial appearances at a glance. With the rapid usage of Artificial Intelligence (AI) based systems in different fields, we expect the decision making capability of these systems to match human capability as closely as possible. To this end, in this work, we have designed a deep learning based model, called GRANet (Gated Residual Attention Network), for the prediction of age and gender from facial images. This is a modified and improved version of the Residual Attention Network in which we have included the concept of a Gate in the architecture. Gender identification is a binary classification problem, whereas prediction of age is a regression problem; we have decomposed this regression problem into a combination of classification and regression problems to achieve better accuracy. Experiments have been done on five publicly available standard datasets, namely FG-Net, Wikipedia, AFAD, UTKFace and AdienceDB. The obtained results prove the model's effectiveness for both age and gender prediction, making it a proper candidate against other state-of-the-art methods.
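One common way to realize the classification-plus-regression decomposition mentioned above is to classify the face into coarse age bins and take the softmax-weighted expectation over bin centres as the continuous age. This is an assumed reading, not necessarily the paper's exact scheme, and the bin layout is illustrative.

```python
# Sketch: age regression as expectation over classified age bins.
import numpy as np

BIN_CENTRES = np.array([5, 15, 25, 35, 45, 55, 65, 75])  # assumed 10-year bins

def expected_age(bin_probs: np.ndarray) -> float:
    """Softmax probabilities over age bins -> continuous age estimate."""
    return float(np.dot(bin_probs, BIN_CENTRES))

probs = np.array([0.0, 0.05, 0.6, 0.3, 0.05, 0.0, 0.0, 0.0])
print(expected_age(probs))   # -> 28.5
```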
Link to full paper
Abstract
Nowadays, we can observe applications of machine learning in every field, ranging from the quality testing of materials to the building of powerful computer vision tools. One such recent application is the recommendation system, a method that suggests products to users based on their preferences. In this paper, our focus is on a specific recommendation system called movie recommendation. Here, we make use of user reviews of movies in order to establish a general outlook about each movie and then use that outlook to recommend the movie to other users. However, the huge number of available reviews has baffled sophisticated review systems. Consequently, there is a need for a method of extracting meaningful information from the available reviews and using it to classify a movie review and predict the sentiment in each one. In a typical scenario, a review can be positive, negative, or indifferent about a movie. However, the available research articles in the field mainly treat this as a two-class classification problem: positive and negative. The most popular work in this field was performed on the Stanford and Rotten Tomatoes datasets, which are somewhat outdated. Our work is based on self-scraped reviews from the IMDB website, and we have annotated the reviews into one of three classes: positive, negative, and neutral. Our dataset is called JUMRv1 (Jadavpur University Movie Recommendation dataset version 1). For the evaluation of JUMRv1, we took an exhaustive approach, testing various combinations of word embeddings, feature selection methods, and classifiers. We also analysed the performance trends, if there were any, and attempted to explain them. Our work sets a benchmark for movie recommendation systems based on the newly developed dataset, using three-class sentiment classification.
Link to full paper
Abstract
The on-going pandemic has opened a Pandora's box of problems which society has been hiding for years. But the positive side of the present scenario is the opening up of opportunities to solve these problems on the global stage. One such place, which has been flooded with all kinds of emotions and reactions from people all over the world, is Twitter, a microblogging platform. Coronavirus related hashtags have been trending for many days, unlike any other event in the past. Our experiment mainly deals with the collection, tagging and classification of these tweets based on the different keywords they may belong to, using the Naive Bayes algorithm at the core.
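A minimal sketch of the core classifier: a multinomial Naive Bayes over token counts, assigning tweets to keyword-based categories. The categories and data here are toy examples, not the collected corpus.

```python
# Sketch: multinomial Naive Bayes for keyword-based tweet categorization.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = ["lockdown extended again", "wear a mask please",
          "vaccine trials show promise"]
tags = ["lockdown", "mask", "vaccine"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(tweets, tags)
print(clf.predict(["new mask mandate announced"]))  # -> ['mask']
```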
Link to full paper
Abstract
Training translation systems with complex and compound sentences is generally considered computationally tough, and such systems fail to process the large amount of syntactic information given out by these sentences. This issue subsequently affects the overall quality of translations. On the other hand, simple sentences are shorter by nature and produce less syntactic information. Therefore, it would be safe to say that training translation systems using simple sentences only would result in better translation output. However, training a translation system requires a large, high-quality parallel corpus involving two natural languages. While parallel corpora for various language pairs are abundant, such lexicons for low-resourced languages, consisting of only simple sentences, are rare. In such a scenario, the development of such a parallel lexicon is the initial purpose of the present work. Building it requires differentiating complex and/or compound sentences from the overall corpus and then converting them into simple sentences. Since the work includes two languages, English and Bengali, different algorithms to accomplish this are documented in this paper. Converting complex and compound sentences to simple instances fragments the sentences into two or more segments, which then need to be aligned to make them semantically similar; hence, a basic alignment technique has also been proposed to mitigate this problem. After developing the parallel corpus, we needed to check its effectiveness in solving the quality issues of translation systems discussed earlier. For this, state-of-the-art translation modules like Statistical Machine Translation and Neural Machine Translation have been trained using the developed corpus as well as the raw parallel corpus consisting of sentences of mixed complexities. The performance of these translation models has been compared using automated as well as manual evaluation metrics. The results are promising and prove that translation systems do perform better when trained using simple sentence language pairs.
Link to full paper
Abstract
Compared to other features of the human body, voice is quite complex and dynamic, in the sense that speech can be spoken in various languages with different accents and in different emotional states. Recognizing the gender, i.e. male or female, from the voice of an individual is by all accounts a minor task for human beings; the same goes for speaker identification if we have been well accustomed with the speaker for a long time. Our ears function as the front end, accepting the sound signals which our brain processes to settle on a decision. Although trivial for us, it becomes a challenging task to mimic for any computing device. Automatic gender, emotion and speaker identification systems have many applications in surveillance, multimedia technology, robotics and social media. In this paper, we propose a Golden Ratio-aided Neural Network (GRaNN) architecture for these purposes. As deciding the number of units for each layer in a deep NN is a challenging issue, we have done this using the concept of the Golden Ratio. Prior to that, an optimal subset of features is selected from the feature vector, common to all three tasks, extracted from spectral images obtained from the input voice signals. We have used a wrapper-filter framework where features selected by minimum redundancy maximum relevance are fed to the Mayfly algorithm combined with the adaptive beta hill climbing (AβHC) algorithm. Our model achieves accuracies of 99.306% and 95.68% for gender identification on the RAVDESS and Voice Gender datasets, 95.27% for emotion identification on the RAVDESS dataset, and 67.172% for speaker identification on the RAVDESS dataset. Performance comparison of this model with existing models on the publicly available datasets confirms its superiority over those models. The results also ensure that we have chosen the common feature set meticulously, as it works equally well on three different pattern classification tasks. The proposed wrapper-filter framework reduces the feature dimension significantly, thereby lessening the storage requirement and training time. Finally, strategically selecting the number of units in each NN layer helps increase the overall performance of all three pattern classification tasks.
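A hedged sketch of the Golden Ratio idea for sizing layers, assuming each hidden layer is roughly 1/φ times the width of the previous one; the starting width, depth, and shrinking direction are illustrative assumptions, not the paper's exact rule.

```python
# Sketch: choosing layer widths with the golden ratio.
PHI = (1 + 5 ** 0.5) / 2          # golden ratio, ~1.618

def golden_layer_sizes(first: int, depth: int) -> list[int]:
    """Shrink successive layer widths by the golden ratio."""
    sizes = [first]
    for _ in range(depth - 1):
        sizes.append(max(1, round(sizes[-1] / PHI)))
    return sizes

print(golden_layer_sizes(512, 6))  # -> [512, 316, 195, 121, 75, 46]
```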
Link to full paper
Abstract
User privacy is an important concern that should be handled in data intensive applications. Interestingly, differential privacy is a privacy model that can be applied to such datasets; it is advantageous as it does not make any strong assumption about the adversary. In this work, we introduce the notion of differential privacy in the domain of Human Activity Recognition (HAR). Real life accelerometer data has been collected from different smartphone configurations carried by users in different ways according to their convenience. Our contribution in this work is a privacy preserving HAR framework incorporating algorithms that preserve the differential privacy of the user data. The algorithm exploits the scalar and the vector parts of the accelerometer readings and applies privacy preserving mechanisms to them. A Deep Multi Layer Perceptron (DMLP) framework has been utilized for activity classification. We have achieved comparable classification results while additionally preserving the privacy of the data; to the best of our knowledge, this is the first work of its kind in the domain of HAR based on smartphone sensing data. The proposed framework is implemented both on a collected real-life dataset capturing different smartphone configurations and usage behaviours, and on benchmark datasets.
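A hedged sketch of perturbing the scalar part of accelerometer readings with the Laplace mechanism, the canonical differential-privacy mechanism; the abstract does not name the mechanism used, so treating it as Laplace noise, along with the sensitivity and epsilon values, is an assumption for illustration.

```python
# Sketch: Laplace mechanism applied to accelerometer magnitudes.
import numpy as np

def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Add Laplace(sensitivity / epsilon) noise to each reading."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return values + rng.laplace(0.0, scale, size=values.shape)

acc = np.array([[0.1, 9.7, 0.3], [0.2, 9.6, 0.1]])   # toy x, y, z readings
magnitude = np.linalg.norm(acc, axis=1)              # scalar part of the signal
private_mag = laplace_mechanism(magnitude, sensitivity=1.0, epsilon=0.5)
```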
Link to full paper
I will send over a mail whenever I add new content :)
Avishek Garain is an undergraduate at Jadavpur University in the Department of Computer Science and Engineering. His interest lies primarily in Deep Learning for Natural Language Processing and Computer Vision. He follows several research journals to keep himself up-to-date with current research and also actively takes part in research at the undergraduate level under Dr. Sudip Kumar Naskar, Dr. Dipankar Das and Dr. Ram Sarkar of the Dept. of CSE, Jadavpur University.
He loves to teach people about his experience in Deep Learning and enthusiastically invites research collaborations and internship opportunities in the field of Deep Learning for Natural Language Processing and Computer Vision.