Text classification using a hybrid method for imbalanced classes


Posted on January 1, 2017 at 12:00 PM


Objective

We have a dataset of 10,000 tweets from important stock brokers, and we want to identify the tweets that are relevant to the stock market.

Approach

We use a mixed approach that combines a lexicon-based method with machine learning methods. SMOTE is used to address the class imbalance problem, and WordNet is used to grow the lexicons.
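
To make the oversampling step concrete, here is a minimal sketch of the core SMOTE idea using only the Python standard library. It is not the project's implementation: the function name `smote`, its parameters, and the toy feature vectors are assumptions made for illustration, and real work would more likely rely on an existing library implementation. The sketch creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up 2-D feature vectors standing in for "relevant" tweets
# (for example, two lexicon-match scores per tweet). Purely hypothetical data.
relevant = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9]]
new_points = smote(relevant, n_synthetic=4, k=2)
print(len(new_points))
```

Because each synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already covered by the minority class instead of simply duplicating existing tweets.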

Result

The F-score increased from 0.69 to 0.98.
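
For readers unfamiliar with the metric, the sketch below shows how an F-score is computed from prediction counts. It assumes the reported figure is the F1 score (the harmonic mean of precision and recall) for the "relevant" class, which the text above does not state explicitly. The function name `f_score` and the counts are hypothetical and are not the project's results.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives.
    With beta=1 this is F1, the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, chosen only to illustrate the calculation.
example = f_score(tp=80, fp=20, fn=20)
print(round(example, 2))
```

On an imbalanced dataset this metric is more informative than plain accuracy, because a classifier that labels every tweet as irrelevant would score high accuracy but an F-score of zero for the relevant class.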

Current Status

We are in the process of submitting this work to a journal.

Next Step

Identify industry-specific and stock-specific actionable insights from the relevant stocks.

Team Members
  • Saptarsi Goswami (Faculty)
  • Mr. Asish Chowdhury (IT Professional with 15 years of experience)
  • Mr. Sourav Malakar (Alumnus)