Graph-based approach for feature selection in high-dimensional data sets


Posted on January 1, 2017 at 12:00 PM


Objective

To use graph-theoretic principles to select a subset of features from a data set that has a very large number of features.

Approach

Represent the features of a data set as a graph, based either on the similarity between features or on the information each feature contributes. This gives a unique visualization of feature relevance and redundancy. Irrelevant features are then eliminated, and graph-theoretic principles of drawing sub-graphs are used to derive a subset from the potentially redundant features.
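
The similarity-based variant of this idea can be illustrated with a minimal sketch. It is not the project's algorithm: it assumes absolute Pearson correlation as the similarity measure, a hypothetical hard threshold of 0.9, and a simple rule that keeps one representative feature per connected sub-graph. The function names and the toy data are illustrative only, and the step that eliminates irrelevant features is left out because it would need class labels.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def build_similarity_graph(features, threshold=0.9):
    """Nodes are feature names; an edge joins two features whose
    absolute correlation reaches the (assumed) threshold."""
    graph = {name: set() for name in features}
    for a, b in combinations(features, 2):
        if abs(pearson(features[a], features[b])) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def select_representatives(graph):
    """Keep one feature per connected component: the node with the most
    neighbours, with ties broken by name."""
    seen, selected = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        selected.append(max(sorted(component), key=lambda n: len(graph[n])))
    return selected


# Toy data: f1, f2 and f4 are nearly collinear; f3 is unrelated.
features = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10.5],
    "f3": [5, 1, 4, 2, 3],
    "f4": [1, 2, 3, 4, 5.2],
}
graph = build_similarity_graph(features)
print(select_representatives(graph))
```

In this toy example the three nearly collinear features form one sub-graph and collapse to a single representative, while the unrelated feature stays as its own node.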

Current Status

An initial formulation of the graph-theoretic representation of the features has been completed for both similarity and information contribution. These representations are named the Feature Association Map (FAM) and the Feature Information Map (FIM), respectively.
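
How FIM measures information contribution is not specified here. As an illustration only, one common way to quantify how much a discrete feature tells us about a class label is mutual information; the sketch below assumes that measure, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        p_x, p_y = count_x[x] / n, count_y[y] / n
        mi += p_joint * log2(p_joint / (p_x * p_y))
    return mi


labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]    # mirrors the label exactly
uninformative = [0, 1, 0, 1]  # independent of the label

print(mutual_information(informative, labels))
print(mutual_information(uninformative, labels))
```

Under this assumption, a score like this could serve as a node or edge weight in an information-based feature graph, with low-scoring features being candidates for elimination as irrelevant.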

Next Step

The next step is to apply fuzzy sets when deciding the similarity of features, instead of assuming a hard similarity threshold. Rough sets will also be used to derive a single subset from multiple potentially optimal subsets. Finally, the algorithm will be tested on a very high-dimensional data set.
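
To show what replacing a hard threshold with a fuzzy one might look like, here is a minimal sketch. It assumes a linear ramp membership function over absolute correlation; the breakpoints 0.6 and 0.9 and the function name are hypothetical, and the planned implementation may use a different membership shape.

```python
def fuzzy_similarity(corr, low=0.6, high=0.9):
    """Degree (0.0 to 1.0) to which two features count as 'similar'.

    Below `low` the membership is 0, above `high` it is 1, and in between
    it rises linearly. Both breakpoints are assumed values.
    """
    strength = abs(corr)
    if strength <= low:
        return 0.0
    if strength >= high:
        return 1.0
    return (strength - low) / (high - low)


for r in (0.3, 0.75, -0.95):
    print(r, fuzzy_similarity(r))
```

With a membership degree instead of a yes/no decision, edges in the feature graph can carry weights, so borderline pairs are neither fully merged nor fully separated.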

Team Members
  • Mr. Amit Kumar Das
  • Dr. Saptarsi Goswami
  • Dr. Amlan Chakrabarti
  • Dr. Basabi Chakrabarti