Communications of the ACM

ACM TechNews

Finding the Needles in a Haystack of High-Dimensional Datasets

A needle in a haystack.

Computer scientists at the University of Groningen developed an algorithm that selects subsets of features from high-dimensional datasets that are both relevant and highly predictive.

Credit: Bigstock

An algorithm developed by computer scientists at the University of Groningen in the Netherlands enables the smallest, most relevant subset of features to be selected from high-dimensional datasets.

The "FeatBoost" algorithm allows for faster, more scalable analysis, cheaper data acquisition and storage, and better explainability of the interactions between the selected features.

Said the University of Groningen's Ahmad Alsahaf, "We use a decision tree-based model to select the most relevant features. We subsequently create and evaluate a classification model using the selected features so far. Any samples that are wrongly classified will be given more emphasis in determining the next set of most relevant features, a process called boosting. These steps are repeated until the performance of the classification model cannot improve any further."
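
The loop Alsahaf describes can be sketched in a few lines of Python. This is only an illustration of the general idea, not the FeatBoost implementation: the abstract does not specify the models used, so the sketch assumes a one-split decision stump as the tree-based relevance score, a leave-one-out 1-nearest-neighbour classifier as the evaluation model, one feature added per round, and a fixed weight multiplier for misclassified samples. All function names and the `boost` parameter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def stump_error(values: Sequence[float], labels: Sequence[int],
                weights: Sequence[float]) -> float:
    """Lowest weighted error of any single-threshold split on one feature."""
    total = sum(weights)
    best = 1.0
    for threshold in sorted(set(values)):
        # Rule: predict 1 when value > threshold. The flipped rule has error 1 - err.
        err = sum(w for v, y, w in zip(values, labels, weights)
                  if (1 if v > threshold else 0) != y) / total
        best = min(best, err, 1.0 - err)
    return best


def loo_1nn_predictions(X: List[List[float]], y: List[int],
                        features: List[int]) -> List[int]:
    """Leave-one-out 1-nearest-neighbour predictions using only `features`."""
    preds = []
    for i, row in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((row[f] - X[j][f]) ** 2 for f in features))
        preds.append(y[nearest])
    return preds


def boosted_feature_selection(X: List[List[float]], y: List[int],
                              max_features: Optional[int] = None,
                              boost: float = 2.0) -> Tuple[List[int], float]:
    """Greedy feature selection with boosting-style sample re-weighting."""
    n, d = len(X), len(X[0])
    limit = d if max_features is None else min(max_features, d)
    weights = [1.0 / n] * n
    selected: List[int] = []
    best_acc = 0.0
    while len(selected) < limit:
        candidates = [f for f in range(d) if f not in selected]
        # Step 1: rank remaining features under the current sample weights.
        f_best = min(candidates,
                     key=lambda f: stump_error([r[f] for r in X], y, weights))
        # Step 2: evaluate a classifier on the features selected so far plus the candidate.
        preds = loo_1nn_predictions(X, y, selected + [f_best])
        acc = sum(p == t for p, t in zip(preds, y)) / n
        if acc <= best_acc:
            break  # Stop once performance no longer improves.
        selected.append(f_best)
        best_acc = acc
        # Step 3: give misclassified samples more weight, then renormalise.
        weights = [w * boost if p != t else w
                   for w, p, t in zip(weights, preds, y)]
        scale = sum(weights)
        weights = [w / scale for w in weights]
    return selected, best_acc


# Toy data: feature 0 separates the classes; features 1 and 2 are noise.
X_demo = [
    [0.1, 5.0, 1.0], [0.2, 1.0, 0.0], [0.3, 4.0, 1.0], [0.4, 2.0, 0.0],
    [0.9, 5.1, 0.0], [1.0, 1.1, 1.0], [1.1, 4.1, 0.0], [1.2, 2.1, 1.0],
]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]
selected_demo, acc_demo = boosted_feature_selection(X_demo, y_demo)
```

On this toy data the first selected feature already classifies every sample correctly, so the loop stops before the re-weighting step changes anything. On harder data, the weight update steers later rounds toward features that explain the samples the current subset gets wrong.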

From University of Groningen (Netherlands)


Abstracts Copyright © 2021 SmithBucklin, Washington, DC, USA

