

Small Data: A Big Challenge for Classical Machine Learning

April 12, 2017 @ 7:00 pm - 10:00 pm

We live in a world inundated with data: websites, mobile devices, security systems, and even small wireless sensor systems constantly collect data. In fact, such systems often collect so much data that traditional data processing techniques are insufficient. This is the big data problem. However, sometimes things go the other way: there is a critical constriction in the size of the data at some point in the processing pipeline that prevents traditional machine learning techniques from working. We call this the Small Data problem.

For all but the simplest classification and regression models, traditional machine learning requires a considerable amount of training data to learn a model without overfitting. Unfortunately, for many datasets, especially within medical applications, there is not enough training data for these models to work. There are a number of common reasons for this constriction, including: plenty of healthy subjects but not enough sick subjects, per-person differences that require individualized training, and a high cost of features (such as certain tests or data that cost a lot of money or time to collect).

As an example, collecting data from the following subjects is increasingly problematic as we proceed from left to right: rats, apes, healthy adults, sick adults, healthy children, and sick children. So, ideally, you would want to learn as much as possible from subjects toward the left of the list and try to generalize the knowledge to subjects on the right. But this introduces another aspect of the small data problem: it’s not just the amount of data, but the quality. A model learned from rats might break down pretty quickly if applied to a sick child. But there may not be enough data from sick children to adequately train the full model.

How do we solve these problems? Like terrible scientists, but great engineers, we cheat: we use as much pre-existing or inexpensively obtained knowledge as possible to constrain the problems to the point where the model is simple enough to be correctly trained with the available data.

Speaker(s): Prof. Sara Ostadabbas

Location:
Room: Conference Room, No. 306
Bldg: Egan Research Center
Northeastern University, 120 Forsyth St
Building #60
Boston, Massachusetts

Details

Date:
April 12, 2017
Time:
7:00 pm - 10:00 pm
Website:
http://events.vtools.ieee.org/m/44536
