Poster Abstract

P3.9 Petr Škoda (Faculty of Information Technology, Czech Technical University in Prague)

Theme: Data science challenges: tools from statistics to machine learning

VO-supported active deep learning as a new methodology for the discovery of objects of interest in

Petr Škoda, Ondřej Podsztavek and Pavel Tvrdík
Faculty of Information Technology of Czech Technical University in Prague

Deep neural networks have proved to be a very successful method of supervised learning in several research fields. To perform well, they require a massive amount of labelled data, which is difficult to obtain from most astronomical surveys, especially those that depend on spectroscopic confirmation. To overcome this limitation, we present a novel active deep learning methodology.

It is based on iterative training of deep networks, each round followed by relabelling of a small sample of the predicted target classes according to the qualified decision of an expert acting as an oracle. To maximise the scientific return, the oracle brings domain knowledge to the decision that is not limited to the data learned by the network. By combining external resources to extract the key information, the expert assigns a much more relevant label, especially in boundary cases of noisy or ambiguous data.
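
The iterative scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid classifier on a single numeric feature stands in for the deep network, and the names train_centroids, predict, active_learning and the oracle callback are hypothetical. In a real experiment the oracle would be an expert inspecting each candidate with the help of external resources, not a function.

```python
import random

def train_centroids(labelled):
    """Stand-in for network training: mean feature value per class."""
    sums, counts = {}, {}
    for x, y in labelled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Assign the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))

def active_learning(labelled, pool, oracle, rounds=5, batch=10, target=1, seed=0):
    """Train, sample predicted targets, let the oracle label them, repeat."""
    rng = random.Random(seed)
    labelled, pool = list(labelled), list(pool)
    for _ in range(rounds):
        model = train_centroids(labelled)
        # Candidates are the unlabelled objects predicted as the target class.
        candidates = [x for x in pool if predict(model, x) == target]
        if not candidates:
            break
        # Only a small sample is shown to the oracle in each round.
        for x in rng.sample(candidates, min(batch, len(candidates))):
            labelled.append((x, oracle(x)))  # oracle label overrides prediction
            pool.remove(x)
    return train_centroids(labelled), labelled, pool

# Synthetic usage: the hypothetical oracle knows the true rule x > 0.5.
oracle = lambda x: int(x > 0.5)
pool = [i / 100 for i in range(100)]
model, labelled, rest = active_learning([(0.1, 0), (0.9, 1)], pool, oracle,
                                        rounds=3, batch=5)
```

Sampling only from the predicted target class mirrors the text: the expert's effort goes into confirming or rejecting candidates of interest, and each corrected label feeds the next training round.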

Setting up an active deep learning platform thus requires incorporating a Virtual Observatory client infrastructure as an integral part of the machine learning experiment, which is quite different from current practice and brings challenges, mainly in the case of Big Data. As a proof of concept, we show the methodology used for the discovery of new emission-line stars in the multimillion-spectra archive of the LAMOST DR2 survey and discuss potential architectures optimized for such tasks.