Poster Abstract

P7.5 Yann Bisch (Observatoire astronomique de Strasbourg - CDS)

Theme: Data discovery across heterogeneous datasets

A new textual search engine to discover VizieR catalogues

Authors: Y. Bisch, G. Landais, A. Schaaff

Textual search is part of VizieR indexing. It complements the positional indexing and the keyword search based on SQL queries. This new capability extends the current engine with a natural-language approach, similar to that of Google or of Bumblebee (ADS). The new version (still in beta) uses Elasticsearch, an open-source search engine that indexes documents using a grammar and text-search analysers.
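To illustrate what indexing documents with a text analyser means, here is a toy Python sketch of an analyser feeding an inverted index. It is a simplified stand-in, not Elasticsearch's implementation (which relies on Apache Lucene); the stop-word list, document ids and sample texts are hypothetical.

```python
import re
from collections import defaultdict

# Toy stop-word list (hypothetical). Elasticsearch's default "standard" analyzer
# does NOT remove stop words; this is only an illustrative choice.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "for", "with"}

def analyze(text: str) -> list[str]:
    """Split text into lowercase word tokens and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical catalogue descriptions keyed by made-up ids.
docs = {
    "cat1": "Photometry of variable stars in the Magellanic Clouds",
    "cat2": "Radial velocities of Galactic open clusters",
}
idx = build_index(docs)
print(search(idx, "Variable Stars"))  # {'cat1'}
```

Because the query passes through the same analyser as the documents, "Variable Stars" and "variable stars" reach the same index terms.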

Queries are written in a JSON-based NoSQL query language that supports both strict and fuzzy search, and are available through an HTTP RESTful API. To improve indexing, a fine-grained configuration adapted to each kind of data (authors, DOI, abstract, date, ...) is needed.
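A minimal Python sketch of such a field-specific configuration and of strict versus fuzzy queries, assuming a hypothetical index named `vizier-readme` with hypothetical fields `title`, `abstract`, `keywords`, `authors`, `doi` and `date` (the actual VizieR schema is not described here). The mapping body would be sent with `PUT /vizier-readme` and queries with `POST /vizier-readme/_search`; the code below only builds the request and sends nothing.

```python
import json
import urllib.request

# Hypothetical index name; not taken from the VizieR deployment.
INDEX = "vizier-readme"

# Field-specific mapping (hypothetical fields): free text is analysed,
# identifiers such as the DOI are stored verbatim.
MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "english"},
            "abstract": {"type": "text", "analyzer": "english"},
            "keywords": {"type": "text"},
            # Multi-field: analysed for search, plus an exact "raw" keyword copy.
            "authors": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "doi": {"type": "keyword"},  # exact match only, no tokenization
            "date": {"type": "date"},
        }
    }
}

SEARCH_FIELDS = ["title^3", "keywords^2", "abstract"]  # ^n boosts a field's score

def build_query(text: str, fuzzy: bool = False) -> dict:
    """Return a query DSL body for a strict (phrase) or fuzzy search."""
    if fuzzy:
        # "AUTO" lets the allowed edit distance grow with term length.
        clause = {"multi_match": {"query": text, "fields": SEARCH_FIELDS,
                                  "fuzziness": "AUTO"}}
    else:
        # Strict search: the words must appear as a phrase within one field.
        clause = {"multi_match": {"query": text, "fields": SEARCH_FIELDS,
                                  "type": "phrase"}}
    return {"query": clause, "size": 10}

def search_request(base_url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST for the _search REST endpoint."""
    return urllib.request.Request(
        f"{base_url}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = search_request("http://localhost:9200", build_query("varaible stars", fuzzy=True))
print(req.method, req.full_url)  # POST http://localhost:9200/vizier-readme/_search
```

With `"fuzziness": "AUTO"`, Elasticsearch tolerates a number of edits that depends on term length, so a misspelling such as `varaible` should still match `variable`; the phrase variant trades that tolerance for precision.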

The indexed resources are the ReadMe files that describe the VizieR catalogues. A ReadMe is a structured ASCII file containing the basic metadata (authors, title, keywords, abstract) as well as table and column descriptions. We present the new implementation, from the data source to the end users.
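The step from a ReadMe to an indexable document might look like the simplified Python parser below. It handles only a few parts (the header line, `Keywords:`, `Abstract:` and `Byte-by-byte Description of file:`) of a made-up, abbreviated ReadMe; real ReadMe files contain more sections and edge cases (multi-line keywords, multi-paragraph abstracts, continuation lines), and the output field names are hypothetical.

```python
import re

# Made-up, abbreviated ReadMe used only to exercise the parser.
SAMPLE = """\
J/XXX/000    Photometry of toy variable stars    (Doe+, 2020)
================================================================================
Photometry of toy variable stars
    Doe J., Roe R.
================================================================================
Keywords: stars: variables ; techniques: photometric

Abstract:
    A made-up abstract used only to exercise the parser.
    It spans two lines.

Byte-by-byte Description of file: table1.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1-  5  I5    ---     Seq       Sequential number
   7- 14  F8.3  mag     Vmag      V magnitude
--------------------------------------------------------------------------------
"""

# Column line: byte range (or single byte), then format, unit, label, explanation.
COLUMN_RE = re.compile(r"\s*\d+(?:\s*-\s*\d+)?\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$")

def parse_readme(text: str) -> dict:
    """Extract a few metadata fields from a simplified ReadMe into a dict."""
    lines = text.splitlines()
    # Header: catalogue designation, short title, "(FirstAuthor+, year)".
    header = re.match(r"(\S+)\s+(.*?)\s+\((.+)\)\s*$", lines[0])
    doc = {"catalogue": header.group(1), "title": header.group(2),
           "reference": header.group(3), "keywords": [], "abstract": "",
           "columns": []}
    section = None
    abstract_lines = []
    for line in lines[1:]:
        if line.startswith("Keywords:"):
            raw = line[len("Keywords:"):]
            doc["keywords"] = [k.strip() for k in raw.split(";") if k.strip()]
            section = None
        elif line.startswith("Abstract:"):
            section = "abstract"
        elif line.startswith("Byte-by-byte Description of file:"):
            section = "columns"
        elif section == "abstract":
            if line.strip():
                abstract_lines.append(line.strip())
            else:
                section = None  # a blank line ends the abstract in this sketch
        elif section == "columns":
            m = COLUMN_RE.match(line)
            if m:  # header and dashed separator lines do not match
                doc["columns"].append({"format": m.group(1), "unit": m.group(2),
                                       "label": m.group(3),
                                       "explanation": m.group(4).strip()})
    doc["abstract"] = " ".join(abstract_lines)
    return doc

doc = parse_readme(SAMPLE)
print(doc["title"])  # Photometry of toy variable stars
```

Each resulting dict could then be serialised to JSON and submitted to the search engine as one document per catalogue.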