Services for the European Open Science Cloud

CLARIN’s Virtual Language Observatory

Twan Goosen writes about an open digital tool for the Social Sciences and Humanities

The Virtual Language Observatory (VLO) is a service provided by CLARIN offering uniform search and discovery functionality for language resources and tools. The metadata it indexes is heterogeneous in both content and structure. This metadata is harvested regularly from over forty CLARIN centres that provide resources or tools of interest to scholars working with language data. On top of that, CLARIN harvests from a few dozen external providers that also offer relevant content.

The VLO has about 850 thousand metadata records that can be searched, browsed and viewed. The addition of another 780 thousand records describing cultural heritage objects from Europeana is currently under evaluation.

The VLO is openly accessible to anyone via the web. Researchers can freely enter a search term and/or use a number of pre-defined facets to refine the search results. This method of faceted browsing is as easy as using an online store and allows for quick filtering on the basis of object language, nature of the resource, subject or organisations involved.
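For readers who like to see the idea in code, faceted browsing can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not a description of how the VLO is implemented: the records, the facet names (`language`, `resource_type`) and the functions `search` and `facet_counts` are all made up for this example.

```python
from collections import Counter

# Made-up records; the facet names ("language", "resource_type") are
# illustrative and not taken from the VLO's actual configuration.
RECORDS = [
    {"name": "Spoken Dutch Corpus", "language": "Dutch", "resource_type": "corpus"},
    {"name": "Dutch Lexicon", "language": "Dutch", "resource_type": "lexicon"},
    {"name": "German Newspaper Corpus", "language": "German", "resource_type": "corpus"},
]


def search(records, term="", selected=None):
    """Return records whose name contains `term` and that match every selected facet value."""
    selected = selected or {}
    needle = term.lower()
    return [
        r for r in records
        if needle in r["name"].lower()
        and all(r.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of `facet`, as shown next to each filter."""
    return Counter(r[facet] for r in records if facet in r)


# Free-text search combined with one facet selection.
hits = search(RECORDS, term="corpus", selected={"language": "Dutch"})
print([r["name"] for r in hits])

# Facet counts over the current result set, used to offer further refinement.
print(facet_counts(search(RECORDS, term="corpus"), "language"))
```

Each selected facet narrows the result set, and the counts are recomputed over what remains, which is the behaviour familiar from online stores.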

The VLO is part of CLARIN’s Component Metadata Infrastructure and can cope with many different metadata descriptions, as long as they are implemented through (or converted to) the Component Metadata framework. Component Metadata ‘profiles’ can be defined to contain any number of fields and (sub)components, and allow for semantic annotation of all of these, through which interoperability across profiles is achieved. The VLO exploits this principle: on the basis of these explicit semantics, it maps a wide range of metadata fields to the relatively small set of fields underlying its search and browsing facilities.
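The mapping principle can be illustrated with a minimal Python sketch. Everything in it is an assumption made for the sake of the example: the concept identifiers, the two profiles, the facet definitions and the function `map_record_to_facets` are hypothetical and do not reflect the VLO's actual configuration or code. The point is only that fields with different names end up in the same facet because they are annotated with the same concept.

```python
# Hypothetical concept identifiers. Real profiles annotate fields with links
# to concept definitions; these short strings are placeholders.
CONCEPT_LANGUAGE = "concept:language"
CONCEPT_TITLE = "concept:title"

# Hypothetical facet definitions: each facet accepts one or more concepts.
FACET_CONCEPTS = {
    "language": {CONCEPT_LANGUAGE},
    "name": {CONCEPT_TITLE},
}

# Two made-up profiles with different field names but shared concepts.
# A field mapped to None carries no semantic annotation.
PROFILE_A = {"Title": CONCEPT_TITLE, "SubjectLanguage": CONCEPT_LANGUAGE}
PROFILE_B = {"resourceName": CONCEPT_TITLE, "lang": CONCEPT_LANGUAGE, "notes": None}


def map_record_to_facets(record, profile):
    """Map a record's fields to facets via the concepts declared in its profile."""
    facets = {}
    for field, value in record.items():
        concept = profile.get(field)
        if concept is None:
            continue  # unannotated fields cannot be mapped to any facet
        for facet, concepts in FACET_CONCEPTS.items():
            if concept in concepts:
                facets.setdefault(facet, []).append(value)
    return facets


record_a = {"Title": "Spoken Dutch Corpus", "SubjectLanguage": "Dutch"}
record_b = {"resourceName": "Dutch Lexicon", "lang": "Dutch", "notes": "internal"}

print(map_record_to_facets(record_a, PROFILE_A))
print(map_record_to_facets(record_b, PROFILE_B))
```

Both records end up with the same two facets, `name` and `language`, even though their profiles share no field names. Supporting a new profile then requires no change to the search interface, only semantic annotations on its fields.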

While the VLO has been optimised for language resources and tools, it is not tied to any specific domain. Other fields, especially those that face a wide variety of metadata formats, could benefit from adopting the component metadata approach. During the EOSC-hub project, CLARIN will work on making the VLO more generically applicable so that any community can adopt it with minimal effort.

More information

Go to vlo.clarin.eu and press ‘Take a quick tour’ to see the VLO’s main features demonstrated. You can learn more about CMDI at clarin.eu/cmdi.


Twan Goosen is a developer at CLARIN