Linguistic Resources (Level 1)

This course will be jointly organised by Lars Borin (Göteborg University), http://svenska.gu.se/~svelb/, and Daniel Hardt (Copenhagen Business School), http://www.id.cbs.dk/~dh/.

The course will start in Göteborg during the week beginning 13 September 2004, followed by one meeting on 5 November in Copenhagen and one in January 2005 in Stockholm.

Purpose

The purpose of this course is to provide a research-oriented introduction to linguistic resources, their uses, and their growing importance in the field of language technology.

Overview

The focus of the course will be on linguistic data resources, while linguistic algorithmic resources (a.k.a. tools) will be treated only incidentally, as needed to elucidate some aspect of the data resources. Thus delimited, linguistic resources basically come in three flavors:

  1. corpus resources (text corpora of written or spoken language, speech databases, digitized video, etc.);
  2. lexical resources;
  3. grammatical resources (these are the most difficult to treat separately from the tools for using them).

These resources are further, and orthogonally, defined by their

In reference to the resources, important general issues - of both theoretical and practical interest - are

Content

Topics to be covered will include:

Lexical Resources

Text Corpora and Markup

Syntax Treebanks

Discourse Treebanks

Parallel Corpora

Spoken Language Corpora/Speech Technology
