Invited Speakers

Prof. Paola Merlo, 

Computational Linguistics and Computational Learning Group, Department of Linguistics and University Centre of Informatics, University of Geneva

Title
The Quest for Language Universals: Multi-lingual Computational Results and Methods


Abstract

The domain of language is distinguished by the complexity of the representations and the sophistication of the domain theory that is available. It also has a large amount of observational data available for many languages. The main scientific challenge for computational linguistics is the creation of theories and methods that fruitfully combine large-scale, corpus-based approaches with the linguistic depth of more theoretical methods. I report here on some recent and current work from our group, where large-scale, data-intensive computational modelling techniques are used to address linguistic questions about language universals, that is, properties that apply to all existing languages despite their well-documented diversity. On the one hand, we investigate the factors that govern one of the most apparent sources of diversity across languages: the order of words. First, we report on work that investigates whether typological frequencies are systematically correlated with abstract syntactic principles at work in structure building and movement. Then, we investigate higher-level structural principles of efficiency and complexity: the availability of several large-scale treebanks allows us to ask this question in a novel way. In a large-scale computational study, we confirm a trend towards minimisation of the distance between words, over time and across languages. On the other hand, much like the comparative method in linguistics, cross-lingual corpus investigations take advantage of any corresponding annotation or linguistic knowledge across languages. The third case study shows that corpus data and typological data involving the causative alternation exhibit interesting correlations explained by the notion of spontaneity of an event.
Finally, we report on work that exploits differences across languages in the surface expression of meaning, specifically lexical aspect, to show that complementary information about one language can be extracted from its translations in a second language for the NLP task of event duration prediction. This line of work leverages both similarities and differences across the languages of the world to discover universal properties and develop truly cross-lingual NLP tools.

 

Bio: Paola Merlo is a professor in the Linguistics department of the University of Geneva. She is the head of the interdisciplinary research group Computational Learning and Computational Linguistics (CLCL). The group is concerned with interdisciplinary research combining linguistic modelling with machine learning techniques. The scope of her current research includes fundamental issues in the statistical nature of language, empirical evaluations of linguistic proposals about the lexical semantics of verbs and language universals of word order, and statistical models of syntactic and semantic parsing. Prof. Merlo is the current editor of the journal of the Association for Computational Linguistics, Computational Linguistics, and has been a member of the executive committee of the EACL and currently of the ACL. Prof. Merlo studied theoretical linguistics at the University of Venice and holds a doctorate in Computational Linguistics from the University of Maryland, USA. She has been an associate research fellow at the Institute for Cognitive Science at the University of Pennsylvania, and has been a visiting scholar at Rutgers, Edinburgh, and Stanford.

   
   

Enrique Alfonseca,

Google Research Zurich 

Title
Language Technologies at Google


Abstract
 
In this talk I will discuss language technologies at Google: what we are doing, for which applications, how we decide what to work on and how we proceed during development. I will go deeper into a couple of applications in which my team was involved, event understanding and sentence compression, and discuss some of the challenges ahead.

Bio: Enrique Alfonseca is a research scientist at Google Research Zurich, where he currently manages a language understanding team. During the past four years he has worked in the areas of ads quality, search quality and natural language understanding, contributing to topics such as query expansion and relevance estimation for sponsored search, ranking for web search, information extraction, unsupervised semantic parsing, lexical semantics and automatic text summarization. He received a Ph.D. in Computer Science from Universidad Autonoma de Madrid in 2003, and he has over 80 research publications, mainly in the fields of computational linguistics and information retrieval.