Memories for Life

In terms of my background, I got a PhD in Natural Language Generation in 1990 (Harvard), and have been working in this area ever since; I've been at the University of Aberdeen since 1995 (currently as a Senior Lecturer in Computing Science). I've written about 50 refereed papers, mostly in NLG, although I've also dabbled in other areas of AI and in medical informatics. No astounding professional activities, although I was head of ACL SIGGEN (the main NLG professional organisation) for a while. I spent a few years in industry before coming to Aberdeen, and most of my projects have been collaborations with companies or the medical community. I'm especially keen on applications which help disadvantaged people; I'd like CS research to help address social problems as well as create wealth.

I am a computer scientist who mostly works on Natural Language Generation, that is, software systems that automatically produce texts in English (or other human languages) from non-linguistic input data. For example, my research group has produced a system which generates smoking-cessation letters (output) based on a questionnaire about smoking habits and beliefs (input); a system which generates textual weather forecasts (output) from numerical weather prediction data (input); and a system which generates a textual feedback report (output) that describes how well someone did on a literacy assessment (input data). The quality of the texts produced by our systems can be reasonable; for example, weather-forecast readers in some ways prefer our computer-generated forecasts to manually written forecasts.

One finding that has emerged strongly from our research is that there are large differences in how different people read (interpret) and write (produce) language; in other words, in people's "idiolect". I believe that a good understanding of idiolect would enable us to generate much better texts than we currently do, and more generally to develop a better understanding of how communication between humans and computers can fail. This could have major benefits commercially, and also to society. For example, if we could do a better job of explaining basic health information to people with limited literacy, for instance by explaining medical concepts in the language of the reader, this could have a major impact on health.

From the perspective of this research agenda, I'm excited about Memories for Life because it's a way of building up a large amount of data about how individuals use language. Current linguistic corpora tend to be based on things like newspaper articles; in other words, texts written by professional writers (who in many cases are not known), without much (if any) non-linguistic context. What I need for my research is sizable collections of language produced (orally or written) by individuals, preferably individuals from a variety of backgrounds (eg, single mothers from council estates as well as professional journalists), ideally accompanied by information about the context the language is being used in (eg, said to a small child at 10 PM, in the child's bedroom), and also by information about the language that the subject hears or reads (because this also influences idiolect). Building such a resource on my own is a daunting task; but if other researchers would also value such a resource, then perhaps the Memories for Life community as a whole can build it. Furthermore, if my ideas work and we can generate better texts for individuals by using data about how individuals use language, then this would provide a concrete benefit to people ("put your memories into our system and you'll in return get easier-to-understand health information").

I'm also very interested in how language relates to the world, in particular in what words mean in terms of non-linguistic data (for example, what RGB colours does "pink" refer to? What clock time does "evening" refer to? What spatiotemporal trajectories can be described by "meandering"? Etc). I think a good Memories for Life corpus could again be very helpful in investigating this.
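
To make the "evening" question concrete, here is a minimal sketch of how a word's meaning might be represented as a range over non-linguistic data, with a separate range per person. Everything in it is an assumption for illustration: the `TimeRange` class, the speaker names and the hour boundaries are invented, not taken from any corpus or from our systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeRange:
    """Half-open range of clock hours [start, end) on a 24-hour clock."""
    start: int
    end: int

    def contains(self, hour: int) -> bool:
        return self.start <= hour < self.end


# Hypothetical per-speaker meanings of "evening".
# The boundaries are invented for illustration only.
EVENING_BY_SPEAKER = {
    "speaker_a": TimeRange(17, 21),
    "speaker_b": TimeRange(18, 24),
}


def is_evening(speaker: str, hour: int) -> bool:
    """Return True if `hour` counts as "evening" for this speaker."""
    return EVENING_BY_SPEAKER[speaker].contains(hour)


def disagreement_hours(speaker_1: str, speaker_2: str) -> list[int]:
    """Hours that one speaker would call "evening" and the other would not."""
    return [
        hour
        for hour in range(24)
        if is_evening(speaker_1, hour) != is_evening(speaker_2, hour)
    ]


if __name__ == "__main__":
    print(is_evening("speaker_a", 22))
    print(is_evening("speaker_b", 22))
    print(disagreement_hours("speaker_a", "speaker_b"))
```

In a sketch like this, the hours where two people disagree are exactly the cases where a generated text saying "evening" could be misread, which is why data about individual usage matters. Real word meanings are unlikely to have such crisp boundaries; a fixed range is only the simplest possible stand-in.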