Natural Language Engineering

Question	Answer
Tokenisation	converting strings to tokens
Segmentation	splitting into sentences
Stemming	[image]
Lemmatisation	[image]
Part-of-speech tagging	label tokens as nouns, verbs, adjectives, etc.
Phrasal chunking	form phrasal units from tokens, typically referring to entities and actions
Syntactic analysis	hierarchically analyse sentences into components (subject/object/main verb)
Semantic analysis	express literal meaning
Named entity recognition	identify type of entity (person/institution/place/time)
Reference resolution	link different references to the same entity
Word sense disambiguation	use context to determine which sense of a word is intended
Relation detection	identify and classify relationships between entities
Event detection	identify, classify and temporally order events
Topic identification	identify words/phrases that relate to a topic
Text similarity	measure relevance of a document to a query, or of one document to another
NLP pipeline	[image]
Why is NLP hard?	Many different ways of saying things; hundreds of different dialects; language is used in highly creative ways
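
Worked example (illustrative only): the cards for segmentation, tokenisation and text similarity can be tried out with a few lines of standard-library Python. This is a minimal sketch, not a reference implementation. The function names `segment`, `tokenise` and `cosine_similarity` are made up for this example, the splitting rules are deliberately naive regular expressions, and bag-of-words cosine similarity is assumed as the similarity measure, since the cards do not name one.

```python
import math
import re
from collections import Counter


def segment(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Naive tokenisation: lowercase, then keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors (0.0 if either is empty)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


text = "The cat sat on the mat. The dog sat on the log!"
sentences = segment(text)                  # two sentences
tokens = [tokenise(s) for s in sentences]  # one token list per sentence
score = cosine_similarity(tokens[0], tokens[1])
print(sentences)
print(tokens)
print(round(score, 2))
```

Rules this simple break quickly: an abbreviation such as "Dr." would wrongly end a sentence, and "sat" and "sitting" would count as unrelated words without stemming or lemmatisation. This is one practical face of the "Why is NLP hard?" card.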

Resource summary

	Created by Luke Vincent almost 11 years ago