
What is Coreference Resolution?

Coreference vs. anaphora resolution

Coreference resolution in NLP is the task of finding all expressions in a text that refer to the same entity. It is an umbrella term that covers several kinds of reference, and one of them is anaphora resolution. Although we distinguish between the two, they often coincide: in many cases an anaphor and its antecedent also corefer. Coreference has the broader scope, though, and we can assume that the vast majority of anaphoric cases are also coreferential. In a text, an anaphor appears after the word it refers to.

An example of coreference resolution would be:

“John was born in June. He is the same sign as me: Gemini.”

To find all mentions of “John” and group them into a chain, the system has to go through every mention, decide which ones refer to John, and link them together. Replacing each linked mention with “John” makes the result visible:

“John [antecedent] was born in June. John [anaphor] is the same sign as me: Gemini.”

This should give you an idea of how coreference resolution can be used in a text analysis tool. A good approach is to first find all mentions of an entity in a document and then group them into chains. The next step is to see whether a similar approach can find mentions of the same entity across other documents (cross-document coreference).
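The mention-then-chain approach can be sketched in a few lines of Python. This is a toy illustration under strong assumptions: mentions are found with a hypothetical hard-coded lexicon (`NAMES`, `PRONOUNS`), and each pronoun is linked to the most recent preceding name of matching gender. Real resolvers rely on parsers and trained models, and none of the names below come from an existing library.

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicon: a real system would detect mentions and their
# gender/number features with a parser or an NER model.
NAMES = {"John": "male", "Mary": "female"}
PRONOUNS = {"he": "male", "him": "male", "she": "female"}


@dataclass
class Chain:
    entity: str                                   # canonical name of the entity
    mentions: list = field(default_factory=list)  # (token index, surface form)


def build_chains(tokens):
    """Group name and pronoun mentions into coreference chains.

    Assumption: a pronoun refers to the most recent preceding name with a
    matching gender. This is a toy heuristic, not a full resolver."""
    chains = {}
    last_by_gender = {}
    for i, tok in enumerate(tokens):
        if tok in NAMES:
            chains.setdefault(tok, Chain(tok)).mentions.append((i, tok))
            last_by_gender[NAMES[tok]] = tok
        elif tok.lower() in PRONOUNS:
            antecedent = last_by_gender.get(PRONOUNS[tok.lower()])
            if antecedent is not None:
                chains[antecedent].mentions.append((i, tok))
    return list(chains.values())


def substitute(tokens, chains):
    """Replace every mention in a chain with the chain's canonical name."""
    out = list(tokens)
    for chain in chains:
        for i, _ in chain.mentions:
            out[i] = chain.entity
    return out


tokens = "John was born in June . He is the same sign as me : Gemini .".split()
chains = build_chains(tokens)
print([(c.entity, c.mentions) for c in chains])
# [('John', [(0, 'John'), (6, 'He')])]
print(" ".join(substitute(tokens, chains)))
# John was born in June . John is the same sign as me : Gemini .
```

Running it on the John example yields a single chain and reproduces the substitution shown above; “me” is left alone because the toy lexicon does not cover first-person pronouns.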

Different types of references

There are several types of reference that a text analysis tool has to handle. These include:

Anaphora

An anaphor appears after the word it refers to. The word it refers to is called its antecedent: in “John was born in June. He is the same sign as me”, “John” is the antecedent of “He”.

Cataphora

A cataphor appears before the word it refers to. The word it refers to is called its postcedent: in “Before she left, Mary locked the door”, “Mary” is the postcedent of “she”.
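The difference between anaphora and cataphora comes down to word order, which a short sketch can make concrete. The function `reference_direction` is hypothetical and assumes the pronoun and its referent have already been identified; it only compares their token positions, using the first occurrence of each and treating punctuation as separate tokens.

```python
def reference_direction(tokens, pronoun, referent):
    """Classify a reference by the pronoun's position relative to its referent.

    Assumption: both words occur in `tokens`; the first occurrence is used."""
    p = tokens.index(pronoun)
    r = tokens.index(referent)
    if p > r:
        return "anaphora"   # referent (antecedent) comes first
    if p < r:
        return "cataphora"  # referent (postcedent) comes later
    raise ValueError("pronoun and referent cannot share a position")


anaphoric = "John was born in June . He is the same sign as me .".split()
cataphoric = "Before she left , Mary locked the door .".split()
print(reference_direction(anaphoric, "He", "John"))    # anaphora
print(reference_direction(cataphoric, "she", "Mary"))  # cataphora
```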

Coreferring noun phrases

Coreferring noun phrases occur when a second noun phrase refers back to the same entity as an earlier noun phrase, often by describing it in a different, more descriptive form.
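A common baseline for linking noun phrases, used here purely as an illustrative assumption rather than the method of any particular tool, is head-noun matching: a later definite noun phrase is linked to an earlier one if both share the same head word. The function names and the “last word is the head” rule are hypothetical simplifications that only hold for simple English phrases.

```python
# Determiners that mark a noun phrase as definite (simplified list).
DEFINITE = {"the", "this", "that", "these", "those"}


def head_noun(noun_phrase: str) -> str:
    """Approximate the head of an English noun phrase as its last word."""
    return noun_phrase.lower().split()[-1]


def may_corefer(earlier: str, later: str) -> bool:
    """Head-match baseline: a later definite noun phrase may corefer with an
    earlier noun phrase when both share the same head noun."""
    first_word = later.lower().split()[0]
    return first_word in DEFINITE and head_noun(earlier) == head_noun(later)


print(may_corefer("some of our colleagues", "the colleagues"))        # True
print(may_corefer("a new car", "the car"))                            # True
print(may_corefer("a new car", "a car"))                              # False: not definite
print(may_corefer("some of our colleagues", "these helpful people"))  # False: missed
```

The last example shows the limit of this baseline: a descriptive rewording with a different head noun is missed, so linking such phrases usually needs semantic knowledge beyond string matching.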

Presuppositions

Presuppositions are also classified as coreference. In sentences of this type, pronouns aren’t strictly referential: we can’t replace them with quantified expressions.

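There are also situations that can mislead a coreference system, such as cleft sentences (“It was John who called.”) or pleonastic “it”, which refers to nothing at all (“It is raining.”). Cases like these are part of why coreference resolution is hard, and why NLP is considered one of the most challenging branches of Artificial Intelligence.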
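A simple way to avoid some of these traps is to flag “it” in fixed patterns before trying to resolve it. The sketch below is a hypothetical, deliberately small regular-expression filter that assumes English input; the pattern list and the function name `likely_non_referential_it` are illustrative, and practical systems typically rely on parsers or trained classifiers instead.

```python
import re

# Hypothetical, deliberately small pattern set for English text.
WEATHER = r"(?:raining|snowing|sunny|windy|cold|hot)"
PATTERNS = [
    # Weather "it": "It is raining."
    re.compile(rf"\bit\s+(?:is|was)\s+{WEATHER}\b", re.IGNORECASE),
    # Extraposed "it": "It is likely that ...", "It seems that ..."
    re.compile(r"\bit\s+(?:is|was|seems|seemed)\s+(?:\w+\s+)?(?:that|to)\b", re.IGNORECASE),
    # Cleft "it": "It was John who called."
    re.compile(r"\bit\s+(?:is|was)\s+\w+\s+(?:who|that)\b", re.IGNORECASE),
]


def likely_non_referential_it(sentence: str) -> bool:
    """Return True if the sentence contains an "it" matching one of the
    patterns, i.e. one that probably should not be sent to the resolver."""
    return any(pattern.search(sentence) for pattern in PATTERNS)


for s in ["It is raining.", "It was John who called.",
          "It seems that he left.", "John bought a car. It is red."]:
    print(s, likely_non_referential_it(s))
```

Only the last sentence, where “it” refers back to “a car”, is left for the resolver. Fixed patterns like these will both miss cases and misfire on some referential uses, which is why they are usually only a pre-filter.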