dataset

CoNLL-2003

CoNLL-2003 is a widely used benchmark dataset for named entity recognition (NER) in natural language processing (NLP). It consists of English and German news articles annotated with four entity types: persons, organizations, locations, and miscellaneous names. The dataset was created for the CoNLL-2003 shared task and has since become a standard benchmark for evaluating NER models.
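To make the annotation concrete, here is a minimal sketch of reading the column-based text layout in which the English data is commonly distributed: one token per line with its part-of-speech, chunk, and NER tags, a blank line between sentences, and `-DOCSTART-` lines marking document boundaries. The sample text, the `read_conll` helper, and the `B-`/`I-` tag prefixes are illustrative assumptions; the tagging scheme can differ between distributions of the dataset.

```python
from typing import List, Tuple

# Hypothetical sample in the four-column layout: token, POS tag, chunk tag, NER tag.
SAMPLE = """\
-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""


def read_conll(text: str) -> List[List[Tuple[str, str]]]:
    """Return one list of (token, NER tag) pairs per sentence."""
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and document markers both end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # First column is the token, last column is the NER tag.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


sentences = read_conll(SAMPLE)
for sentence in sentences:
    print(sentence)
```

The tag suffixes `PER`, `ORG`, `LOC`, and `MISC` correspond to the four entity types, and `O` marks tokens outside any entity.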

Also known as: CoNLL 2003, CoNLL2003, CoNLL-03, CoNLL03, CoNLL 03

🧊 Why learn CoNLL-2003?

Developers should use CoNLL-2003 when training or benchmarking NER models, because its consistent annotations make results comparable across different algorithms. It is relevant to research in information extraction and text mining, and to applications such as chatbots or search engines that need to identify entities in text.
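Results on CoNLL-2003 are usually reported as entity-level F1, where a predicted entity counts as correct only if both its span and its type match the gold annotation exactly. The sketch below illustrates that idea on toy tag sequences. The helpers `extract_spans` and `entity_f1` and the example tags are hypothetical, and the code assumes `B-`/`I-` prefixed tags. It is not the official scoring script, so published comparisons should use an established evaluator.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of B-/I-/O tags."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the sentence end.
    for i, tag in enumerate(tags + ["O"]):
        ends_span = tag == "O" or tag.startswith("B-") or tag[2:] != label
        if label is not None and ends_span:
            spans.add((label, start, i))
            start, label = None, None
        if tag != "O" and label is None:
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching (type, start, end) spans."""
    correct = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_spans(gold_tags), extract_spans(pred_tags)
        correct += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if correct == 0:
        return 0.0
    precision, recall = correct / n_pred, correct / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction misses the MISC entity and truncates the PER span.
gold = [["B-ORG", "O", "B-MISC", "O", "O"], ["B-PER", "I-PER"]]
pred = [["B-ORG", "O", "O", "O", "O"], ["B-PER", "O"]]
score = entity_f1(gold, pred)
print(f"entity-level F1: {score:.2f}")
```

In this toy case the truncated person name earns no credit even though one of its two tokens is tagged correctly, which is why entity-level scores are typically lower than token-level accuracy.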