dataset
CoNLL-2003
CoNLL-2003 is a widely used benchmark dataset for named entity recognition (NER) in natural language processing (NLP). It consists of English and German newswire articles annotated with four entity types: persons (PER), organizations (ORG), locations (LOC), and miscellaneous names (MISC). The dataset was created for the CoNLL-2003 shared task on language-independent NER and has become a standard benchmark for evaluating NER models.
Also known as: CoNLL 2003, CoNLL2003, CoNLL-03, CoNLL03, CoNLL 03
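The dataset is usually distributed as plain text with one token per line, whitespace-separated columns (token, part-of-speech tag, chunk tag, entity tag), blank lines between sentences, and `-DOCSTART-` lines between documents. The following is a minimal sketch of reading that layout. The `SAMPLE` text is hand-written for illustration and is not taken from the corpus, the function name `read_conll` is hypothetical, and the sample uses `B-`/`I-` (BIO) entity tags as an assumption, since tagging schemes differ between distributions.

```python
from typing import List, Tuple

# Hand-written illustrative sample in the four-column layout (NOT real corpus data).
# Columns: token, POS tag, chunk tag, entity tag.
SAMPLE = """\
-DOCSTART- -X- -X- O

Alice NNP B-NP B-PER
visited VBD B-VP O
Berlin NNP B-NP B-LOC
. . O O

Reuters NNP B-NP B-ORG
reported VBD B-VP O
it PRP B-NP O
. . O O
"""


def read_conll(text: str) -> List[List[Tuple[str, str]]]:
    """Return one list of (token, entity_tag) pairs per sentence."""
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines end a sentence; -DOCSTART- lines mark document boundaries.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        # First column is the token, last column is the entity tag.
        current.append((fields[0], fields[-1]))
    if current:  # flush the final sentence if the text lacks a trailing blank line
        sentences.append(current)
    return sentences


sentences = read_conll(SAMPLE)
for sentence in sentences:
    print(sentence)
```

Keeping only the first and last columns is a simplification for NER; a reader that also needs POS or chunk tags would keep the middle columns as well.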
Why learn CoNLL-2003?
Developers should use CoNLL-2003 when training or benchmarking NER models, because its fixed annotations and standard training, development, and test splits make results comparable across algorithms. It is a common reference point for research in information extraction and text mining, and a practical starting point for applications such as chatbots or search engines that need to identify entities in text.
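Benchmark results on this dataset are conventionally reported as entity-level precision, recall, and F1, where a predicted entity counts as correct only if both its span and its type match the gold annotation exactly. The sketch below illustrates that idea for BIO-tagged sequences. It is not the shared task's `conlleval` script, the helper names `bio_spans` and `entity_f1` are hypothetical, and the tag sequences are made up for the example.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        ends_span = tag == "O" or tag.startswith("B-") or tag[2:] != etype
        if start is not None and ends_span:
            spans.add((start, i, etype))
            start, etype = None, None
        # Open a new span on B-, or leniently on an I- with no open span.
        if tag != "O" and start is None:
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g_spans, p_spans = bio_spans(g_tags), bio_spans(p_tags)
        tp += len(g_spans & p_spans)  # same boundaries and same type
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up example: the person span matches, the last entity has the wrong type.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Scoring whole entities rather than individual tokens means a model gets no partial credit for finding only part of a multi-token name, which is why token-level accuracy can look much higher than entity-level F1 on the same predictions.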