{"id":2035,"date":"2025-09-08T09:50:25","date_gmt":"2025-09-08T07:50:25","guid":{"rendered":"https:\/\/centerforthehumanpast.se\/?p=2035"},"modified":"2025-10-01T09:53:44","modified_gmt":"2025-10-01T07:53:44","slug":"the-indo-european-cognate-relationships-dataset-is-now-published-in-nature","status":"publish","type":"post","link":"https:\/\/centerforthehumanpast.se\/index.php\/2025\/09\/08\/the-indo-european-cognate-relationships-dataset-is-now-published-in-nature\/","title":{"rendered":"The Indo-European Cognate Relationships dataset is now published"},"content":{"rendered":"\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-0cd477d5 wp-block-columns-is-layout-flex\" style=\"padding-top:var(--wp--preset--spacing--small);padding-bottom:var(--wp--preset--spacing--small)\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:25%\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"480\" height=\"600\" src=\"https:\/\/centerforthehumanpast.se\/wp-content\/uploads\/2024\/04\/Harald-3.jpg\" alt=\"Harald Hammarstr\u00f6m\" class=\"wp-image-332\" srcset=\"https:\/\/centerforthehumanpast.se\/wp-content\/uploads\/2024\/04\/Harald-3.jpg 480w, https:\/\/centerforthehumanpast.se\/wp-content\/uploads\/2024\/04\/Harald-3-240x300.jpg 240w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/><figcaption class=\"wp-element-caption\"><sup>Harald Hammarstr\u00f6m is one of the co-authors.<\/sup><\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:75%\">\n<h4 class=\"wp-block-heading\">Abstract<\/h4>\n\n\n\n<p>The Indo-European Cognate Relationships (IE-CoR) dataset is an open-access relational dataset showing how related, inherited words (\u2018cognates\u2019) pattern across 160 languages of the Indo-European family. 
IE-CoR is intended as a benchmark dataset for computational research into the evolution of the Indo-European languages. It is structured around 170 reference meanings in core lexicon, and contains 25731 lexeme entries, analysed into 4981 cognate sets. Novel, dedicated structures are used to code all known cases of horizontal transfer. All 13 main documented clades of Indo-European, and their main subclades, are well represented. Time calibration data for each language are also included, as are relevant geographical and social metadata. Data collection was performed by an expert consortium of 89 linguists drawing on 355 cited sources. The dataset is extendable to further languages and meanings and follows the Cross-Linguistic Data Format (CLDF) protocols for linguistic data. It is designed to be interoperable with other cross-linguistic datasets and catalogues, and provides a reference framework for similar initiatives for other language families.<\/p>\n<\/div>\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/centerforthehumanpast.se\/wp-content\/uploads\/2025\/09\/image.png\" alt=\"\" class=\"wp-image-2036\"\/><figcaption class=\"wp-element-caption\"><sup>Fig. 1 Language sample in IE-CoR 1.2. Colours represent main clades.<\/sup><\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"Sec2\">Background: the Indo-European languages and phylogenetic research<\/h4>\n\n\n\n<p>Almost half of the world\u2019s population speaks a language of the Indo-European lineage. This huge family of over 400 languages has a long research tradition stretching back well over two hundred years, but much remains to be understood about its origins, dispersal, and internal structure. In particular, major phylogenetic analyses in recent years have supported conflicting hypotheses for the time depth and geographical origin of Indo-European. 
Recent analyses have mostly used state-of-the-art Bayesian phylogenetic analysis tools, applied to datasets of cognates (related words) across the Indo-European languages, i.e. forerunners of the new IE-CoR dataset presented here. Those past datasets have been criticised, however, for their limited and uneven coverage of the Indo-European family through time and space, and across its internal diversity, as well as for poor data coding \u2014 data problems directly implicated in the inconsistent phylogenetic results obtained.<\/p>\n\n\n\n<p>The new&nbsp;Indo-European&nbsp;Cognate&nbsp;Relationships (IE-CoR) dataset is designed to overcome the limitations of past datasets. It encodes cognate relationships in 170 meanings of core vocabulary (i.e. basic terms like&nbsp;hand,&nbsp;drink,&nbsp;black,&nbsp;three) across 160 Indo-European languages. (For explanations of linguistic terminology used in this text, such as \u2018cognate\u2019, see the Definitions box.) IE-CoR aims to provide a benchmark dataset for quantitative and phylogenetic research on the Indo-European (IE) language family.<\/p>\n\n\n\n<p>Anderson, C., Scarborough, M., Jocz, L.\u00a0<em>et al.<\/em>\u00a0The Indo-European Cognate Relationships dataset.\u00a0<em>Sci Data<\/em>\u00a0<strong>12<\/strong>, 1541 (2025). <a href=\"https:\/\/doi.org\/10.1038\/s41597-025-05445-3\">https:\/\/doi.org\/10.1038\/s41597-025-05445-3<\/a><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Abstract The Indo-European Cognate Relationships (IE-CoR) dataset is an open-access relational dataset showing how related, inherited words (\u2018cognates\u2019) pattern across 160 languages of the Indo-European family. 
IE-CoR is intended as [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2036,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_EventAllDay":false,"_EventTimezone":"","_EventStartDate":"","_EventEndDate":"","_EventStartDateUTC":"","_EventEndDateUTC":"","_EventShowMap":false,"_EventShowMapLink":false,"_EventURL":"","_EventCost":"","_EventCostDescription":"","_EventCurrencySymbol":"","_EventCurrencyCode":"","_EventCurrencyPosition":"","_EventDateTimeSeparator":"","_EventTimeRangeSeparator":"","_EventOrganizerID":[],"_EventVenueID":[],"_OrganizerEmail":"","_OrganizerPhone":"","_OrganizerWebsite":"","_VenueAddress":"","_VenueCity":"","_VenueCountry":"","_VenueProvince":"","_VenueState":"","_VenueZip":"","_VenuePhone":"","_VenueURL":"","_VenueStateProvince":"","_VenueLat":"","_VenueLng":"","_VenueShowMap":false,"_VenueShowMapLink":false,"footnotes":""},"categories":[139,60,61],"tags":[140,142,67,121,66,17,143,141],"class_list":["post-2035","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data","category-publications","category-research","tag-cognate","tag-dataset","tag-human-past","tag-indo-european-languages","tag-interdisciplinary-research","tag-linguistics","tag-nature","tag-phylogenetic"],"_links":{"self":[{"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/posts\/2035","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/comments?post=2035"}],"version-history":[{"count":2,"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/posts\/
2035\/revisions"}],"predecessor-version":[{"id":2099,"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/posts\/2035\/revisions\/2099"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/media\/2036"}],"wp:attachment":[{"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/media?parent=2035"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/categories?post=2035"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/centerforthehumanpast.se\/index.php\/wp-json\/wp\/v2\/tags?post=2035"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}