Summary confidence: high
This dataset contains 4,981 cognate sets, all drawn from the 'iecor' source_dataset and each identified by a unique cognate_id. The two main numeric signals, language_count and word_count, are nearly identical in distribution: both have a median of 2 and a mean of about 5.17, yet both reach a maximum of 157, with skew above 6.8 and roughly 13% of rows flagged as outliers. That long tail is the most interesting feature: most cognate sets are small, but a minority span many languages and words and deserve a closer look. Note that concept is empty for every row, confidence is constant at 1.0, and source_dataset has only one value, so those three columns carry no analytic signal.
citing: row_count · column_count · language_count · word_count · concept · confidence · source_dataset · cognate_id
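
The checks behind this summary (median versus mean, skew, outlier share, and columns with no signal) can be reproduced along the following lines. This is a minimal sketch and not the profiler's actual method. The helper names (skewness, iqr_outlier_share, no_signal_columns) and the ten toy rows are hypothetical, the skew is assumed to be the population moment coefficient, and the outlier rule is assumed to be the common 1.5 x IQR fence, since the summary does not say which definitions produced its figures.

```python
from statistics import mean, median, quantiles


def skewness(values):
    """Population moment coefficient of skewness (assumed definition)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def iqr_outlier_share(values, k=1.5):
    """Fraction of values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high) / len(values)


def no_signal_columns(rows):
    """Columns that are empty everywhere or hold a single distinct value."""
    flagged = []
    for col in rows[0].keys():
        distinct = {r[col] for r in rows if r[col] not in (None, "")}
        if len(distinct) <= 1:
            flagged.append(col)
    return flagged


# Hypothetical toy rows that mimic the shape of the dataset, not real data.
toy_counts = [1, 1, 2, 2, 2, 2, 3, 4, 5, 157]
rows = [
    {
        "cognate_id": i,
        "source_dataset": "iecor",
        "concept": None,
        "confidence": 1.0,
        "language_count": c,
        "word_count": c,
    }
    for i, c in enumerate(toy_counts, start=1)
]

counts = [r["language_count"] for r in rows]
print("median:", median(counts))
print("mean:", mean(counts))
print("skew:", round(skewness(counts), 2))
print("outlier share:", iqr_outlier_share(counts))
print("no-signal columns:", no_signal_columns(rows))
```

On the toy rows a single very large set pulls the mean far above the median, which is the same pattern the summary reports for language_count and word_count. Run against the real table, the list returned by no_signal_columns would be expected to include concept, confidence, and source_dataset.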