Summary confidence: high
This dataset is a bibliographic reference list with 11,359 rows and 9 columns: key, author, citation, title, and year, plus four mostly empty fields (editor, publisher, journal, url). The most informative columns are author, citation, title, and year; the rest are either unique IDs or near-empty categoricals. The author column has a 66% duplicate rate and 1,277 empty values, while citation and title both show heavy duplication (54% and 50%), driven by a handful of large source collections such as Koelle's Polyglotta africana and the Africa Museum and Austronesian web archives. The year column has 271 distinct values with a reasonably even spread (entropy ratio 0.75), though about 11% of rows have no year and another 574 are marked 'n.d.'. Author and title are also multilingual: English dominates, with meaningful German, French, Spanish, and Chinese subsets.
citing: row_count · column_count · columns.author.stats.duplicate_rate · columns.author.stats.n_empty · columns.author.language_counts · columns.author.top_values · columns.citation.stats.duplicate_rate · columns.citation.top_values · columns.title.stats.duplicate_rate · columns.title.top_values · columns.title.language_counts · columns.year.n_unique · columns.year.stats.entropy_ratio · columns.year.top_values · columns.editor.stats.top_rate · columns.publisher.stats.top_rate · columns.url.stats.top_rate
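
For readers who want to sanity-check figures like the duplicate rate and entropy ratio quoted above, the following is a minimal sketch of how such statistics could be computed. The definitions are assumptions, since the summary does not state how the profiler defines them: duplicate rate is taken here as the share of non-empty values that repeat an earlier value, and entropy ratio as Shannon entropy divided by its maximum for the observed number of distinct values. The function names are hypothetical.

```python
import math
from collections import Counter


def _non_empty(values):
    """Drop None and empty-string entries (assumed meaning of 'empty')."""
    return [v for v in values if v not in (None, "")]


def duplicate_rate(values):
    """Assumed definition: 1 - (distinct values / non-empty values)."""
    kept = _non_empty(values)
    if not kept:
        return 0.0
    return 1.0 - len(set(kept)) / len(kept)


def entropy_ratio(values):
    """Assumed definition: Shannon entropy normalized by log(n_distinct).

    Returns a value in [0, 1]; 1 means all distinct values are equally
    frequent, and values near 0 mean one value dominates.
    """
    kept = _non_empty(values)
    counts = Counter(kept)
    if len(counts) < 2:
        return 0.0  # entropy is zero with fewer than two distinct values
    total = len(kept)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


# Toy column for illustration only; not taken from the dataset.
years = ["1854", "1854", "1990", "n.d.", "", None]
print(duplicate_rate(years))
print(entropy_ratio(years))
```

Under these assumed definitions, a column dominated by a few large source collections would show a high duplicate rate, and a year column with many moderately frequent values would show an entropy ratio well above zero. The profiler's actual formulas may differ.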