emoji unicode emoji list 20260119
Reading
This dataset is a catalog of 5,225 Unicode emoji, with each row carrying the emoji glyph, its codepoint sequence, an English-leaning name, and three classification fields (group, subgroup, status). The collection is heavily skewed toward people: the 'group' field shows 'People & Body' accounts for 3,468 of 5,225 rows (about 66%), so most subsequent breakdowns will be dominated by human figures. The 'status' field is similarly lopsided, with 'fully-qualified' covering 3,944 rows versus much smaller minimally-qualified, unqualified, and component buckets. The 'subgroup' column gives a finer 100-way split worth exploring, led by person-activity (697) and person-role (635). Name-level duplication (1,272 duplicate names, ~24%) is the other thing to keep in mind when counting: the figure equals the combined minimally-qualified (1,029) and unqualified (243) rows, which is consistent with those sequences repeating the name of their fully-qualified counterpart rather than with skin-tone or gender variants, whose names carry their own wording.
citing: row_count · column_count · columns.group.top_values · columns.group.stats · columns.status.top_values · columns.status.stats · columns.subgroup.top_values · columns.subgroup.stats · columns.name.stats · columns.name.top_words · columns.codepoints.stats
Charts the summary said to look at first
group: value counts
| value | count | share |
|---|---|---|
| People & Body | 3468 | 66.4% |
| Objects | 316 | 6.0% |
| Symbols | 305 | 5.8% |
| Flags | 276 | 5.3% |
| Travel & Places | 268 | 5.1% |
| Smileys & Emotion | 187 | 3.6% |
| Animals & Nature | 167 | 3.2% |
| Food & Drink | 133 | 2.5% |
| Activities | 96 | 1.8% |
| Component | 9 | 0.2% |
status: value counts
| value | count | share |
|---|---|---|
| fully-qualified | 3944 | 75.5% |
| minimally-qualified | 1029 | 19.7% |
| unqualified | 243 | 4.7% |
| component | 9 | 0.2% |
subgroup: top 20 values
| value | count | share |
|---|---|---|
| person-activity | 697 | 13.3% |
| person-role | 635 | 12.2% |
| family | 533 | 10.2% |
| person-sport | 480 | 9.2% |
| person-gesture | 300 | 5.7% |
| country-flag | 259 | 5.0% |
| person-fantasy | 246 | 4.7% |
| person | 192 | 3.7% |
| animal-mammal | 68 | 1.3% |
| hand-fingers-open | 67 | 1.3% |
| sky & weather | 65 | 1.2% |
| hands | 62 | 1.2% |
| hand-fingers-partial | 55 | 1.1% |
| transport-ground | 55 | 1.1% |
| clothing | 50 | 1.0% |
| body-parts | 49 | 0.9% |
| alphanum | 49 | 0.9% |
| hand-single-finger | 43 | 0.8% |
| person-resting | 42 | 0.8% |
| tool | 38 | 0.7% |
name: length distribution (characters)
| chars | count |
|---|---|
| 7 – 9 | 36 |
| 9 – 11 | 154 |
| 11 – 13 | 216 |
| 13 – 15 | 252 |
| 15 – 17 | 319 |
| 17 – 19 | 374 |
| 19 – 21 | 249 |
| 21 – 23 | 189 |
| 23 – 25 | 150 |
| 25 – 27 | 135 |
| 27 – 29 | 125 |
| 29 – 31 | 122 |
| 31 – 33 | 162 |
| 33 – 35 | 252 |
| 35 – 37 | 256 |
| 37 – 39 | 245 |
| 39 – 41 | 275 |
| 41 – 43 | 273 |
| 43 – 45 | 195 |
| 45 – 46 | 162 |
| 46 – 48 | 153 |
| 48 – 50 | 120 |
| 50 – 52 | 63 |
| 52 – 54 | 64 |
| 54 – 56 | 76 |
| 56 – 58 | 60 |
| 58 – 60 | 61 |
| 60 – 62 | 75 |
| 62 – 64 | 66 |
| 64 – 66 | 70 |
| 66 – 68 | 64 |
| 68 – 70 | 40 |
| 70 – 72 | 32 |
| 72 – 74 | 46 |
| 74 – 76 | 28 |
| 76 – 78 | 24 |
| 78 – 80 | 26 |
| 80 – 82 | 8 |
| 82 – 84 | 4 |
| 84 – 86 | 4 |
codepoints: length distribution (characters)
| chars | count |
|---|---|
| 4 – 5 | 1400 |
| 5 – 6 | 0 |
| 6 – 8 | 0 |
| 8 – 9 | 0 |
| 9 – 10 | 249 |
| 10 – 12 | 894 |
| 12 – 13 | 0 |
| 13 – 14 | 0 |
| 14 – 15 | 136 |
| 15 – 16 | 79 |
| 16 – 18 | 0 |
| 18 – 19 | 0 |
| 19 – 20 | 141 |
| 20 – 22 | 544 |
| 22 – 23 | 325 |
| 23 – 24 | 0 |
| 24 – 25 | 25 |
| 25 – 26 | 553 |
| 26 – 28 | 15 |
| 28 – 29 | 20 |
| 29 – 30 | 12 |
| 30 – 32 | 42 |
| 32 – 33 | 45 |
| 33 – 34 | 0 |
| 34 – 35 | 6 |
| 35 – 36 | 60 |
| 36 – 38 | 48 |
| 38 – 39 | 105 |
| 39 – 40 | 205 |
| 40 – 42 | 33 |
| 42 – 43 | 3 |
| 43 – 44 | 95 |
| 44 – 45 | 0 |
| 45 – 46 | 0 |
| 46 – 48 | 0 |
| 48 – 49 | 0 |
| 49 – 50 | 95 |
| 50 – 52 | 0 |
| 52 – 53 | 0 |
| 53 – 54 | 95 |
Schema
6 columns

| column | type | nulls | unique | alerts |
|---|---|---|---|---|
| emoji | text | 0.0% | 5,225 | near_unique, one_word, allcaps, short_text |
| codepoints | text | 0.0% | 5,225 | near_unique, one_word, allcaps |
| status | categorical | 0.0% | 4 | |
| name | text | 0.0% | 3,953 | multilingual, duplicates |
| group | categorical | 0.0% | 10 | |
| subgroup | categorical | 0.0% | 100 | |
emoji
text identifier near_unique one_word allcaps short_text
This column appears to be a unique emoji identifier or catalog entry, with all 5,225 values distinct (n_unique equals n) and a 97.7% emoji_rate. Each entry is a single token (one_word_rate 1.0) ranging from 1 to 10 characters with a median length of 3. The 51.3% allcaps_rate is unusual for an emoji column; it may reflect entries with letter-like components (e.g., regional indicators or keycap glyphs), or simply how the metric treats strings that contain no lowercase letters, so it should be read with caution. Treatment: Treat as a unique key; drop from modelling or use only as a join/lookup field.
- n: 5,225
- nulls: 0 (0.0%)
- unique: 5,225
- len_min: 1
- len_max: 10
- len_mean: 3.416
- len_median: 3
- len_p95: 8
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 5,225
- readability_flesch_mean: 1.818
- emoji_rate: 0.9774
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 0.5133
- boilerplate_rate: 0
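The emoji and codepoints columns look like two views of the same key: the mean emoji length (3.416 characters) matches the mean number of codepoint tokens (3.416). A minimal sketch of a consistency check follows, assuming the codepoints field is a whitespace-separated list of hexadecimal values; the helper name `decode_codepoints` is hypothetical and not part of the dataset or the report.

```python
def decode_codepoints(codepoints: str) -> str:
    """Build the glyph string from a whitespace-separated hex sequence.

    Assumption: each token is one Unicode code point written in hex,
    e.g. "1F468 200D 1F469". int(..., 16) accepts upper or lower case.
    """
    return "".join(chr(int(token, 16)) for token in codepoints.split())


# Illustrative row: a base emoji followed by a skin-tone modifier.
glyph = decode_codepoints("1F44B 1F3FD")

# Python's len() counts code points, so it should equal the token count.
assert len(glyph) == 2
```

Applied to every row, a comparison such as `decode_codepoints(row_codepoints) == row_emoji` would flag any row where the two columns disagree.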
codepoints
text identifier near_unique one_word allcaps
This column holds Unicode codepoint sequences (likely emoji definitions), with every one of the 5,225 rows unique and fully uppercase-hex. Tokens like '200d' (zero-width joiner, 3747 occurrences), 'fe0f' (variation selector, 1318), and skin-tone modifiers '1f3fb'-'1f3ff' (703 each) dominate, alongside the man/woman bases '1f468'/'1f469' (676 each). String length averages 18 characters (max 54) with a median of 3 tokens, consistent with multi-codepoint emoji ZWJ sequences. Treatment: Treat as a unique key per emoji; split on whitespace into codepoint tokens if structural features are needed.
- n: 5,225
- nulls: 0 (0.0%)
- unique: 5,225
- len_min: 4
- len_max: 54
- len_mean: 18.04
- len_median: 15
- len_p95: 43
- word_mean: 3.416
- word_median: 3
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 1,451
- readability_flesch_mean: 118.8
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.2679
- allcaps_rate: 1
- boilerplate_rate: 0
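The treatment suggested for codepoints (split on whitespace into tokens) can be sketched as a small feature extractor. This is only an illustration: the function name `codepoint_features` and the chosen feature set are assumptions, while the code point values for the zero-width joiner, the variation selector, and the skin-tone modifiers are the ones named in the column description.

```python
ZWJ = 0x200D                              # zero-width joiner
VS16 = 0xFE0F                             # variation selector-16
SKIN_TONES = range(0x1F3FB, 0x1F3FF + 1)  # five skin-tone modifiers


def codepoint_features(codepoints: str) -> dict:
    """Derive simple structural features from one codepoints string."""
    values = [int(token, 16) for token in codepoints.split()]
    return {
        "n_codepoints": len(values),
        "n_zwj": values.count(ZWJ),
        "has_vs16": VS16 in values,
        "n_skin_tones": sum(v in SKIN_TONES for v in values),
    }


# Illustrative ZWJ sequence: base person, skin tone, joiner, object.
features = codepoint_features("1F468 1F3FD 200D 1F4BB")
assert features["n_zwj"] == 1 and features["n_skin_tones"] == 1
```

Features like these separate single-codepoint emoji from joined sequences without relying on the name text.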
status
categorical label
Categorical qualification status with 4 levels and no nulls across 5,225 rows. Heavily dominated by 'fully-qualified' at 75.5%, with 'minimally-qualified' (1029) and 'unqualified' (243) trailing, and 'component' a rare tail at just 9 occurrences. Entropy ratio of 0.495 confirms the imbalance. Treatment: One-hot encode; consider collapsing the rare 'component' class or stratifying splits to preserve it.
- n: 5,225
- nulls: 0 (0.0%)
- unique: 4
- top_value: fully-qualified
- top_rate: 0.7548
- cardinality: 4
- entropy: 0.9896
- entropy_ratio: 0.4948
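The entropy figures for status can be reproduced from the value counts. The report does not state its formula; the sketch below assumes Shannon entropy in bits and a ratio defined as entropy divided by log2 of the number of levels, which is an inference from the reported numbers. The function name `entropy_bits` is hypothetical.

```python
import math


def entropy_bits(counts) -> float:
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Counts taken from the status value-count table.
status_counts = {
    "fully-qualified": 3944,
    "minimally-qualified": 1029,
    "unqualified": 243,
    "component": 9,
}

entropy = entropy_bits(status_counts.values())
# Assumed normalisation: divide by the maximum entropy for 4 levels (2 bits).
entropy_ratio = entropy / math.log2(len(status_counts))

print(round(entropy, 3), round(entropy_ratio, 3))
```

Under these assumptions the results agree with the reported 0.9896 and 0.4948 to about three decimals, and the same calculation lines up with the group and subgroup ratios (1.908 / log2(10) and 4.982 / log2(100)).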
name
text label multilingual duplicates
This column holds short descriptive labels for emoji (e.g. 'E4.0 man detective', 'E15.1 woman walking facing right: dark skin tone'), averaging 5.5 words and 33.8 characters with a versioned 'E#.#' prefix. Duplicates are heavy at 24.3% (1,272 rows). That count equals the 1,029 minimally-qualified plus 243 unqualified rows in 'status', which is consistent with non-fully-qualified sequences repeating the name of their fully-qualified form. Skin-tone variants, by contrast, carry the tone in the name itself, which is why 'skin' (3450) and 'tone' (2800) dominate the vocabulary of 1912 tokens. Although 4680 rows are tagged English, the language detector also flags 29 other languages including German (36), Persian (33), and Polish (27), likely false positives on very short labels rather than true multilingual content. Treatment: Treat as the canonical emoji label; strip the 'E#.#' version prefix and skin-tone suffix if you need a base-concept key.
- n: 5,225
- nulls: 0 (0.0%)
- unique: 3,953
- len_min: 7
- len_max: 86
- len_mean: 33.81
- len_median: 34
- len_p95: 67
- word_mean: 5.521
- word_median: 6
- n_empty: 0
- n_duplicates: 1,272
- duplicate_rate: 0.2434
- vocab_size: 1,912
- readability_flesch_mean: 78.3
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0.0001914
- boilerplate_rate: 0
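The treatment for name (strip the version prefix and the skin-tone suffix) could look roughly like the sketch below. It assumes the layout seen in the two sample labels: an 'E<major>.<minor>' prefix, then the label, then optional comma-separated qualifiers after a colon. The helper `base_name` is hypothetical, and real labels may contain patterns it does not handle.

```python
import re

# Assumed prefix shape, based on samples such as "E4.0" and "E15.1".
VERSION_PREFIX = re.compile(r"^E\d+\.\d+\s+")


def base_name(name: str) -> str:
    """Drop the version prefix and any '... skin tone' qualifiers."""
    name = VERSION_PREFIX.sub("", name)
    head, sep, qualifiers = name.partition(": ")
    if not sep:
        return name  # no qualifier part at all
    kept = [q for q in qualifiers.split(", ") if not q.endswith("skin tone")]
    return f"{head}: {', '.join(kept)}" if kept else head


# The two sample labels quoted in the column description.
assert base_name("E4.0 man detective") == "man detective"
assert (
    base_name("E15.1 woman walking facing right: dark skin tone")
    == "woman walking facing right"
)
```

Qualifiers that are not skin tones are kept, so a label that uses the colon for something else keeps its meaning.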
group
categorical feature
This is a categorical grouping column with 10 distinct values matching the standard Unicode emoji category taxonomy (e.g., "People & Body", "Smileys & Emotion", "Flags"). The distribution is heavily imbalanced: "People & Body" alone covers 66.4% of the 5,225 rows, while "Component" appears just 9 times. No nulls, and entropy ratio of 0.57 confirms the skew toward one dominant class. Treatment: One-hot or target-encode; consider grouping the rare "Component" class given its tiny support.
- n: 5,225
- nulls: 0 (0.0%)
- unique: 10
- top_value: People & Body
- top_rate: 0.6637
- cardinality: 10
- entropy: 1.908
- entropy_ratio: 0.5743
subgroup
categorical feature
Categorical taxonomy label with 100 distinct subgroups across 5,225 rows and no nulls, suggesting an emoji or icon classification scheme dominated by people-related categories. The top value 'person-activity' covers 13.3% of rows, and the top eight values are all person/family/flag related; together they account for 3,342 rows (about 64%), leaving the remaining 92 subgroups to share roughly 36% in a long tail (entropy ratio 0.75). No single category dominates overwhelmingly, but the person-centric concentration is notable. Treatment: Group-encode or target-encode given the 100 levels, or collapse rare subgroups into an 'other' bucket before modelling.
- n: 5,225
- nulls: 0 (0.0%)
- unique: 100
- top_value: person-activity
- top_rate: 0.1334
- cardinality: 100
- entropy: 4.982
- entropy_ratio: 0.7498
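Collapsing rare subgroups into an 'other' bucket, as the treatment suggests, can be sketched with the standard library alone. The function name `collapse_rare`, the `min_count` threshold, and the toy labels are all assumptions for illustration; the report does not recommend a specific cutoff.

```python
from collections import Counter


def collapse_rare(labels, min_count, other="other"):
    """Replace labels seen fewer than min_count times with a shared bucket."""
    labels = list(labels)
    counts = Counter(labels)
    return [label if counts[label] >= min_count else other for label in labels]


# Toy sample using subgroup names from the table; counts are made up.
sample = ["family", "family", "family", "tool", "clothing", "family"]
collapsed = collapse_rare(sample, min_count=2)
assert collapsed == ["family", "family", "family", "other", "other", "family"]
```

The threshold is best chosen on the training split only, so that the set of retained levels does not leak information from held-out rows.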