saturn

/home/coolhand/html/datavis/data_trove/data/quirky/fossils.json | 22,043 rows | sample n=22,043 | seed 42 | 2026-06-22T00:37:57+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/fossils.json
Total rows: 22,043
Profiled sample: 22,043
Columns: 21
Generated: 2026-06-22T00:37:57+00:00
Per-column null rate across the corpus. Empty strings are counted as values, not nulls.
column | kind | null %
name | text | 0.0%
rank | categorical | 0.0%
lat | numeric | 0.0%
lon | numeric | 0.0%
early_age_mya | numeric | 0.0%
late_age_mya | numeric | 0.0%
period | categorical | 0.0%
late_interval | categorical | 0.0%
phylum | categorical | 0.0%
class | categorical | 0.0%
order | categorical | 0.0%
family | categorical | 0.0%
genus | text | 0.0%
country | categorical | 0.0%
state | categorical | 0.0%
formation | categorical | 0.0%
collection | categorical | 0.0%
paleolat | numeric | 2.2%
paleolng | numeric | 2.2%
reference_no | text | 0.0%
occurrence_no | text | 0.0%

Insights (opt-in)

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This is a fossil occurrence dataset of 22,043 records combining taxonomic classifications, present-day and reconstructed (paleo) coordinates, and geological age ranges for paleontological finds. Chordata dominates the phyla (81.6%), with Mammalia, Saurischia, and Ornithischia as the leading classes, and over half of all occurrences (50.9%) come from the United States, which is worth examining for geographic bias. The geological age columns (early_age_mya and late_age_mya) span from near-present to 538.8 million years ago, with high spread and many flagged outliers, so the dataset mixes very different eras of life. Identification precision varies: 41.2% of records are ranked at species and 33.3% at genus, with most of the rest at unranked-clade (12.8%) or family (7.8%) level, which may affect comparative analyses.

reference_no high anthropic:default

This column is a reference number field: short, digit-only codes (mean length 4.17 characters, max 5). In a fossil occurrence dataset it most likely identifies the source publication (bibliographic reference) behind each record. The duplicate rate is high at 83.1%, with only 3,725 unique values across 22,043 rows, so the same reference numbers recur many times (e.g., '4245' appears 794 times, '6294' 743 times). The all-caps alert (89.8%) is likely a false positive triggered by digit-only strings. Treat this column as a grouping or foreign-key identifier rather than a row-level unique key.
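To check the grouping interpretation, the minimal Python sketch below counts occurrences per reference and recomputes the duplicate rate the way the profile's numbers imply, (rows - unique) / rows: (22,043 - 3,725) / 22,043 ≈ 0.831. It assumes rows are plain dicts (for example, parsed with `json.load` if fossils.json is a JSON array of objects, which the profile does not state); the function names are illustrative.

```python
from collections import Counter


def duplicate_rate(values):
    """Share of rows repeating a value seen earlier: (rows - unique) / rows."""
    if not values:
        return 0.0
    return (len(values) - len(set(values))) / len(values)


def rows_per_reference(rows, key="reference_no"):
    """Count how many occurrence rows cite each reference number."""
    return Counter(row[key] for row in rows)


# Toy rows standing in for records from fossils.json (hypothetical values).
rows = [
    {"occurrence_no": "1", "reference_no": "4245"},
    {"occurrence_no": "2", "reference_no": "4245"},
    {"occurrence_no": "3", "reference_no": "6294"},
    {"occurrence_no": "4", "reference_no": "4245"},
]
print(rows_per_reference(rows).most_common(1))  # [('4245', 3)]
print(duplicate_rate([r["reference_no"] for r in rows]))  # 0.5
```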

genus high anthropic:default

This column contains biological genus names from a paleontological dataset, as evidenced by taxa such as Palmatolepis (conodonts), Grallator/Eubrontes (dinosaur tracks), Baculites (ammonites), and Equus (horses). The duplicate rate is high at 88.2% (19,435 rows repeat a value already seen), which is expected for a taxonomic label with only 2,608 unique values. Notably, 5,545 rows (25.2% of the dataset) hold an empty string rather than a null; these should be treated as missing. Values are overwhelmingly single words (one_word_rate 0.989), consistent with Latin genus names, but the remaining 1.1% are not, and lengths reach 33 characters, so a small share of values deserves inspection.
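A minimal Python sketch of that cleanup, assuming values arrive as Python strings or None: it maps empty and whitespace-only strings to None before computing a missing rate. Applied to genus, it would report 5,545 / 22,043 ≈ 25.2% missing instead of the 0.0% null rate in the overview. The helpers are hypothetical, not part of saturn.

```python
def normalize_missing(value):
    """Return None for nulls and for empty or whitespace-only strings."""
    if value is None:
        return None
    if isinstance(value, str) and not value.strip():
        return None
    return value


def effective_missing_rate(values):
    """Share of values that are null or blank after normalization."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if normalize_missing(v) is None)
    return missing / len(values)


genus = ["Equus", "", "Baculites", None]  # toy sample, not real rows
print(effective_missing_rate(genus))  # 0.5
```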

occurrence_no high anthropic:default

This column is a unique occurrence identifier — likely a numeric reference code stored as text, given the all-caps alert (which reflects purely digit characters) and mean length of ~5.76 characters. All 22,043 rows are non-null, non-duplicate, and every value is a single token, confirming it as a primary key-style field. The values range from short (min length 1) to 7 characters and appear to be plain integers (e.g., '164260', '1439335'), with no patterns suggesting a structured prefix scheme.
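Before relying on occurrence_no as a join key, a quick check can confirm that it is complete, unique, and digit-only, and flag leading zeros that `int()` conversion would silently drop. A minimal sketch with a hypothetical helper name, assuming the values are Python strings:

```python
def check_key_column(values):
    """Summarize whether a column of strings can serve as a primary key."""
    present = [v for v in values if v not in (None, "")]
    # isascii() keeps non-ASCII digit characters from passing isdigit().
    digits = [v for v in present if v.isascii() and v.isdigit()]
    return {
        "complete": len(present) == len(values),
        "unique": len(set(present)) == len(present),
        "all_digits": len(digits) == len(present),
        # "0123" would become 123 after int() and could collide with "123".
        "leading_zeros": any(len(v) > 1 and v.startswith("0") for v in digits),
    }


print(check_key_column(["164260", "1439335", "27440"]))
# {'complete': True, 'unique': True, 'all_digits': True, 'leading_zeros': False}
```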

name high anthropic:default

This column contains taxonomic names of fossil organisms — dinosaurs (Theropoda, Sauropoda, Hadrosauridae), conodonts (Palmatolepis, Polygnathus, Icriodus), and other paleontological taxa — making it a biological classification label rather than a unique specimen identifier. The duplicate rate is extremely high at 78.9%, with only 4,660 unique values across 22,043 rows, reflecting that many specimens share the same taxon name. Over half of values (58.5%) are single words (genus or clade names), with a mean word count of 1.42, consistent with Linnaean binomial or higher-rank nomenclature. The top value 'Theropoda' alone appears 768 times, confirming this is a categorical grouping field, not a unique label.

lat high anthropic:default

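This column contains present-day latitude values ranging from -84.33° to 79.75°, with a median of 41.70°, consistent with a concentration of mid-northern-hemisphere localities (e.g., Europe or the northern US). The distribution is strongly left-skewed (skew = -2.44) with high kurtosis (7.05), reflecting a long tail of tropical and southern-hemisphere records. 9.2% of rows (2,019 records) are flagged as outliers by the 1.5 × IQR rule: with q1 = 35.00° and q3 = 46.61°, the fences sit at about 17.59° and 64.02°, so under the standard rule every record below about 17.6° (including the whole southern hemisphere) and every record north of about 64.0° counts as an outlier. The -84.33° minimum lies deep in Antarctica, far south of the Antarctic Circle (about 66.5°S); Antarctic fossil localities exist, so these values are plausible rather than obvious encoding errors, though they are worth spot-checking.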
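The 9.2% outlier share comes from a 1.5 × IQR rule; under the usual Tukey definition, values below q1 - 1.5 × IQR or above q3 + 1.5 × IQR are flagged. A minimal Python sketch, assuming saturn applies this standard rule with strict inequalities (the profile does not say), reproduces the fences implied by the reported lat quartiles and splits flagged values by side:

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: q1 - k * IQR and q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, q1, q3, k=1.5):
    """Count values strictly below the lower fence and strictly above the upper."""
    lower, upper = tukey_fences(q1, q3, k)
    below = sum(1 for v in values if v < lower)
    above = sum(1 for v in values if v > upper)
    return below, above


# Quartiles reported for lat in this profile.
lower, upper = tukey_fences(35.000, 46.609)
print(round(lower, 2), round(upper, 2))  # 17.59 64.02
```

The same function gives the early_age_mya upper fence, 201.4 + 1.5 × 138.0 = 408.4 Mya, from that column's quartiles.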

early_age_mya high anthropic:default

This column records the early (older) bound, in millions of years ago (Mya), of the geological age range assigned to each fossil occurrence. With only 164 unique values across 22,043 rows, most records share standardized stage or interval boundaries rather than continuous age estimates. The distribution is right-skewed (skew 1.13), with a mean of about 154.7 Mya but a median of only 110.1 Mya. The 11.6% of values (2,549 rows) flagged as outliers all lie above the upper fence of q3 + 1.5 × IQR = 201.4 + 207.0 = 408.4 Mya; they form a long tail of Paleozoic records reaching 538.8 Mya (the base of the Cambrian), set against a bulk of Mesozoic and Cenozoic records. The lower fence (63.4 - 207.0 = -143.6) is negative, so no low outliers are possible.

late_age_mya high anthropic:default

This column records the late (younger) bound, in Mya, of the geological age range assigned to each occurrence, a standard field in paleontological occurrence databases. With only 156 unique values across 22,043 rows, ages are drawn from a discrete set of stage or interval boundaries rather than continuous measurements. The distribution is right-skewed (skew 1.17; mean 147.5 vs. median 93.9). All 2,535 rows flagged as outliers (11.5%) lie above the upper fence of 192.9 + 1.5 × 132.0 = 390.9 Mya, a long Paleozoic tail reaching 521 Mya (Cambrian); the lower fence is negative. At the other end, a late bound of 0.0 Mya (about 0.1% of rows) marks occurrences whose age range extends to the present (Holocene/Recent).
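Because each row carries both bounds, one common way to place an occurrence on a single time axis is the midpoint of its range, with the span as a rough measure of dating precision. A minimal Python sketch (an illustration; saturn does not compute these values) that also rejects inverted bounds:

```python
def age_summary(early_mya, late_mya):
    """Return (midpoint, span) in Mya for one occurrence's age range.

    early_mya is the older bound and late_mya the younger one, so a valid
    range has early_mya >= late_mya; inverted bounds raise ValueError.
    """
    if late_mya > early_mya:
        raise ValueError(f"late bound {late_mya} is older than early bound {early_mya}")
    return (early_mya + late_mya) / 2, early_mya - late_mya


print(age_summary(100.0, 94.0))  # (97.0, 6.0), illustrative numbers
```

The +1.00 correlation between early_age_mya and late_age_mya in the correlation table suggests spans are short relative to the 0 to 539 Mya spread, so midpoints should track either bound closely.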

paleolat high anthropic:default

This column holds paleolatitude: the reconstructed latitude of each fossil locality at the time of deposition, ranging from -86.16° (near the South Pole) to 89.2° (near the North Pole). The distribution is moderately left-skewed (skew = -1.08), with a mean of 26.46° below the median of 34.89°, reflecting a tail of southern-hemisphere paleolatitudes. The 6.97% of records (1,503) flagged as outliers are all on the southern side: with q1 = 16.34° and q3 = 46.98°, the lower fence is about -29.62°, while the upper fence (about 92.94°) lies beyond any possible latitude. These are southern mid- to high-paleolatitude occurrences, less common in this sample and likely genuine rather than data errors. The 3,214 unique values across 21,552 non-null records probably reflect many occurrences sharing the same collection locality, and therefore the same reconstructed coordinates, rather than a discrete sampling grid.
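paleolat and paleolng each report 491 nulls (2.2%), which suggests the same rows lack a paleogeographic reconstruction, but the profile does not confirm the overlap. A minimal Python sketch to check it, assuming JSON nulls are parsed to None (a missing key is also counted as null):

```python
def null_overlap(rows, a="paleolat", b="paleolng"):
    """Count rows where only a, only b, or both columns are null (None)."""
    counts = {"both": 0, "only_" + a: 0, "only_" + b: 0}
    for row in rows:
        # row.get() returns None for an absent key, so that counts as null too.
        missing_a, missing_b = row.get(a) is None, row.get(b) is None
        if missing_a and missing_b:
            counts["both"] += 1
        elif missing_a:
            counts["only_" + a] += 1
        elif missing_b:
            counts["only_" + b] += 1
    return counts


rows = [  # toy rows; the json module loads JSON null as None
    {"paleolat": 34.9, "paleolng": -62.2},
    {"paleolat": None, "paleolng": None},
    {"paleolat": 12.0, "paleolng": None},
]
print(null_overlap(rows))  # {'both': 1, 'only_paleolat': 0, 'only_paleolng': 1}
```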

collection high anthropic:default

This column is a categorical field named 'collection' that is entirely empty strings across all 22,043 rows — it has cardinality of 1 and a top_value of '' with top_rate of 1.0. There are no nulls, meaning the field was populated with blank strings rather than left absent. The column carries zero information (entropy = 0.0) and is completely useless for analysis in its current state.

formation high anthropic:default

This column, 'formation', is a categorical field that is entirely empty: all 22,043 rows contain a blank string, giving it a cardinality of 1 and an entropy of 0. The null rate is 0%, meaning the field was populated with empty strings rather than true nulls. It carries zero information and should be dropped.
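Both formation and collection are constant blank columns. A minimal Python sketch that finds and drops such columns, assuming rows are dicts with hashable (JSON scalar) values; the helper names are hypothetical:

```python
def constant_columns(rows):
    """Return names of columns holding one identical value in every row."""
    if not rows:
        return []
    return [c for c in rows[0] if len({row.get(c) for row in rows}) == 1]


def drop_columns(rows, names):
    """Return new row dicts without the named columns."""
    drop = set(names)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


rows = [  # toy rows mirroring the all-blank formation and collection fields
    {"name": "Equus", "formation": "", "collection": ""},
    {"name": "Theropoda", "formation": "", "collection": ""},
]
blank = constant_columns(rows)
print(blank)  # ['formation', 'collection']
print(drop_columns(rows, blank))  # [{'name': 'Equus'}, {'name': 'Theropoda'}]
```

This flags any constant column, not only blank ones, so review the list before dropping; on a one-row sample every column would qualify.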

lon high anthropic:default

This column contains present-day longitude values spanning -176.7° to 177.1°, consistent with near-global coverage. The median of -98.25° and the mean of -47.2° indicate a strong concentration of records in the Americas, particularly North America (the two most populated histogram bins cover roughly -115° to -97°). That western mass, with a thinner spread of records at European, African, and Asian longitudes to the east, produces the positive skew of 0.93. The IQR is wide (114°), yet only 3 rows are flagged as outliers. The 4,259 unique values across 22,043 rows likely reflect many occurrences sharing the same locality, possibly combined with coordinate rounding, rather than fully independent positions.
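One way to test whether repeated coordinates reflect shared localities is to count occurrences per exact (lat, lon) pair and compare with a rounded version. A minimal Python sketch, assuming float coordinates in dict rows (the helper name is illustrative):

```python
from collections import Counter


def locality_counts(rows, lat_key="lat", lon_key="lon", decimals=None):
    """Count occurrences per (lat, lon) pair, optionally rounding first."""
    def key(row):
        lat, lon = row[lat_key], row[lon_key]
        if decimals is not None:
            lat, lon = round(lat, decimals), round(lon, decimals)
        return lat, lon

    return Counter(key(row) for row in rows)


rows = [  # toy coordinates, not real localities
    {"lat": 43.1, "lon": -108.2},
    {"lat": 43.1, "lon": -108.2},
    {"lat": 43.14, "lon": -108.21},
]
print(len(locality_counts(rows)))              # 2 distinct exact pairs
print(len(locality_counts(rows, decimals=1)))  # 1 pair after rounding
```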

paleolng high anthropic:default

This column represents paleogeographic longitude — the reconstructed east-west position of a sample location at the time of fossil deposition, typically ranging from −180° to +180°. The values span −177.6 to 168.7, consistent with valid global longitude, and the 3,715 unique values across 22,043 records suggest discretized but reasonably fine-grained spatial resolution. The distribution is moderately right-skewed (skew 0.737) with a wide IQR of 97.35°, indicating samples are spread broadly but with more density in the western/negative hemisphere — the median of −62.15 is well below the mean of −28.58. Null rate of 2.23% is minor but worth flagging for paleogeographic reconstructions where completeness matters.
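Because longitude wraps at ±180°, an arithmetic mean like the -28.58° reported here can mislead when records straddle the antimeridian: -179° and 179° average to 0° although both lie next to 180°. A circular mean, the direction of the average unit vector, avoids this. A minimal Python sketch (the profile itself reports the arithmetic mean):

```python
import math


def circular_mean_deg(angles_deg):
    """Circular mean of angles in degrees, returned in [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    # If the unit vectors cancel out, no mean direction exists.
    if math.isclose(s, 0.0, abs_tol=1e-12) and math.isclose(c, 0.0, abs_tol=1e-12):
        raise ValueError("mean direction is undefined for these angles")
    return math.degrees(math.atan2(s, c))


print(circular_mean_deg([10.0, 20.0, 30.0]))  # about 20.0
```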

class high anthropic:default

This column contains the taxonomic class of each fossil occurrence, with 19 distinct values across 22,043 records and no nulls. 'Mammalia' is the top value at 31.8%, followed by the dinosaur groups 'Saurischia' and 'Ornithischia', which are traditionally ranked as orders rather than classes. Their presence alongside proper classes such as Mammalia and Reptilia points to rank inconsistency in the classification. Two placeholder values, 'NO_CLASS_SPECIFIED' (26 rows) and the empty string (60 rows), together cover about 0.4% of records and should be treated as missing. An entropy ratio of 0.61 indicates moderate concentration rather than a uniform spread.
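The class, order, and family columns share two placeholder forms: the empty string and a NO_<RANK>_SPECIFIED sentinel. A minimal Python sketch treats both as missing; the regular expression is inferred from the three sentinels visible in this profile and is an assumption for other columns. Applied to family, it would count 3,418 + 1,996 = 5,414 rows (about 24.6%) as uninformative.

```python
import re

# Pattern inferred from NO_CLASS_SPECIFIED, NO_ORDER_SPECIFIED and
# NO_FAMILY_SPECIFIED in this profile; other sentinels may differ.
SENTINEL = re.compile(r"^NO_[A-Z]+_SPECIFIED$")


def is_placeholder(value):
    """True for None, blank strings, or NO_*_SPECIFIED sentinels."""
    if value is None:
        return True
    text = str(value).strip()
    return text == "" or bool(SENTINEL.match(text))


def informative_share(values):
    """Share of values that carry an actual taxon name."""
    if not values:
        return 0.0
    return sum(1 for v in values if not is_placeholder(v)) / len(values)


print(informative_share(["Mammalia", "", "NO_ORDER_SPECIFIED", "Aves"]))  # 0.5
```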

country high anthropic:default

This column contains two-letter country codes for 93 distinct countries, with no nulls across 22,043 rows. The codes mostly follow ISO 3166-1 alpha-2, with at least one exception: the United Kingdom appears as 'UK', whereas ISO uses 'GB'. The distribution is heavily US-dominated: 'US' alone accounts for 50.9% of records (11,218 rows), while the next largest, 'CA', has only 1,830, roughly a 6× drop. The entropy ratio of 0.50 confirms the imbalance; the 73 countries outside the top 20 each contribute at most 152 rows, so a model that treats this column as an ordinary categorical feature will see very few examples for most levels.
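The entropy figures in this profile are consistent with base-2 Shannon entropy, and entropy_ratio with entropy divided by log2 of the cardinality: for country, 3.293 / log2(93) ≈ 0.504. A minimal Python sketch under that inferred definition, which reproduces the phylum values (0.887 and 0.444) from its four counts:

```python
import math


def entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical distribution given counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum possible value, log2(categories)."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


phylum = [17993, 2000, 2000, 50]  # Chordata, Mollusca, Arthropoda, empty
print(round(entropy_bits(phylum), 3), round(entropy_ratio(phylum), 3))
```

With a single category the sum is -0.0 in floating point, which is probably why formation and collection print an entropy of -0.000.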

family high anthropic:default

This column contains biological family-level taxonomic classifications for fossil or specimen records, with 528 distinct family names across 22,043 rows. The most surprising signal is that the top value is an empty string (3,418 occurrences, 15.5% of rows), and the second most frequent value is the sentinel 'NO_FAMILY_SPECIFIED' (1,996 occurrences) — together these two non-informative values account for roughly 24.6% of the dataset, indicating pervasive missing family assignments. Substantive values include well-known paleontological families such as Hadrosauridae, Grallatoridae, and Palmatolepidae, confirming a paleobiology or fossil-occurrence context.

late_interval high anthropic:default

This column holds the geological interval for the late (younger) end of each occurrence's age range, drawn from chronostratigraphic stage names (e.g., 'Tithonian', 'Sinemurian', 'Albian'). The empty string is the top value, covering 83.1% of the 22,043 rows despite a zero null rate. In Paleobiology Database exports, whose field names and NO_*_SPECIFIED placeholders this dataset appears to follow, late_interval is typically left blank when an occurrence is dated to a single interval, so these blanks probably mean "same as the early interval" rather than missing data; verify against the source before imputing. The 137 non-empty values are mostly Mesozoic stages but also include Cenozoic (e.g., 'Tiffanian') and Paleozoic (e.g., 'Harnagian', an Ordovician British stage) names; Tithonian and Sinemurian together account for about 4.4% of all rows.
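In Paleobiology Database-style exports, a blank late_interval usually means the occurrence is dated to a single interval. If this dataset follows that convention (an assumption), blanks can be filled from the early interval so every row has both ends. A minimal Python sketch, assuming rows are dicts and that `period` holds each occurrence's early interval; that mapping is unverified, since the profile only shows that period contains interval names such as 'Aptian':

```python
def fill_late_interval(row, early_key="period", late_key="late_interval"):
    """Return a copy of row with a blank late interval set to the early one.

    Assumes a blank late interval means "dated to a single interval" and
    that early_key names the early-interval column (both assumptions).
    """
    out = dict(row)  # copy so the caller's row is not modified
    if not (out.get(late_key) or "").strip():
        out[late_key] = out.get(early_key)
    return out


print(fill_late_interval({"period": "Aptian", "late_interval": ""}))
# {'period': 'Aptian', 'late_interval': 'Aptian'}
```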

order high anthropic:default

This column contains biological taxonomic order classifications (e.g., Rodentia, Carnivora, Artiodactyla), likely from a paleontological or natural history specimen dataset. Two sentinel/missing-value patterns dominate: 'NO_ORDER_SPECIFIED' accounts for 32.3% of rows (7,117) and an empty string accounts for a further 3,019 rows (~13.7%), meaning roughly 46% of records lack a meaningful order assignment. Despite 99 unique values and moderate entropy (3.89), the effective signal is skewed toward these two non-informative categories.

period high anthropic:default

This column mixes geological time units with North American Land Mammal Ages (NALMAs) such as 'Irvingtonian', 'Torrejonian', 'Tiffanian', and 'Puercan', terminology specific to paleontology and fossil occurrence datasets. Despite the column name, its most frequent values are stages, substages, and land-mammal ages rather than geological periods. It has 298 unique values across 22,043 rows and zero nulls; the top value 'Irvingtonian' accounts for only about 7.8% of rows, and an entropy ratio of 0.78 indicates meaningful but not extreme concentration. Formal stages (e.g., 'Kimmeridgian', 'Aptian', 'Hettangian') and substages (e.g., 'Late Campanian') coexist with NALMA names, so grouping or ordering this column requires a lookup table that maps each name to its scheme and age.
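A lookup table is the practical fix for the mixed schemes. The sketch below is deliberately partial and hypothetical: it classifies a handful of names from this profile as NALMAs or international stages after stripping Early/Middle/Late substage prefixes, and returns 'unknown' for anything else. A real analysis would need a complete table with ages.

```python
# Hypothetical, partial lookup built from a few names in this profile.
NALMA = {"Puercan", "Torrejonian", "Tiffanian", "Lancian",
         "Hemingfordian", "Irvingtonian"}
STAGE = {"Hettangian", "Norian", "Kimmeridgian", "Tithonian",
         "Aptian", "Maastrichtian", "Lochkovian"}
SUBSTAGE_PREFIXES = ("Early ", "Middle ", "Late ")


def interval_scheme(name):
    """Classify an interval name as 'NALMA', 'stage', or 'unknown'."""
    base = name
    for prefix in SUBSTAGE_PREFIXES:
        if name.startswith(prefix):
            base = name[len(prefix):]  # "Late Maastrichtian" -> "Maastrichtian"
            break
    if base in NALMA:
        return "NALMA"
    if base in STAGE:
        return "stage"
    return "unknown"


print(interval_scheme("Late Maastrichtian"))  # stage
```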

phylum high anthropic:default

This column contains biological phylum classifications drawn from exactly 4 distinct values across 22,043 records, with no nulls. It is heavily dominated by 'Chordata' at 81.6% (17,993 rows), while 'Mollusca' and 'Arthropoda' each account for exactly 2,000 records — a suspiciously round number suggesting deliberate sampling or stratification. Notably, 50 records carry an empty string value, which acts as a de-facto null and should be treated as missing rather than a valid category.

rank high anthropic:default

This column encodes taxonomic rank, with 18 distinct levels spanning the hierarchy from subspecies up to kingdom. 'species' dominates at 41.2% (9,082 rows) and 'genus' follows with 7,342 rows (33.3%), so most occurrences are identified to genus or species level. 'unranked clade' appears 2,828 times, making it the third-largest category, which indicates that a substantial share of identifications use phylogenetic clade names that do not fit traditional Linnaean ranks. An entropy ratio of 0.50 signals moderate concentration, not a uniform distribution.
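For comparative work that needs genus-level or finer identifications, ranks can be mapped to an ordinal scale. A minimal Python sketch using the 17 named ranks in this profile, ordered from finest to coarsest; 'unranked clade' has no fixed position, so the sketch excludes it by design:

```python
# Named ranks from this profile, finest first ('unranked clade' omitted).
RANK_ORDER = ["subspecies", "species", "subgenus", "genus", "tribe",
              "subfamily", "family", "superfamily", "infraorder",
              "suborder", "order", "superorder", "subclass", "class",
              "superclass", "subphylum", "kingdom"]
RANK_INDEX = {rank: i for i, rank in enumerate(RANK_ORDER)}


def at_or_below(rank, threshold="genus"):
    """True if rank is at least as fine as threshold; unknown ranks are False."""
    if rank not in RANK_INDEX:
        return False
    return RANK_INDEX[rank] <= RANK_INDEX[threshold]


print([r for r in ["species", "family", "unranked clade"] if at_or_below(r)])
# ['species']
```

On this profile's counts, subspecies, species, subgenus, and genus together cover 23 + 9,082 + 51 + 7,342 = 16,498 rows, about 74.8%.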

state high anthropic:default

This column holds a state, province, or comparable first-level region, and its 519 unique values far exceed the 50 US states: the top values include US states, a Canadian province (Alberta), a constituent country of the UK (England), a Chinese autonomous region (Guangxi), and a Spanish region (Murcia), confirming international scope. The top value 'Wyoming' accounts for 8.6% of rows. Population is not a relevant baseline for fossil abundance; Wyoming's extensive exposures of fossil-bearing sedimentary rock and long collecting history plausibly explain much of this, though it still signals geographic concentration. Notably, 1,082 rows (about 4.9%) contain an empty string rather than a null, so the 0.0% null rate understates true missingness. The entropy ratio of 0.70 indicates a meaningful but uneven spread across categories.

Numeric correlation

Pearson correlation across 6 numeric columns (values shown to 2 decimals).
column | lat | lon | early_age_mya | late_age_mya | paleolat | paleolng
lat | +1.00 | -0.27 | +0.04 | +0.05 | +0.12 | -0.17
lon | -0.27 | +1.00 | +0.17 | +0.16 | -0.22 | +0.45
early_age_mya | +0.04 | +0.17 | +1.00 | +1.00 | -0.61 | +0.03
late_age_mya | +0.05 | +0.16 | +1.00 | +1.00 | -0.61 | +0.03
paleolat | +0.12 | -0.22 | -0.61 | -0.61 | +1.00 | -0.21
paleolng | -0.17 | +0.45 | +0.03 | +0.03 | -0.21 | +1.00
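For reference, a minimal Python sketch of Pearson's r. How saturn handles the 491 null paleolat and paleolng values is not stated; this version assumes pairwise deletion, skipping any pair with a missing value:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of paired values, skipping pairs containing None."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, None, 4], [1, 2, 5, 4]))  # 1.0 (the None pair is skipped)
```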

name text

58.5% of rows are a single word; 78.9% duplicate strings
rows: 22,043
null: 0 (0.0%)
unique: 4,660
len_min: 3
len_max: 47
len_mean: 15.095
len_median: 14.000
len_p95: 26.000
word_mean: 1.425
word_median: 1.000
n_empty: 0
n_duplicates: 17,383
duplicate_rate: 0.789
vocab_size: 5,140
readability_flesch_mean: -4.127
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.585
allcaps_rate: 0.000
boilerplate_rate: 0.000
Character-length distribution for name (mean: 15.09).
chars | count
3 – 4 | 72
4 – 5 | 190
5 – 6 | 110
6 – 7 | 466
7 – 8 | 942
8 – 10 | 2,265
10 – 11 | 2,027
11 – 12 | 1,560
12 – 13 | 1,821
13 – 14 | 1,566
14 – 15 | 2,019
15 – 16 | 626
16 – 17 | 923
17 – 18 | 808
18 – 20 | 1,219
20 – 21 | 1,117
21 – 22 | 747
22 – 23 | 889
23 – 24 | 549
24 – 25 | 531
25 – 26 | 719
26 – 27 | 307
27 – 28 | 119
28 – 29 | 178
29 – 31 | 63
31 – 32 | 52
32 – 33 | 22
33 – 34 | 8
34 – 35 | 2
35 – 36 | 17
36 – 37 | 19
37 – 38 | 11
38 – 39 | 50
39 – 40 | 13
40 – 42 | 9
42 – 43 | 2
43 – 44 | 1
44 – 45 | 3
45 – 46 | 0
46 – 47 | 1
Sample values (first 10)
  1. Animalia
  2. Microtus ochrogaster
  3. Acanthohoplites
  4. Ammonoidea
  5. Theropoda
  6. Synphoroides
  7. Hadrosauropodus leonardii
  8. Moutoniceras moutonianum
  9. Hoploscaphites
  10. Palmatolepis

rank categorical

rows: 22,043
null: 0 (0.0%)
unique: 18
top_value: species
top_rate: 0.412
cardinality: 18
entropy: 2.085
entropy_ratio: 0.500
Top values for rank (18 unique shown, of 18 total).
value | count | share
species | 9,082 | 41.2%
genus | 7,342 | 33.3%
unranked clade | 2,828 | 12.8%
family | 1,716 | 7.8%
subfamily | 272 | 1.2%
subclass | 205 | 0.9%
class | 134 | 0.6%
order | 115 | 0.5%
infraorder | 97 | 0.4%
superfamily | 75 | 0.3%
subgenus | 51 | 0.2%
kingdom | 50 | 0.2%
suborder | 29 | 0.1%
subspecies | 23 | 0.1%
tribe | 12 | 0.1%
subphylum | 9 | 0.0%
superorder | 2 | 0.0%
superclass | 1 | 0.0%
Top values (rank 1–20)
  1. species — 9,082
  2. genus — 7,342
  3. unranked clade — 2,828
  4. family — 1,716
  5. subfamily — 272
  6. subclass — 205
  7. class — 134
  8. order — 115
  9. infraorder — 97
  10. superfamily — 75
  11. subgenus — 51
  12. kingdom — 50
  13. suborder — 29
  14. subspecies — 23
  15. tribe — 12
  16. subphylum — 9
  17. superorder — 2
  18. superclass — 1

lat numeric

skew = -2.44; 9.2% of rows beyond 1.5 × IQR
rows: 22,043
null: 0 (0.0%)
unique: 4,095
min: -84.333
max: 79.750
mean: 37.120
median: 41.700
std: 19.372
q1: 35.000
q3: 46.609
iqr: 11.609
skew: -2.442
kurtosis: 7.054
n_outliers: 2,019
outlier_rate: 0.092
zero_rate: 0.000
Histogram bins for lat (median: 41.7).
bin | count
-84.33 – -80.23 | 4
-80.23 – -76.13 | 0
-76.13 – -72.03 | 0
-72.03 – -67.93 | 12
-67.93 – -63.82 | 7
-63.82 – -59.72 | 0
-59.72 – -55.62 | 0
-55.62 – -51.52 | 0
-51.52 – -47.41 | 16
-47.41 – -43.31 | 60
-43.31 – -39.21 | 111
-39.21 – -35.11 | 127
-35.11 – -31.01 | 166
-31.01 – -26.9 | 347
-26.9 – -22.8 | 40
-22.8 – -18.7 | 99
-18.7 – -14.6 | 138
-14.6 – -10.5 | 6
-10.5 – -6.394 | 255
-6.394 – -2.292 | 21
-2.292 – 1.81 | 0
1.81 – 5.912 | 32
5.912 – 10.01 | 22
10.01 – 14.12 | 12
14.12 – 18.22 | 168
18.22 – 22.32 | 137
22.32 – 26.42 | 1,224
26.42 – 30.52 | 669
30.52 – 34.63 | 1,615
34.63 – 38.73 | 3,128
38.73 – 42.83 | 4,830
42.83 – 46.93 | 3,520
46.93 – 51.04 | 2,987
51.04 – 55.14 | 1,322
55.14 – 59.24 | 253
59.24 – 63.34 | 303
63.34 – 67.44 | 99
67.44 – 71.55 | 118
71.55 – 75.65 | 180
75.65 – 79.75 | 15
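The bin edges in these histograms are consistent with 40 equal-width bins from the column minimum to its maximum: for lat, (79.75 - (-84.333)) / 40 ≈ 4.10° per bin. A minimal Python sketch of that binning, assuming the last bin is closed on the right so the maximum is counted (the convention NumPy's `histogram` uses); saturn's exact implementation is not shown:

```python
def equal_width_histogram(values, bins=40):
    """Return (edges, counts) for equal-width bins spanning min to max.

    Bins are half-open [a, b) except the last, which also includes the max.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    edges[-1] = hi  # avoid floating-point drift on the final edge
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # clamp the max into the last bin
        counts[i] += 1
    return edges, counts


print(equal_width_histogram([0, 1, 2, 3, 4], bins=4))
# ([0.0, 1.0, 2.0, 3.0, 4], [1, 1, 1, 2])
```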

lon numeric

rows: 22,043
null: 0 (0.0%)
unique: 4,259
min: -176.667
max: 177.071
mean: -47.212
median: -98.250
std: 79.135
q1: -108.167
q3: 5.873
iqr: 114.040
skew: 0.928
kurtosis: -0.493
n_outliers: 3
outlier_rate: 1.36e-04
zero_rate: 2.27e-04
Histogram bins for lon (median: -98.25).
bin | count
-176.7 – -167.8 | 4
-167.8 – -159 | 13
-159 – -150.1 | 18
-150.1 – -141.3 | 32
-141.3 – -132.4 | 91
-132.4 – -123.6 | 163
-123.6 – -114.8 | 1,447
-114.8 – -105.9 | 5,659
-105.9 – -97.08 | 3,904
-97.08 – -88.23 | 360
-88.23 – -79.39 | 559
-79.39 – -70.55 | 962
-70.55 – -61.7 | 409
-61.7 – -52.86 | 76
-52.86 – -44.02 | 52
-44.02 – -35.17 | 13
-35.17 – -26.33 | 0
-26.33 – -17.48 | 93
-17.48 – -8.642 | 160
-8.642 – 0.2019 | 1,918
0.2019 – 9.045 | 932
9.045 – 17.89 | 817
17.89 – 26.73 | 253
26.73 – 35.58 | 447
35.58 – 44.42 | 311
44.42 – 53.26 | 71
53.26 – 62.11 | 90
62.11 – 70.95 | 315
70.95 – 79.79 | 306
79.79 – 88.64 | 107
88.64 – 97.48 | 78
97.48 – 106.3 | 553
106.3 – 115.2 | 1,162
115.2 – 124 | 150
124 – 132.9 | 206
132.9 – 141.7 | 50
141.7 – 150.5 | 238
150.5 – 159.4 | 11
159.4 – 168.2 | 4
168.2 – 177.1 | 9

early_age_mya numeric

11.6% of rows beyond 1.5 × IQR
rows: 22,043
null: 0 (0.0%)
unique: 164
min: 0.012
max: 538.800
mean: 154.673
median: 110.100
std: 143.088
q1: 63.400
q3: 201.400
iqr: 138.000
skew: 1.131
kurtosis: 0.077
n_outliers: 2,549
outlier_rate: 0.116
zero_rate: 0.000
Histogram bins for early_age_mya (median: 110.1).
bin | count
0.0117 – 13.48 | 2,334
13.48 – 26.95 | 1,665
26.95 – 40.42 | 85
40.42 – 53.89 | 12
53.89 – 67.36 | 2,904
67.36 – 80.83 | 1,302
80.83 – 94.3 | 2,239
94.3 – 107.8 | 454
107.8 – 121.2 | 796
121.2 – 134.7 | 1,645
134.7 – 148.2 | 410
148.2 – 161.6 | 1,650
161.6 – 175.1 | 421
175.1 – 188.6 | 36
188.6 – 202.1 | 975
202.1 – 215.5 | 193
215.5 – 229 | 530
229 – 242.5 | 216
242.5 – 255.9 | 61
255.9 – 269.4 | 0
269.4 – 282.9 | 0
282.9 – 296.3 | 0
296.3 – 309.8 | 25
309.8 – 323.3 | 17
323.3 – 336.8 | 21
336.8 – 350.2 | 37
350.2 – 363.7 | 58
363.7 – 377.2 | 770
377.2 – 390.6 | 319
390.6 – 404.1 | 319
404.1 – 417.6 | 176
417.6 – 431 | 602
431 – 444.5 | 265
444.5 – 458 | 382
458 – 471.5 | 427
471.5 – 484.9 | 81
484.9 – 498.4 | 334
498.4 – 511.9 | 180
511.9 – 525.3 | 62
525.3 – 538.8 | 40

late_age_mya numeric

11.5% of rows beyond 1.5 × IQR
rows: 22,043
null: 0 (0.0%)
unique: 156
min: 0.000
max: 521.000
mean: 147.523
median: 93.900
std: 141.724
q1: 60.900
q3: 192.900
iqr: 132.000
skew: 1.169
kurtosis: 0.123
n_outliers: 2,535
outlier_rate: 0.115
zero_rate: 1.13e-03
Histogram bins for late_age_mya (median: 93.9).
bin | count
0 – 13.03 | 2,732
13.03 – 26.05 | 1,291
26.05 – 39.08 | 69
39.08 – 52.1 | 7
52.1 – 65.12 | 2,911
65.12 – 78.15 | 3,279
78.15 – 91.17 | 374
91.17 – 104.2 | 949
104.2 – 117.2 | 922
117.2 – 130.2 | 1,043
130.2 – 143.3 | 1,317
143.3 – 156.3 | 671
156.3 – 169.3 | 368
169.3 – 182.3 | 135
182.3 – 195.4 | 634
195.4 – 208.4 | 1,003
208.4 – 221.4 | 35
221.4 – 234.5 | 123
234.5 – 247.5 | 63
247.5 – 260.5 | 2
260.5 – 273.5 | 0
273.5 – 286.6 | 0
286.6 – 299.6 | 12
299.6 – 312.6 | 34
312.6 – 325.6 | 19
325.6 – 338.7 | 4
338.7 – 351.7 | 83
351.7 – 364.7 | 133
364.7 – 377.7 | 828
377.7 – 390.8 | 467
390.8 – 403.8 | 190
403.8 – 416.8 | 530
416.8 – 429.8 | 170
429.8 – 442.9 | 141
442.9 – 455.9 | 529
455.9 – 468.9 | 299
468.9 – 481.9 | 111
481.9 – 494.9 | 310
494.9 – 508 | 191
508 – 521 | 64

period categorical

rows: 22,043
null: 0 (0.0%)
unique: 298
top_value: Irvingtonian
top_rate: 0.078
cardinality: 298
entropy: 6.422
entropy_ratio: 0.781
Top values for period (20 unique shown, of 298 total).
value | count | share
Irvingtonian | 1,723 | 7.8%
Late Campanian | 1,088 | 4.9%
Torrejonian | 935 | 4.2%
Tiffanian | 923 | 4.2%
Puercan | 778 | 3.5%
Kimmeridgian | 636 | 2.9%
Hettangian | 607 | 2.8%
Aptian | 600 | 2.7%
Harrisonian | 592 | 2.7%
Late Maastrichtian | 544 | 2.5%
Norian | 516 | 2.3%
Lochkovian | 460 | 2.1%
Early Barremian | 449 | 2.0%
Hemingfordian | 441 | 2.0%
Tithonian | 408 | 1.9%
Middle Campanian | 359 | 1.6%
Early Famennian | 346 | 1.6%
Early Albian | 327 | 1.5%
Lancian | 320 | 1.5%
Maastrichtian | 314 | 1.4%
Top values (rank 1–20)
  1. Irvingtonian — 1,723
  2. Late Campanian — 1,088
  3. Torrejonian — 935
  4. Tiffanian — 923
  5. Puercan — 778
  6. Kimmeridgian — 636
  7. Hettangian — 607
  8. Aptian — 600
  9. Harrisonian — 592
  10. Late Maastrichtian — 544
  11. Norian — 516
  12. Lochkovian — 460
  13. Early Barremian — 449
  14. Hemingfordian — 441
  15. Tithonian — 408
  16. Middle Campanian — 359
  17. Early Famennian — 346
  18. Early Albian — 327
  19. Lancian — 320
  20. Maastrichtian — 314

late_interval categorical

rows: 22,043
null: 0 (0.0%)
unique: 138
top_value: (empty string)
top_rate: 0.831
cardinality: 138
entropy: 1.591
entropy_ratio: 0.224
Top values for late_interval (20 unique shown, of 138 total).
value | count | share
(empty string) | 18,319 | 83.1%
Tithonian | 548 | 2.5%
Sinemurian | 430 | 2.0%
Late Campanian | 183 | 0.8%
Early Cenomanian | 132 | 0.6%
Albian | 129 | 0.6%
Rhaetian | 119 | 0.5%
Early Maastrichtian | 111 | 0.5%
Early Tithonian | 102 | 0.5%
Late Turonian | 92 | 0.4%
Maastrichtian | 72 | 0.3%
Harnagian | 62 | 0.3%
Santonian | 61 | 0.3%
Early Aptian | 57 | 0.3%
Tiffanian | 57 | 0.3%
Early Albian | 56 | 0.3%
Barremian | 54 | 0.2%
Pliensbachian | 50 | 0.2%
Toarcian | 50 | 0.2%
Cenomanian | 45 | 0.2%
Top values (rank 1–20)
  1. (empty string) — 18,319
  2. Tithonian — 548
  3. Sinemurian — 430
  4. Late Campanian — 183
  5. Early Cenomanian — 132
  6. Albian — 129
  7. Rhaetian — 119
  8. Early Maastrichtian — 111
  9. Early Tithonian — 102
  10. Late Turonian — 92
  11. Maastrichtian — 72
  12. Harnagian — 62
  13. Santonian — 61
  14. Early Aptian — 57
  15. Tiffanian — 57
  16. Early Albian — 56
  17. Barremian — 54
  18. Pliensbachian — 50
  19. Toarcian — 50
  20. Cenomanian — 45

phylum categorical

rows: 22,043
null: 0 (0.0%)
unique: 4
top_value: Chordata
top_rate: 0.816
cardinality: 4
entropy: 0.887
entropy_ratio: 0.444
Top values for phylum (4 unique shown, of 4 total).
value | count | share
Chordata | 17,993 | 81.6%
Mollusca | 2,000 | 9.1%
Arthropoda | 2,000 | 9.1%
(empty string) | 50 | 0.2%
Top values (rank 1–20)
  1. Chordata — 17,993
  2. Mollusca — 2,000
  3. Arthropoda — 2,000
  4. (empty string) — 50

class categorical

rows: 22,043
null: 0 (0.0%)
unique: 19
top_value: Mammalia
top_rate: 0.318
cardinality: 19
entropy: 2.579
entropy_ratio: 0.607
Top values for class (19 unique shown, of 19 total).
value | count | share
Mammalia | 7,015 | 31.8%
Saurischia | 5,507 | 25.0%
Ornithischia | 2,811 | 12.8%
Cephalopoda | 2,000 | 9.1%
Trilobita | 2,000 | 9.1%
Conodonta | 1,883 | 8.5%
Reptilia | 568 | 2.6%
Aves | 92 | 0.4%
(empty string) | 60 | 0.3%
NO_CLASS_SPECIFIED | 26 | 0.1%
Pteraspidomorpha | 24 | 0.1%
Placodermi | 17 | 0.1%
Acanthodii | 15 | 0.1%
Osteichthyes | 11 | 0.0%
Thelodonti | 4 | 0.0%
Osteostraci | 4 | 0.0%
Chondrichthyes | 4 | 0.0%
Actinopterygii | 1 | 0.0%
Galeaspidomorphi | 1 | 0.0%
Top values (rank 1–20)
  1. Mammalia — 7,015
  2. Saurischia — 5,507
  3. Ornithischia — 2,811
  4. Cephalopoda — 2,000
  5. Trilobita — 2,000
  6. Conodonta — 1,883
  7. Reptilia — 568
  8. Aves — 92
  9. (empty string) — 60
  10. NO_CLASS_SPECIFIED — 26
  11. Pteraspidomorpha — 24
  12. Placodermi — 17
  13. Acanthodii — 15
  14. Osteichthyes — 11
  15. Thelodonti — 4
  16. Osteostraci — 4
  17. Chondrichthyes — 4
  18. Actinopterygii — 1
  19. Galeaspidomorphi — 1

order categorical

rows: 22,043
null: 0 (0.0%)
unique: 99
top_value: NO_ORDER_SPECIFIED
top_rate: 0.323
cardinality: 99
entropy: 3.887
entropy_ratio: 0.586
Top values for order (20 unique shown, of 99 total).
value | count | share
NO_ORDER_SPECIFIED | 7,117 | 32.3%
(empty string) | 3,019 | 13.7%
Ammonitida | 1,572 | 7.1%
Ozarkodinida | 1,341 | 6.1%
Rodentia | 1,109 | 5.0%
Artiodactyla | 951 | 4.3%
Carnivora | 744 | 3.4%
Multituberculata | 553 | 2.5%
Perissodactyla | 517 | 2.3%
Phacopida | 507 | 2.3%
Procreodi | 503 | 2.3%
Prioniodontida | 348 | 1.6%
Primates | 315 | 1.4%
Asaphida | 304 | 1.4%
Corynexochida | 252 | 1.1%
Ammonoidea | 246 | 1.1%
Ptychopariida | 238 | 1.1%
Proetida | 219 | 1.0%
Cimolesta | 218 | 1.0%
Lagomorpha | 187 | 0.8%
Top values (rank 1–20)
  1. NO_ORDER_SPECIFIED — 7,117
  2. (empty string) — 3,019
  3. Ammonitida — 1,572
  4. Ozarkodinida — 1,341
  5. Rodentia — 1,109
  6. Artiodactyla — 951
  7. Carnivora — 744
  8. Multituberculata — 553
  9. Perissodactyla — 517
  10. Phacopida — 507
  11. Procreodi — 503
  12. Prioniodontida — 348
  13. Primates — 315
  14. Asaphida — 304
  15. Corynexochida — 252
  16. Ammonoidea — 246
  17. Ptychopariida — 238
  18. Proetida — 219
  19. Cimolesta — 218
  20. Lagomorpha — 187

family categorical

rows: 22,043
null: 0 (0.0%)
unique: 528
top_value: (empty string)
top_rate: 0.155
cardinality: 528
entropy: 6.566
entropy_ratio: 0.726
Top values for family (20 unique shown, of 528 total).
value | count | share
(empty string) | 3,418 | 15.5%
NO_FAMILY_SPECIFIED | 1,996 | 9.1%
Hadrosauridae | 689 | 3.1%
Grallatoridae | 593 | 2.7%
Palmatolepidae | 586 | 2.7%
Arctocyonidae | 503 | 2.3%
Polygnathidae | 459 | 2.1%
Cricetidae | 407 | 1.8%
Equidae | 360 | 1.6%
Canidae | 358 | 1.6%
Ceratopsidae | 336 | 1.5%
Dromaeosauridae | 335 | 1.5%
Icriodontidae | 272 | 1.2%
Periptychidae | 249 | 1.1%
Neoplagiaulacidae | 234 | 1.1%
Merycoidodontidae | 231 | 1.0%
Camelidae | 216 | 1.0%
Tyrannosauridae | 184 | 0.8%
Diplodocidae | 181 | 0.8%
Asaphidae | 172 | 0.8%
Top values (rank 1–20)
  1. (empty string) — 3,418
  2. NO_FAMILY_SPECIFIED — 1,996
  3. Hadrosauridae — 689
  4. Grallatoridae — 593
  5. Palmatolepidae — 586
  6. Arctocyonidae — 503
  7. Polygnathidae — 459
  8. Cricetidae — 407
  9. Equidae — 360
  10. Canidae — 358
  11. Ceratopsidae — 336
  12. Dromaeosauridae — 335
  13. Icriodontidae — 272
  14. Periptychidae — 249
  15. Neoplagiaulacidae — 234
  16. Merycoidodontidae — 231
  17. Camelidae — 216
  18. Tyrannosauridae — 184
  19. Diplodocidae — 181
  20. Asaphidae — 172

genus text

98.9% of rows are a single word; 95th-percentile length under 20 chars; 88.2% duplicate strings
rows: 22,043
null: 0 (0.0%)
unique: 2,608
len_min: 0
len_max: 33
len_mean: 8.188
len_median: 10.000
len_p95: 15.000
word_mean: 1.011
word_median: 1.000
n_empty: 5,545
n_duplicates: 19,435
duplicate_rate: 0.882
vocab_size: 2,525
readability_flesch_mean: -4.827
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.989
allcaps_rate: 0.000
boilerplate_rate: 0.000
Character-length distribution for genus (mean: 8.19). Counts per exact string length; length 0 is the empty string.
length | count
0 | 5,545
1 | 0
2 | 0
3 | 13
4 | 31
5 | 372
6 | 188
7 | 704
8 | 1,676
9 | 2,307
10 | 2,405
11 | 2,504
12 | 2,230
13 | 1,696
14 | 1,017
15 | 522
16 | 324
17 | 191
18 | 50
19 | 51
20 | 21
21 | 19
22 | 36
23 | 20
24 | 5
25 | 7
26 | 3
27 | 4
28 | 40
29 | 57
30 | 4
31 | 0
32 | 0
33 | 1
Sample values (first 10)
  1. Microtus
  2. Acanthohoplites
  3. Synphoroides
  4. Hadrosauropodus
  5. Hemibaculites
  6. Hoploscaphites
  7. Palmatolepis

country categorical

rows: 22,043
null: 0 (0.0%)
unique: 93
top_value: US
top_rate: 0.509
cardinality: 93
entropy: 3.293
entropy_ratio: 0.504
Top values for country (20 unique shown, of 93 total).
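value | count | share
US | 11,218 | 50.9%
CA | 1,830 | 8.3%
CN | 1,661 | 7.5%
UK | 983 | 4.5%
ES | 841 | 3.8%
FR | 390 | 1.8%
MA | 303 | 1.4%
AR | 292 | 1.3%
CZ | 288 | 1.3%
AU | 247 | 1.1%
TZ | 218 | 1.0%
UZ | 184 | 0.8%
MX | 175 | 0.8%
KR | 175 | 0.8%
SE | 170 | 0.8%
CH | 166 | 0.8%
MN | 162 | 0.7%
ZA | 159 | 0.7%
RU | 156 | 0.7%
DE | 152 | 0.7%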
Top values (rank 1–20)
  1. US — 11,218
  2. CA — 1,830
  3. CN — 1,661
  4. UK — 983
  5. ES — 841
  6. FR — 390
  7. MA — 303
  8. AR — 292
  9. CZ — 288
  10. AU — 247
  11. TZ — 218
  12. UZ — 184
  13. MX — 175
  14. KR — 175
  15. SE — 170
  16. CH — 166
  17. MN — 162
  18. ZA — 159
  19. RU — 156
  20. DE — 152

state categorical

rows: 22,043
null: 0 (0.0%)
unique: 519
top_value: Wyoming
top_rate: 0.086
cardinality: 519
entropy: 6.288
entropy_ratio: 0.697
Top values for state (20 unique shown, of 519 total).
value | count | share
Wyoming | 1,903 | 8.6%
Montana | 1,394 | 6.3%
(empty string) | 1,082 | 4.9%
New Mexico | 1,048 | 4.8%
Alberta | 1,009 | 4.6%
Nebraska | 950 | 4.3%
England | 907 | 4.1%
Guangxi | 861 | 3.9%
California | 837 | 3.8%
Colorado | 540 | 2.4%
Texas | 530 | 2.4%
Utah | 489 | 2.2%
Nevada | 361 | 1.6%
Murcia | 333 | 1.5%
North Dakota | 325 | 1.5%
South Dakota | 316 | 1.4%
Massachusetts | 278 | 1.3%
Kansas | 273 | 1.2%
Northwest Territories | 246 | 1.1%
Arizona | 226 | 1.0%
Top values (rank 1–20)
  1. Wyoming — 1,903
  2. Montana — 1,394
  3. (empty string) — 1,082
  4. New Mexico — 1,048
  5. Alberta — 1,009
  6. Nebraska — 950
  7. England — 907
  8. Guangxi — 861
  9. California — 837
  10. Colorado — 540
  11. Texas — 530
  12. Utah — 489
  13. Nevada — 361
  14. Murcia — 333
  15. North Dakota — 325
  16. South Dakota — 316
  17. Massachusetts — 278
  18. Kansas — 273
  19. Northwest Territories — 246
  20. Arizona — 226

formation categorical

top value is 100.0% of rows
rows: 22,043
null: 0 (0.0%)
unique: 1
top_value: (empty string)
top_rate: 1.000
cardinality: 1
entropy: -0.000
entropy_ratio: 0.000
Top values for formation (1 unique shown, of 1 total).
value | count | share
(empty string) | 22,043 | 100.0%
Top values (rank 1–20)
  1. (empty string) — 22,043

collection categorical

top value is 100.0% of rows
rows: 22,043
null: 0 (0.0%)
unique: 1
top_value: (empty string)
top_rate: 1.000
cardinality: 1
entropy: -0.000
entropy_ratio: 0.000
Top values for collection (1 unique shown, of 1 total).
value | count | share
(empty string) | 22,043 | 100.0%
Top values (rank 1–20)
  1. (empty string) — 22,043

paleolat numeric

7.0% of rows beyond 1.5 × IQR
rows: 22,043
null: 491 (2.2%)
unique: 3,214
min: -86.160
max: 89.200
mean: 26.460
median: 34.890
std: 29.543
q1: 16.340
q3: 46.980
iqr: 30.640
skew: -1.080
kurtosis: 0.265
n_outliers: 1,503
outlier_rate: 0.070
zero_rate: 0.000
Histogram bins for paleolat (median: 34.89).
bin | count
-86.16 – -81.78 | 13
-81.78 – -77.39 | 17
-77.39 – -73.01 | 17
-73.01 – -68.62 | 11
-68.62 – -64.24 | 12
-64.24 – -59.86 | 19
-59.86 – -55.47 | 13
-55.47 – -51.09 | 87
-51.09 – -46.7 | 82
-46.7 – -42.32 | 185
-42.32 – -37.94 | 454
-37.94 – -33.55 | 285
-33.55 – -29.17 | 350
-29.17 – -24.78 | 508
-24.78 – -20.4 | 462
-20.4 – -16.02 | 669
-16.02 – -11.63 | 554
-11.63 – -7.248 | 173
-7.248 – -2.864 | 144
-2.864 – 1.52 | 184
1.52 – 5.904 | 345
5.904 – 10.29 | 212
10.29 – 14.67 | 504
14.67 – 19.06 | 275
19.06 – 23.44 | 888
23.44 – 27.82 | 1,581
27.82 – 32.21 | 1,491
32.21 – 36.59 | 2,167
36.59 – 40.98 | 1,711
40.98 – 45.36 | 787
45.36 – 49.74 | 2,851
49.74 – 54.13 | 1,762
54.13 – 58.51 | 1,421
58.51 – 62.9 | 1,132
62.9 – 67.28 | 152
67.28 – 71.66 | 8
71.66 – 76.05 | 4
76.05 – 80.43 | 6
80.43 – 84.82 | 3
84.82 – 89.2 | 13

paleolng numeric

rows: 22,043
null: 491 (2.2%)
unique: 3,715
min: -177.600
max: 168.700
mean: -28.577
median: -62.150
std: 68.911
q1: -77.160
q3: 20.190
iqr: 97.350
skew: 0.737
kurtosis: -0.481
n_outliers: 3
outlier_rate: 1.39e-04
zero_rate: 0.000
Histogram bins for paleolng (median: -62.15).
bin | count
-177.6 – -168.9 | 3
-168.9 – -160.3 | 12
-160.3 – -151.6 | 4
-151.6 – -143 | 31
-143 – -134.3 | 31
-134.3 – -125.7 | 121
-125.7 – -117 | 345
-117 – -108.3 | 968
-108.3 – -99.68 | 726
-99.68 – -91.03 | 1,791
-91.03 – -82.37 | 244
-82.37 – -73.71 | 2,539
-73.71 – -65.05 | 3,233
-65.05 – -56.4 | 949
-56.4 – -47.74 | 404
-47.74 – -39.08 | 1,084
-39.08 – -30.42 | 478
-30.42 – -21.77 | 154
-21.77 – -13.11 | 57
-13.11 – -4.45 | 626
-4.45 – 4.207 | 210
4.207 – 12.86 | 1,004
12.86 – 21.52 | 1,648
21.52 – 30.18 | 1,074
30.18 – 38.84 | 534
38.84 – 47.49 | 129
47.49 – 56.15 | 52
56.15 – 64.81 | 82
64.81 – 73.47 | 284
73.47 – 82.12 | 257
82.12 – 90.78 | 663
90.78 – 99.44 | 486
99.44 – 108.1 | 156
108.1 – 116.8 | 315
116.8 – 125.4 | 434
125.4 – 134.1 | 182
134.1 – 142.7 | 192
142.7 – 151.4 | 40
151.4 – 160 | 7
160 – 168.7 | 3

reference_no text

100.0% of rows are a single word; 89.8% of rows are all-caps; 95th-percentile length under 20 chars; 83.1% duplicate strings
rows: 22,043
null: 0 (0.0%)
unique: 3,725
len_min: 1
len_max: 5
len_mean: 4.172
len_median: 4.000
len_p95: 5.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 18,318
duplicate_rate: 0.831
vocab_size: 3,547
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.898
boilerplate_rate: 0.000
Character-length distribution for reference_no (mean: 4.17). Counts per exact string length.
length | count
1 | 23
2 | 2,233
3 | 1,274
4 | 8,908
5 | 9,605
Sample values (first 10)
  1. 8880
  2. 3211
  3. 57
  4. 41
  5. 13037
  6. 36816
  7. 14666
  8. 70
  9. 45
  10. 4233

occurrence_no text

100.0% of rows are unique strings; 100.0% of rows are a single word; 99.9% of rows are all-caps; 95th-percentile length under 20 chars
rows: 22,043
null: 0 (0.0%)
unique: 22,043
len_min: 1
len_max: 7
len_mean: 5.762
len_median: 6.000
len_p95: 6.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 20,000
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.999
boilerplate_rate: 0.000
Character-length distribution for occurrence_no (mean: 5.76). Counts per exact string length.
length | count
1 | 8
2 | 15
3 | 26
4 | 1,359
5 | 3,185
6 | 16,620
7 | 830
Sample values (first 10)
  1. 361526
  2. 196124
  3. 27440
  4. 23237
  5. 519811
  6. 10365
  7. 533398
  8. 31658
  9. 23849
  10. 142746