saturn

/home/coolhand/html/datavis/data_trove/data/quirky/bioluminescence.json | 43,060 rows | sample n=43,060 | seed 42 | 2026-05-01T23:15:46+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/bioluminescence.json
Total rows: 43,060
Profiled sample: 43,060
Columns: 14
Generated: 2026-05-01T23:15:46+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This dataset catalogues 43,060 records of bioluminescent marine organisms, with taxonomic fields (phylum, class, order, family, genus, scientificName), a bioluminescence_group label, geographic coordinates, depth, country, source dataset, and date/year. The taxonomy is dominated by Arthropoda (12,297) and Cnidaria (8,874) within 7 phyla, while bioluminescence_group is fairly evenly distributed across 26 categories led by Dinoflagellate (4,000). Two things deserve a closer look first: the depth column is highly skewed (skew 4.72, max 10,000m vs median 52.5m) with a 24.75% null rate and ~10.6% outliers, and the country field is 63.7% empty, limiting any geographic breakdown by nation. The year field is also 42% null, so temporal analysis will be partial.

scientificName high anthropic:claude-opus-4-7

Taxonomic species/genus identifiers (Latin binomials like 'Mnemiopsis leidyi' and genera like 'Lingulodinium', 'Vibrio'). With 245 unique values across 43,060 rows and entropy ratio 0.747, the distribution is moderately spread — the top species accounts for only 4.6% of rows. The mix of binomial species names and bare genus names suggests inconsistent taxonomic resolution across records.

genus high anthropic:claude-opus-4-7

Categorical genus label across 27 bioluminescent marine genera (Noctiluca, Pyrocystis, Alexandrium, Aequorea, etc.). Distribution is essentially uniform: all 20 genera in the top-values list show exactly 2,000 rows each (a top rate of 4.64%), giving an entropy ratio of 0.959, which signals a deliberately balanced sample rather than natural abundance. No nulls across 43,060 rows.

family high anthropic:claude-opus-4-7

Taxonomic family labels for what appears to be a catalogue of bioluminescent marine organisms, spanning 22 distinct families across 43,060 complete rows. The distribution is highly engineered rather than natural: four families (Pyrocystaceae, Euphausiidae, Cypridinidae, Vibrionaceae) each hit exactly 4,000 rows and several others land at exactly 2,000, suggesting deliberate per-class sampling or quota balancing. Entropy ratio of 0.93 confirms the near-uniform spread, and no nulls are present.

phylum high anthropic:claude-opus-4-7

Taxonomic phylum label across 43,060 records spanning just 7 distinct values with no nulls. Arthropoda leads at 28.6% (12,297 rows), followed by Cnidaria (8,874) and Myzozoa (8,000), with an entropy ratio of 0.92 indicating a fairly even spread across the seven categories. The column blends kingdoms: animal phyla sit alongside Myzozoa (the phylum that contains the dinoflagellates) and Proteobacteria (a bacterial phylum).
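
The report does not say how saturn derives entropy_ratio. The published figures are consistent with Shannon entropy in bits divided by log2 of the number of categories, so the sketch below assumes that definition and recomputes it from the seven phylum counts listed in this report. The function names are illustrative, not part of saturn.

```python
import math

# Phylum counts as listed in the phylum "Top values" table of this report.
phylum_counts = {
    "Arthropoda": 12297,
    "Cnidaria": 8874,
    "Myzozoa": 8000,
    "Ctenophora": 4168,
    "Proteobacteria": 4000,
    "Mollusca": 3721,
    "Annelida": 2000,
}


def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by log2(number of categories); 1.0 means uniform.

    Assumed definition: inferred from the report's numbers, not documented.
    """
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


counts = list(phylum_counts.values())
h = entropy_bits(counts)
ratio = entropy_ratio(counts)
print(f"entropy={h:.2f} bits, ratio={ratio:.2f}")
```

Under this assumption the result lands on the reported entropy (2.593) and entropy_ratio (0.923) to within rounding, which supports reading a ratio near 1.0 as "close to uniform".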

class high anthropic:claude-opus-4-7

Taxonomic class labels for marine organisms, spanning 13 distinct values across 43,060 rows with no nulls. Distribution is fairly balanced (entropy ratio 0.93) with Dinophyceae leading at only 18.6% — several classes show suspiciously round counts (8000, 6000, 4000, 2000) suggesting curated/sampled rather than naturally observed frequencies.

order high anthropic:claude-opus-4-7

Taxonomic order names for marine organisms (Gonyaulacales, Euphausiacea, Calanoida, etc.), with 17 distinct values across 43,060 rows and no nulls. One of those 17 values is blank and covers 2,000 rows (rank 14 in the top-values list), so the order is effectively missing for about 4.6% of records. Distribution is fairly even (entropy ratio 0.949, top order only 13.9%), though several categories sit at suspiciously round counts (4,000, 2,000), suggesting stratified sampling or quota construction rather than natural frequencies.

latitude high anthropic:claude-opus-4-7

Numeric column bounded between -76.619 and 88.29, consistent with WGS84 latitudes in degrees. The distribution is wide (std 40.27, IQR 69.61) and mildly left-skewed (-0.66) with a flat shape (kurtosis -0.94), indicating coverage across both hemispheres rather than a single region. No nulls and no outliers flagged across 43,060 rows with 14,146 distinct values.

longitude high anthropic:claude-opus-4-7

Geographic longitude in decimal degrees, spanning the full -179.9987 to 179.99 range with 14,637 distinct values across 43,060 rows and zero nulls. The distribution is nearly symmetric (skew 0.14) with light tails (kurtosis -0.65) and a wide IQR of 124.12, indicating truly global coverage rather than a regional sample. No outliers flagged and only a 0.11% zero rate, consistent with clean coordinate data.
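
A range check of the kind implied by "consistent with clean coordinate data" can be sketched as follows. This is a minimal illustration, not saturn's own validation; the helper names are hypothetical, and treating the point (0, 0) as a likely placeholder is a common convention rather than something this report states.

```python
def is_valid_wgs84(lat, lon):
    """Return True if lat/lon fall inside the WGS84 decimal-degree bounds."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def is_null_island(lat, lon):
    """Flag the (0, 0) point, often used as a stand-in for missing coordinates."""
    return lat == 0.0 and lon == 0.0


# Extremes reported for this dataset: (min latitude, min longitude) and
# (max latitude, max longitude).
extremes = [(-76.619, -179.999), (88.29, 179.99)]
print(all(is_valid_wgs84(lat, lon) for lat, lon in extremes))
```

Both reported extremes pass the bounds check, so no row can fall outside the valid range.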

depth high anthropic:claude-opus-4-7

This is a numeric depth measurement (likely meters), with 24.75% nulls across 43,060 rows and only 3,283 distinct values. The distribution is heavily right-skewed (skew 4.72, kurtosis 35.89): the median is 52.5 but the mean is 281.2 and the max reaches 10,000, while 10.63% of values are flagged as outliers. Notably, the minimum is -53.0 (negative depths are suspect) and 11.92% of values are exactly zero.
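
The depth alert is labelled "beyond 1.5 IQR", which suggests the usual Tukey fence rule. Assuming that rule, and assuming the outlier rate is taken over non-null rows, the reported quartiles and counts give the following cut-offs.

```python
# Summary statistics copied from the depth block of this report.
Q1, Q3 = 7.5, 321.0
ROWS, NULLS, N_OUTLIERS = 43_060, 10_658, 3_444


def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR outside the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(Q1, Q3)
non_null = ROWS - NULLS
outlier_rate = N_OUTLIERS / non_null  # assumed denominator: non-null rows

print(lower, upper, round(outlier_rate, 3))
# prints: -462.75 791.25 0.106
```

Because the minimum depth (-53.0) lies above the lower fence, every flagged outlier under this rule sits on the deep side, above 791.25. The rate of 0.106 only reproduces when nulls are excluded from the denominator; over all 43,060 rows it would be about 0.080.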

date high anthropic:claude-opus-4-7

This is a date column stored as free text rather than a parsed timestamp, with values mixing single dates (e.g. '2017-05-30'), single months ('2013-08'), month ranges ('2010-05/2010-06') and even multi-year spans ('1962/1964'). The format heterogeneity is the main surprise: 97% of entries are one 'word', but length varies from 4 to 51 characters, and 67% are duplicates of another row. Nulls are also non-trivial at 12%.
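
One way to size the format heterogeneity before parsing is to bucket each raw string by shape. The sketch below covers only the shapes named in this report (date, timestamp, month, year, and slash-separated ranges); the labels and function names are hypothetical, and anything else falls through to "other".

```python
import re

# Patterns for the shapes seen in this report, most specific first.
_PATTERNS = [
    ("datetime", re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?Z?")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("month", re.compile(r"\d{4}-\d{2}")),
    ("year", re.compile(r"\d{4}")),
]


def classify_part(part):
    """Label a single (non-range) date token by its shape."""
    for label, pattern in _PATTERNS:
        if pattern.fullmatch(part):
            return label
    return "other"


def classify_date(value):
    """Return a coarse format label for one raw date string."""
    if value is None or value.strip() == "":
        return "missing"
    parts = value.strip().split("/")
    if len(parts) == 1:
        return classify_part(parts[0])
    if len(parts) == 2:
        labels = [classify_part(p) for p in parts]
        return "range" if "other" not in labels else "other"
    return "other"


for raw in ["2017-05-30", "2013-08", "2010-05/2010-06", "1962/1964",
            "2016-09-20T10:18:00Z"]:
    print(raw, "->", classify_date(raw))
```

Counting the labels over the full column would show how many rows can be parsed to a single day and how many need an explicit policy for ranges.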

year high anthropic:claude-opus-4-7

This is a year column stored categorically across 137 distinct values, suggesting coverage spanning over a century. The most common year is '2000' at 5.17% of non-null rows, with a high entropy ratio of 0.865 indicating values are spread fairly evenly across years. Notably, 42.18% of rows are null, which triggered a null_rate alert and limits usefulness without imputation or filtering.
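
The two percentages in this paragraph use different denominators, which is easy to miss. The arithmetic below, using only counts from the year block of this report, shows that the 5.17% figure is taken over non-null rows.

```python
# Counts copied from the year block of this report.
ROWS = 43_060
NULL_YEARS = 18_164
TOP_YEAR_COUNT = 1_287  # rows where year is 2000

null_rate = NULL_YEARS / ROWS
top_rate_non_null = TOP_YEAR_COUNT / (ROWS - NULL_YEARS)
top_rate_all_rows = TOP_YEAR_COUNT / ROWS

print(f"null rate:            {null_rate:.2%}")
print(f"top rate (non-null):  {top_rate_non_null:.2%}")
print(f"top rate (all rows):  {top_rate_all_rows:.2%}")
```

Measured against all 43,060 rows, the year 2000 accounts for roughly 3% of records rather than 5%.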

country high anthropic:claude-opus-4-7

Country of origin as a categorical label across 130 distinct values, but 63.7% of the 43,060 rows are empty strings rather than nulls, making the modal 'value' effectively missing. The remaining entries show inconsistent encoding: full names ('Australia', 'United States'), abbreviations and ISO codes ('USA', 'GB', 'FR'), and uppercase forms ('PERU', 'SOVIET UNION', the latter a defunct state), suggesting data was merged from heterogeneous sources without normalisation. As a result, 'United States' (1,416) and 'USA' (323) are counted as separate values. Entropy ratio of 0.37 confirms the distribution is heavily concentrated in a few buckets.
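
A normalisation pass for this column might look like the sketch below. The alias table is hypothetical and covers only a few variants visible in this report; reading 'GB' and 'FR' as ISO 3166 alpha-2 codes is an assumption about the source data. A real pass would need a complete country list and a decision on historical names such as 'SOVIET UNION'.

```python
# Hypothetical alias table: a handful of variants seen in this report.
ALIASES = {
    "usa": "United States",
    "united states": "United States",
    "gb": "United Kingdom",  # assumed to be the ISO 3166 alpha-2 code
    "fr": "France",          # assumed to be the ISO 3166 alpha-2 code
    "peru": "Peru",
}


def normalise_country(raw):
    """Map a raw country string to a canonical name, or None if blank."""
    if raw is None:
        return None
    cleaned = raw.strip()
    if not cleaned:
        return None  # treat empty strings as missing, not as a category
    key = cleaned.casefold()
    if key in ALIASES:
        return ALIASES[key]
    # Title-case all-caps names; leave mixed-case names untouched.
    return cleaned.title() if cleaned.isupper() else cleaned


for raw in ["", "USA", "United States", "GB", "PERU", "SOVIET UNION",
            "Republic of Korea"]:
    print(repr(raw), "->", normalise_country(raw))
```

With 'USA' folded into 'United States', the two top-20 entries combine to 1,739 rows (1,416 + 323), and the 27,422 blank rows become explicit missing values instead of the modal category.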

dataset high anthropic:claude-opus-4-7

This column names the source dataset each record was drawn from, with 214 distinct provenance strings across 43,060 rows. The dominant value is an empty string covering 61.1% of rows (26,317), meaning provenance is missing for the majority; named sources like 'Environmental Monitoring database (MOD) DNV' (1,760) and 'Jellyfish sightings along the Italian coastline from 2009 to 2017' (1,024) trail far behind. Entropy ratio of 0.41 confirms the distribution is heavily concentrated on that blank.

bioluminescence_group high anthropic:claude-opus-4-7

Categorical taxonomy label grouping records by bioluminescent organism type, with 26 distinct groups across 43,060 rows and no nulls. Distribution is remarkably flat (entropy ratio 0.95): 'Dinoflagellate' leads at only 9.3%, and the next 18 values are tied at exactly 2,000 rows each, suggesting a synthetic or quota-balanced sample rather than naturally observed frequencies.

Numeric correlation

scientificName categorical

rows: 43,060
null: 0 (0.0%)
unique: 245
top_value: Mnemiopsis leidyi
top_rate: 0.046
cardinality: 245
entropy: 5.928
entropy_ratio: 0.747
Top values (rank 1–20)
  1. Mnemiopsis leidyi — 2,000
  2. Lingulodinium — 1,976
  3. Meganyctiphanes norvegica — 1,928
  4. Photobacterium — 1,842
  5. Periphylla periphylla — 1,802
  6. Pelagia noctiluca — 1,768
  7. Noctiluca scintillans — 1,728
  8. Vibrio — 1,584
  9. Vargula norvegica — 1,482
  10. Cypridina dentata — 1,320
  11. Euphausia superba — 1,298
  12. Chaetopterus variopedatus — 1,222
  13. Beroe — 1,202
  14. Oplophorus spinosus — 1,170
  15. Histioteuthis — 952
  16. Alexandrium — 944
  17. Metridia lucens — 872
  18. Aequorea — 798
  19. Atolla wyvillei — 756
  20. Pyrocystis pseudonoctiluca — 742

genus categorical

rows: 43,060
null: 0 (0.0%)
unique: 27
top_value: Noctiluca
top_rate: 0.046
cardinality: 27
entropy: 4.558
entropy_ratio: 0.959
Top values (rank 1–20)
  1. Noctiluca — 2,000
  2. Pyrocystis — 2,000
  3. Lingulodinium — 2,000
  4. Alexandrium — 2,000
  5. Aequorea — 2,000
  6. Pelagia — 2,000
  7. Mnemiopsis — 2,000
  8. Atolla — 2,000
  9. Periphylla — 2,000
  10. Beroe — 2,000
  11. Euphausia — 2,000
  12. Meganyctiphanes — 2,000
  13. Metridia — 2,000
  14. Oplophorus — 2,000
  15. Vargula — 2,000
  16. Cypridina — 2,000
  17. Histioteuthis — 2,000
  18. Vibrio — 2,000
  19. Photobacterium — 2,000
  20. Chaetopterus — 2,000

family categorical

rows: 43,060
null: 0 (0.0%)
unique: 22
top_value: Pyrocystaceae
top_rate: 0.093
cardinality: 22
entropy: 4.157
entropy_ratio: 0.932
Top values (rank 1–20)
  1. Pyrocystaceae — 4,000
  2. Euphausiidae — 4,000
  3. Cypridinidae — 4,000
  4. Vibrionaceae — 4,000
  5. Metridinidae — 2,297
  6. Noctilucaceae — 2,000
  7. Lingulodiniaceae — 2,000
  8. Aequoreidae — 2,000
  9. Pelagiidae — 2,000
  10. Bolinopsidae — 2,000
  11. Atollidae — 2,000
  12. Periphyllidae — 2,000
  13. Beroidae — 2,000
  14. Oplophoridae — 2,000
  15. Histioteuthidae — 2,000
  16. Chaetopteridae — 2,000
  17. Pholadidae — 928
  18. Renillidae — 874
  19. Vampyroteuthidae — 484
  20. Thysanoteuthidae — 209

phylum categorical

rows: 43,060
null: 0 (0.0%)
unique: 7
top_value: Arthropoda
top_rate: 0.286
cardinality: 7
entropy: 2.593
entropy_ratio: 0.923
Top values (rank 1–20)
  1. Arthropoda — 12,297
  2. Cnidaria — 8,874
  3. Myzozoa — 8,000
  4. Ctenophora — 4,168
  5. Proteobacteria — 4,000
  6. Mollusca — 3,721
  7. Annelida — 2,000

class categorical

rows: 43,060
null: 0 (0.0%)
unique: 13
top_value: Dinophyceae
top_rate: 0.186
cardinality: 13
entropy: 3.430
entropy_ratio: 0.927
Top values (rank 1–20)
  1. Dinophyceae — 8,000
  2. Scyphozoa — 6,000
  3. Malacostraca — 6,000
  4. Ostracoda — 4,000
  5. Gammaproteobacteria — 4,000
  6. Cephalopoda — 2,793
  7. Copepoda — 2,297
  8. Tentaculata — 2,168
  9. Hydrozoa — 2,000
  10. Nuda — 2,000
  11. Polychaeta — 2,000
  12. Bivalvia — 928
  13. Octocorallia — 874

order categorical

rows: 43,060
null: 0 (0.0%)
unique: 17
top_value: Gonyaulacales
top_rate: 0.139
cardinality: 17
entropy: 3.879
entropy_ratio: 0.949
Top values (rank 1–20)
  1. Gonyaulacales — 6,000
  2. Coronatae — 4,000
  3. Euphausiacea — 4,000
  4. Myodocopida — 4,000
  5. Vibrionales — 4,000
  6. Oegopsida — 2,309
  7. Calanoida — 2,297
  8. Lobata — 2,168
  9. Noctilucales — 2,000
  10. Leptothecata — 2,000
  11. Semaeostomeae — 2,000
  12. Beroida — 2,000
  13. Decapoda — 2,000
  14. — 2,000
  15. Myida — 928
  16. Scleralcyonacea — 874
  17. Vampyromorpha — 484

latitude numeric

rows: 43,060
null: 0 (0.0%)
unique: 14,146
min: -76.619
max: 88.290
mean: 19.105
median: 36.710
std: 40.266
q1: -19.308
q3: 50.303
iqr: 69.612
skew: -0.661
kurtosis: -0.936
n_outliers: 0
outlier_rate: 0.000
zero_rate: 4.64e-04

longitude numeric

rows: 43,060
null: 0 (0.0%)
unique: 14,637
min: -179.999
max: 179.990
mean: 9.640
median: 3.057
std: 88.609
q1: -60.186
q3: 63.933
iqr: 124.119
skew: 0.138
kurtosis: -0.646
n_outliers: 0
outlier_rate: 0.000
zero_rate: 1.11e-03

depth numeric

24.8% null; skew=+4.72; 10.6% rows beyond 1.5 IQR
rows: 43,060
null: 10,658 (24.8%)
unique: 3,283
min: -53.000
max: 10,000
mean: 281.209
median: 52.500
std: 570.178
q1: 7.500
q3: 321.000
iqr: 313.500
skew: 4.724
kurtosis: 35.887
n_outliers: 3,444
outlier_rate: 0.106
zero_rate: 0.119

date text

97.0% rows are a single word; 100.0% rows are all-caps; 67.4% duplicate strings
rows: 43,060
null: 5,182 (12.0%)
unique: 12,338
len_min: 4
len_max: 51
len_mean: 16.448
len_median: 19.000
len_p95: 39.000
word_mean: 1.030
word_median: 1.000
n_empty: 0
n_duplicates: 25,540
duplicate_rate: 0.674
vocab_size: 10,135
readability_flesch_mean: 121.200
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.970
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 2016-05-20
  2. 1996-07-04
  3. 2016-09-20T10:18:00Z
  4. 2015-01-18T14:35:00Z
  5. 2017-08-19
  6. 2017-04-29T00:01:00
  7. 2018-08-30
  8. 2020-11-05T16:06:00Z
  9. 2013-08-05T14:15:00Z
  10. 2012-03-24

year categorical

42.2% null
rows: 43,060
null: 18,164 (42.2%)
unique: 137
top_value: 2000
top_rate: 0.052
cardinality: 137
entropy: 6.142
entropy_ratio: 0.865
Top values (rank 1–20)
  1. 2000 — 1,287
  2. 2001 — 703
  3. 2016 — 691
  4. 2008 — 688
  5. 2010 — 651
  6. 2002 — 579
  7. 2013 — 556
  8. 2011 — 554
  9. 1979 — 524
  10. 2014 — 519
  11. 2003 — 514
  12. 2004 — 511
  13. 2015 — 504
  14. 2012 — 493
  15. 2007 — 459
  16. 2006 — 442
  17. 2005 — 438
  18. 1998 — 437
  19. 2020 — 437
  20. 2019 — 436

country categorical

rows: 43,060
null: 0 (0.0%)
unique: 130
top_value: (empty string)
top_rate: 0.637
cardinality: 130
entropy: 2.569
entropy_ratio: 0.366
Top values (rank 1–20)
  1. — 27,422
  2. Australia — 4,573
  3. United States — 1,416
  4. PERU — 1,098
  5. Canada — 976
  6. SOVIET UNION — 634
  7. Israel — 550
  8. GB — 465
  9. Spain — 370
  10. Sweden — 340
  11. USA — 323
  12. Ukraine — 316
  13. Romania — 310
  14. Antarctica — 242
  15. Republic of Korea — 225
  16. Colombia — 214
  17. Italy — 213
  18. New Zealand — 212
  19. FR — 210
  20. Brazil — 179

dataset categorical

rows: 43,060
null: 0 (0.0%)
unique: 214
top_value: (empty string)
top_rate: 0.611
cardinality: 214
entropy: 3.190
entropy_ratio: 0.412
Top values (rank 1–20)
  1. — 26,317
  2. Environmental Monitoring database (MOD) DNV — 1,760
  3. Jellyfish sightings along the Italian coastline from 2009 to 2017 — 1,024
  4. QUADRIGE - Coastal monitoring database and products, 1974 onwards. (6064) — 978
  5. MBIS research trawl surveys — 714
  6. Groundfish Survey Invertebrate Data — 674
  7. DFO Quebec Region Ecosystemic bottom trawl surveys — 650
  8. Marine Recorder Snapshot extract of surveys entered by SeaSearch — 643
  9. CPR — 604
  10. DATRAS: ICES Database of trawl surveys — 591
  11. Citizen Science based jellyfish observations along the Israeli Mediterranean coast in 2011-2025 — 546
  12. BioChem: Sameoto zooplankton collection — 516
  13. Marine Recorder Snapshot extract of surveys entered by JNCC — 396
  14. Atlantic Reference Centre — 383
  15. DFO Central and Arctic Multi-species Stock Assessment Surveys — 364
  16. MEDITS-Spain: Demersal and mega-benthic species from the MEDITS (Mediterranean International Trawl Survey) project on the Spanish continental shelf between 1994 and 2010 — 277
  17. NIWA Invertebrate Collection — 267
  18. ANEMOON Beach washup monitoring (SMP) data along the Dutch coastline collected through citizen science — 240
  19. Phytoplankton abundance and composition in the Ebro delta embayments (Alfacs Bay and Fangar Bay, North Western Mediterranean) during 1990-2019 — 198
  20. Romanian Black Sea Zooplankton data from 1981 to 2000 — 196

bioluminescence_group categorical

rows: 43,060
null: 0 (0.0%)
unique: 26
top_value: Dinoflagellate
top_rate: 0.093
cardinality: 26
entropy: 4.465
entropy_ratio: 0.950
Top values (rank 1–20)
  1. Dinoflagellate — 4,000
  2. Sea sparkle dinoflagellate — 2,000
  3. Bioluminescent dinoflagellate — 2,000
  4. Crystal jelly (source of GFP) — 2,000
  5. Mauve stinger jellyfish — 2,000
  6. Warty comb jelly — 2,000
  7. Crown jellyfish (alarm jelly) — 2,000
  8. Helmet jellyfish — 2,000
  9. Comb jelly — 2,000
  10. Krill (many species bioluminescent) — 2,000
  11. Northern krill — 2,000
  12. Copepod (secretes luminous fluid) — 2,000
  13. Deep-sea shrimp (NanoLuc source) — 2,000
  14. Sea firefly ostracod — 2,000
  15. Bioluminescent ostracod — 2,000
  16. Cock-eyed squid — 2,000
  17. Bioluminescent marine bacteria — 2,000
  18. Marine luminous bacteria — 2,000
  19. Parchment tube worm — 2,000
  20. Boring clam (piddock) — 928