
quirky carnivorous plants real

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/carnivorous_plants_real.json

Saturn profiled 610 rows across 14 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
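!pip install saturn-dissect  # IPython shell escape; run this line only inside Jupyter/IPython
import subprocess

# Profile the GBIF export with the saturn CLI, saving findings to a JSON file
# and requesting model-written prose from the named LLM.
subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/carnivorous_plants_real.json",
        "--findings", "quirky-carnivorous_plants_real.json",
        "--llm", "anthropic:claude-opus-4-7",
    ],
    check=True,  # raise CalledProcessError if the command exits non-zero
)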

Summary confidence: high

This dataset holds 610 GBIF biodiversity occurrence records across 14 columns, mixing taxonomy (family, genus, species), geography (country, stateProvince, latitude/longitude), and observation metadata (basisOfRecord, year, month, coordinateUncertainty). Despite the 'carnivorous_plants' filename, the family column contains no carnivorous-plant family: it splits evenly between two unrelated families, Hesperiidae (skipper butterflies) and Canellaceae (a plant family), with 300 records each, plus a 10-record Araceae tail. The carnivorous families Droseraceae (38 rows) and Sarraceniaceae (19 rows) surface only as values in the species column, so this taxonomic mismatch is the first thing worth investigating. Geographically, records skew to the Americas (USA 130, Mexico 73, Brazil 51) but span 35 countries, and 90% are HUMAN_OBSERVATION rather than preserved specimens. Watch coordinateUncertainty closely: it is highly skewed (skew 17.3), reaches 766,917 m at its maximum, and is null in 22.6% of rows, so any spatial analysis needs filtering. Years fall only in 2021–2026, with about half of all records (308) from 2026, so this is a recent-only snapshot.

citing: row_count · column_count · columns.family.top_values · columns.country.top_values · columns.basisOfRecord.top_values · columns.scientificName.top_values · columns.coordinateUncertainty.stats · columns.year.stats

Out[4]:

saturn.schema() · 14 columns

column kind n null% unique alerts
scientificName categorical 610 0.0% 157 long_tail
species categorical 610 0.0% 123
genus categorical 610 0.0% 94
family categorical 610 0.0% 3
latitude numeric 610 0.0% 466
longitude numeric 610 0.0% 467
country categorical 610 0.0% 35
stateProvince categorical 610 0.0% 108
locality categorical 610 0.0% 29 long_tail
basisOfRecord categorical 610 0.0% 2
year numeric 610 0.0% 6
month numeric 610 0.2% 12
coordinateUncertainty numeric 610 22.6% 151 null_rate high_skew outliers
gbifID categorical 610 0.0% 610 long_tail
Fig 1.
family · Shows the surprising 50/50 split between Hesperiidae and Canellaceae that defines the dataset.
Data table: identical to Fig 11 (family section).
Fig 2.
country · Top countries reveal an Americas-heavy footprint led by the USA, Mexico, and Brazil.
Data table: identical to Fig 14 (country section).
Fig 3.
basisOfRecord · Confirms ~90% of records are human observations rather than preserved specimens.
Data table: identical to Fig 17 (basisOfRecord section).
Fig 4.
coordinateUncertainty · Extreme right skew and huge max value flag spatial-precision outliers to filter before mapping.
Data table: identical to Fig 20 (coordinateUncertainty section).
Fig 5.
month · Highlights a strong seasonal collection bias: 342 of 609 dated records fall in January, so activity is concentrated in the first half of the year.
Data table: identical to Fig 19 (month section).
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column | kind | null %
scientificName | categorical | 0.0%
species | categorical | 0.0%
genus | categorical | 0.0%
family | categorical | 0.0%
latitude | numeric | 0.0%
longitude | numeric | 0.0%
country | categorical | 0.0%
stateProvince | categorical | 0.0%
locality | categorical | 0.0%
basisOfRecord | categorical | 0.0%
year | numeric | 0.0%
month | numeric | 0.2%
coordinateUncertainty | numeric | 22.6%
gbifID | categorical | 0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 5 numeric columns (values rounded to 2 decimals).
column | latitude | longitude | year | month | coordinateUncertainty
latitude | +1.00 | -0.74 | +0.14 | -0.18 | -0.10
longitude | -0.74 | +1.00 | +0.00 | +0.06 | +0.01
year | +0.14 | +0.00 | +1.00 | -0.65 | -0.17
month | -0.18 | +0.06 | -0.65 | +1.00 | +0.16
coordinateUncertainty | -0.10 | +0.01 | -0.17 | +0.16 | +1.00

scientificName categorical label

Scientific names with authorship, almost certainly from biodiversity occurrence records keyed by Linnaean name. Most are binomials, but some are genus-level names (e.g. Cinnamosma Baill., Lerema Scudder, 1872). The distribution is heavily concentrated: 157 distinct taxa across 610 rows, with Canella winterana alone claiming 28.5% (174 records), followed by a long tail that includes 81 singleton taxa. The names mix plants (Canella, Warburgia, Cinnamodendron, Pinellia) with skipper butterflies (Hylephila, Ocybadistes, Urbanus), so this column spans two kingdoms rather than a single clade.

Treatment: Group rare taxa into an 'other' bucket or join to a taxonomy table before using as a categorical feature.

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["scientificName"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 157
top_value | Canella winterana (L.) Gaertn.
top_rate | 0.2852
cardinality | 157
entropy | 5.517
entropy_ratio | 0.7563
alert: long_tail | 81 singleton categories
Fig 8.
Top values for scientificName.
Top values for scientificName (20 unique shown, of 157 total).
value | count | share
Canella winterana (L.) Gaertn. | 174 | 28.5%
Warburgia salutaris (Bertol.fil.) Chiov. | 35 | 5.7%
Cinnamodendron dinisii Schwacke | 20 | 3.3%
Hylephila phyleus (Drury, 1773) | 18 | 3.0%
Cinnamosma Baill. | 17 | 2.8%
Cinnamosma fragrans Baill. | 14 | 2.3%
Ocybadistes walkeri Heron, 1894 | 11 | 1.8%
Cinnamodendron occhionianum F.Barros & J.Salazar | 10 | 1.6%
Pinellia fujianensis H.Li & G.H.Zhu | 10 | 1.6%
Urbanus proteus (Linnaeus, 1758) | 8 | 1.3%
Cinnamosma madagascariensis Danguy | 8 | 1.3%
Warburgia ugandensis Sprague | 8 | 1.3%
Lerodea eufala (Edwards, 1869) | 7 | 1.1%
Burnsius albezens Grishin, 2022 | 7 | 1.1%
Lerema Scudder, 1872 | 7 | 1.1%
Quasimellana eulogius (Plötz, 1882) | 6 | 1.0%
Spicauda procne (Plötz, 1880) | 6 | 1.0%
Burnsius oileus (Linnaeus, 1767) | 5 | 0.8%
Burnsius orcynoides | 5 | 0.8%
Cephrenes augiades (Felder, 1860) | 5 | 0.8%
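
Because scientificName carries authorship strings while the species column does not, comparing the two first requires stripping the authorship. The sketch below assumes a simple whitespace heuristic (the helper name `strip_authorship` is hypothetical): keep the genus, plus the next token only when it is an all-lowercase epithet. It does not handle infraspecific ranks, hybrids, or subgenera; a dedicated scientific-name parser would be more robust.

```python
def strip_authorship(scientific_name: str) -> str:
    """Reduce a name with authorship to its genus, or genus + epithet.

    Hypothetical heuristic: the first token is the genus; the second token
    is kept only if it looks like a lowercase specific epithet.
    """
    tokens = scientific_name.split()
    if not tokens:
        return ""
    genus = tokens[0]
    if len(tokens) > 1:
        epithet = tokens[1]
        # Epithets are lowercase letters, occasionally hyphenated.
        if epithet.replace("-", "").isalpha() and epithet.islower():
            return f"{genus} {epithet}"
    return genus


examples = [
    "Canella winterana (L.) Gaertn.",
    "Hylephila phyleus (Drury, 1773)",
    "Cinnamosma Baill.",
    "Lerema Scudder, 1872",
    "Burnsius orcynoides",
]
for name in examples:
    print(f"{name!r} -> {strip_authorship(name)!r}")
```

Even after stripping, some rows may not match the species column where the two use different generic combinations (compare Burnsius oileus in Fig 8 with Pyrgus oileus in Fig 9).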

species categorical label

Categorical taxonomic labels, mostly Linnaean binomials (e.g. Canella winterana, Warburgia salutaris) but with some family-level names mixed in (Droseraceae, 38 rows; Sarraceniaceae, 19 rows), indicating inconsistent taxonomic granularity. Those two are carnivorous-plant families (the sundew and North American pitcher-plant families), yet neither appears in the family column. Canella winterana alone accounts for 28.5% of 610 rows, while 123 distinct values and an entropy ratio of 0.74 indicate a long tail. The mix of plant species (Cinnamodendron, Cinnamosma) and skipper-butterfly species (Hylephila phyleus, Ocybadistes walkeri, Urbanus dorantes) is unusual for a single 'species' column.

Treatment: Normalise to a consistent taxonomic rank before grouping; consider collapsing rare classes or target-encoding given the 123-way cardinality.

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["species"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 123
top_value | Canella winterana
top_rate | 0.2852
cardinality | 123
entropy | 5.144
entropy_ratio | 0.7409
Fig 9.
Top values for species.
Top values for species (20 unique shown, of 123 total).
value | count | share
Canella winterana | 174 | 28.5%
Droseraceae | 38 | 6.2%
Warburgia salutaris | 35 | 5.7%
Cinnamodendron dinisii | 20 | 3.3%
Hylephila phyleus | 19 | 3.1%
Sarraceniaceae | 19 | 3.1%
Cinnamosma fragrans | 14 | 2.3%
Ocybadistes walkeri | 11 | 1.8%
Urbanus dorantes | 10 | 1.6%
Cinnamodendron occhionianum | 10 | 1.6%
Pinellia fujianensis | 10 | 1.6%
Pyrgus oileus | 9 | 1.5%
Cinnamosma madagascariensis | 9 | 1.5%
Mellana eulogius | 8 | 1.3%
Urbanus proteus | 8 | 1.3%
Warburgia ugandensis | 8 | 1.3%
Urbanus procne | 7 | 1.1%
Lerodea eufala | 7 | 1.1%
Burnsius albezens | 7 | 1.1%
Gorgythion begga | 6 | 1.0%
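
The species treatment calls for normalising to a consistent taxonomic rank. As a rough first pass, and assuming the standard family-name endings (-aceae for plants, -idae for animals) are reliable markers, values can be sorted into family-level, genus-level, and species-level labels before grouping. The function name `guess_rank` and its rank labels are hypothetical, and conserved plant family names such as Compositae would slip through.

```python
def guess_rank(label: str) -> str:
    """Guess the taxonomic rank of a 'species' value from its shape.

    Assumptions: single tokens ending in -aceae (plants) or -idae (animals)
    are families; any other single token is treated as a genus; two or more
    tokens are treated as a species-level name.
    """
    tokens = label.split()
    if not tokens:
        return "missing"
    if len(tokens) == 1:
        if tokens[0].endswith(("aceae", "idae")):
            return "family"
        return "genus"
    return "species"


values = ["Canella winterana", "Droseraceae", "Sarraceniaceae", "Lerema", "Hylephila phyleus"]
print({value: guess_rank(value) for value in values})
```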

genus categorical feature

Categorical genus name with 94 distinct values across 610 rows and no nulls. The distribution is heavy-tailed: 'Canella' alone accounts for 28.5% (174 records), and the top four values appear to be plant genera (Canella, Warburgia, Cinnamosma, Cinnamodendron) while subsequent entries (Urbanus, Hylephila, Burnsius, Pyrgus) are butterfly/skipper genera, suggesting the column mixes taxa from different kingdoms. Entropy ratio of 0.74 reflects moderate concentration around the dominant genus.

Treatment: Group rare genera into an 'other' bucket and one-hot or target-encode before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["genus"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 94
top_value | Canella
top_rate | 0.2852
cardinality | 94
entropy | 4.84
entropy_ratio | 0.7384
Fig 10.
Top values for genus.
Top values for genus (20 unique shown, of 94 total).
value | count | share
Canella | 174 | 28.5%
Warburgia | 43 | 7.0%
Cinnamosma | 40 | 6.6%
Cinnamodendron | 34 | 5.6%
Urbanus | 25 | 4.1%
Hylephila | 19 | 3.1%
Burnsius | 16 | 2.6%
Pyrgus | 11 | 1.8%
Lerema | 11 | 1.8%
Ocybadistes | 11 | 1.8%
Pinellia | 10 | 1.6%
Mellana | 8 | 1.3%
Trapezites | 8 | 1.3%
Heliopetes | 7 | 1.1%
Lerodea | 7 | 1.1%
Toxidia | 7 | 1.1%
Pleodendron | 7 | 1.1%
Staphylus | 6 | 1.0%
Gorgythion | 6 | 1.0%
Polites | 6 | 1.0%
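
Since the genus column mixes plant and skipper genera, a quick consistency check is whether each genus maps to exactly one family across rows. A minimal sketch over plain (genus, family) pairs; the helper name `genus_family_conflicts` and the sample rows are hypothetical, not taken from the dataset.

```python
from collections import defaultdict


def genus_family_conflicts(rows):
    """Return {genus: sorted families} for genera recorded under more than one family.

    `rows` is any iterable of (genus, family) pairs.
    """
    families_by_genus = defaultdict(set)
    for genus, family in rows:
        families_by_genus[genus].add(family)
    return {
        genus: sorted(families)
        for genus, families in families_by_genus.items()
        if len(families) > 1
    }


# Hypothetical sample: the last pair is a deliberately inconsistent record.
sample = [
    ("Canella", "Canellaceae"),
    ("Warburgia", "Canellaceae"),
    ("Urbanus", "Hesperiidae"),
    ("Pinellia", "Araceae"),
    ("Canella", "Hesperiidae"),
]
print(genus_family_conflicts(sample))
```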

family categorical label

Categorical column holding taxonomic family labels across 610 rows, with only 3 distinct values and no nulls. The distribution is an almost even two-way split: Hesperiidae and Canellaceae each appear 300 times (top_rate 0.492), while Araceae appears just 10 times. It also mixes an animal family (Hesperiidae, skipper butterflies) with two plant families, an unusual cross-kingdom blend.

Treatment: One-hot encode; consider merging or stratifying given the rare Araceae class (10/610).

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["family"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 3
top_value | Hesperiidae
top_rate | 0.4918
cardinality | 3
entropy | 1.104
entropy_ratio | 0.6967
Fig 11.
Top values for family.
Top values for family (3 unique shown, of 3 total).
value | count | share
Hesperiidae | 300 | 49.2%
Canellaceae | 300 | 49.2%
Araceae | 10 | 1.6%
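
The entropy and entropy_ratio values in these stat tables are consistent with Shannon entropy in bits divided by log2 of the number of distinct values. That reading is an inference from the numbers, not documented saturn behaviour. Recomputing from the family counts above (300, 300, 10) reproduces the reported 1.104 and 0.6967:

```python
import math


def shannon_entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy normalised by its maximum, log2(k), for k non-empty categories."""
    k = sum(1 for c in counts if c > 0)
    return shannon_entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


family_counts = [300, 300, 10]  # Hesperiidae, Canellaceae, Araceae
print(round(shannon_entropy_bits(family_counts), 3))  # 1.104
print(round(entropy_ratio(family_counts), 4))         # 0.6967
```

The same functions give 0.4638 for basisOfRecord (550, 60) and log2(610) ≈ 9.253 for gbifID's 610 singletons, matching those tables.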

latitude numeric feature

This column holds geographic latitudes in decimal degrees, ranging from -43.245933 to 46.704735 with a median of 17.008014. The wide IQR (47.75°) and negative kurtosis (-1.28) match the histogram, which shows two clusters rather than one peak: a larger northern group between about 17°N and 28°N (292 rows) and a southern group between about 32°S and 21°S (135 rows). With 466 unique values across 610 rows and no nulls or outliers, coverage is clean but spread across both hemispheres.

Treatment: Pair with longitude for geospatial features; avoid treating as a plain scalar in models.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["latitude"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 466
min | -43.25
max | 46.7
mean | 5.2
median | 17.01
std | 22.75
q1 | -22.92
q3 | 24.83
iqr | 47.75
skew | -0.6517
kurtosis | -1.283
n_outliers | 0
outlier_rate | 0
zero_rate | 0
Fig 12.
Distribution of latitude. Vertical dash marks the median.
Histogram bins for latitude (median: 17.01).
bin | count
-43.25 – -39.5 | 1
-39.5 – -35.75 | 7
-35.75 – -32 | 22
-32 – -28.25 | 38
-28.25 – -24.51 | 65
-24.51 – -20.76 | 32
-20.76 – -17.01 | 12
-17.01 – -13.26 | 12
-13.26 – -9.514 | 6
-9.514 – -5.766 | 2
-5.766 – -2.019 | 1
-2.019 – 1.729 | 18
1.729 – 5.477 | 3
5.477 – 9.225 | 17
9.225 – 12.97 | 9
12.97 – 16.72 | 50
16.72 – 20.47 | 98
20.47 – 24.22 | 49
24.22 – 27.97 | 145
27.97 – 31.71 | 20
31.71 – 35.46 | 2
35.46 – 39.21 | 0
39.21 – 42.96 | 0
42.96 – 46.7 | 1

longitude numeric feature

Geographic longitude in decimal degrees, spanning -115.04 to 153.39 across 610 rows with no nulls and 467 unique values. The distribution is right-skewed (1.18), with a median of -63.06 well below the mean of -32.94, reflecting a concentration of points in the Western Hemisphere: 436 rows (71%) lie west of 36.75°W, no rows fall between about 36.75°W and 8°E, and the remainder form a long tail into the Eastern Hemisphere. No IQR outliers are flagged, and every value lies within the valid ±180° range.

Treatment: Pair with latitude for geospatial features; avoid treating as a standalone scalar in linear models.

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["longitude"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 467
min | -115
max | 153.4
mean | -32.94
median | -63.06
std | 78.93
q1 | -89.37
q3 | 30.84
iqr | 120.2
skew | 1.184
kurtosis | 0.0844
n_outliers | 0
outlier_rate | 0
zero_rate | 0
Fig 13.
Distribution of longitude. Vertical dash marks the median.
Histogram bins for longitude (median: -63.06).
bin | count
-115 – -103.9 | 61
-103.9 – -92.67 | 91
-92.67 – -81.49 | 16
-81.49 – -70.31 | 84
-70.31 – -59.12 | 128
-59.12 – -47.94 | 44
-47.94 – -36.75 | 12
-36.75 – -25.57 | 0
-25.57 – -14.38 | 0
-14.38 – -3.196 | 0
-3.196 – 7.989 | 0
7.989 – 19.17 | 3
19.17 – 30.36 | 12
30.36 – 41.54 | 38
41.54 – 52.73 | 40
52.73 – 63.91 | 0
63.91 – 75.1 | 0
75.1 – 86.28 | 2
86.28 – 97.47 | 0
97.47 – 108.7 | 14
108.7 – 119.8 | 17
119.8 – 131 | 1
131 – 142.2 | 6
142.2 – 153.4 | 41
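
Both coordinate treatments recommend pairing latitude with longitude rather than feeding either in as a plain scalar. One common way to do that, sketched here as an assumption rather than saturn's recommendation, is to map each point to a 3D unit vector on the sphere. The three coordinates are continuous across the ±180° meridian (irrelevant for this dataset's -115° to 153.4° span, but cheap insurance), and the straight-line distance between two vectors grows monotonically with great-circle distance.

```python
import math


def latlon_to_unit_vector(lat_deg: float, lon_deg: float) -> tuple[float, float, float]:
    """Map decimal-degree latitude/longitude to (x, y, z) on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


# A point near the profiled medians (17.01, -63.06).
print(latlon_to_unit_vector(17.01, -63.06))
# 180°E and 180°W are the same meridian, so they map to the same vector.
print(latlon_to_unit_vector(0, 180), latlon_to_unit_vector(0, -180))
```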

country categorical feature

Country of origin or observation, with 35 distinct values across 610 rows and no nulls. The distribution is moderately concentrated: United States of America leads at 21.3% (130 rows), followed by Mexico (73) and Brazil (51), and the entropy ratio of 0.77 indicates a fairly diverse but US-tilted mix. Small territories such as Guadeloupe (48) and Puerto Rico (37) outrank much larger countries, which suggests a tropical, Americas-focused sampling bias rather than a global population sample.

Treatment: One-hot encode top values and bucket the long tail into 'Other' before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[31]:

saturn.columns["country"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 35
top_value | United States of America
top_rate | 0.2131
cardinality | 35
entropy | 3.961
entropy_ratio | 0.7722
Fig 14.
Top values for country.
Top values for country (20 unique shown, of 35 total).
value | count | share
United States of America | 130 | 21.3%
Mexico | 73 | 12.0%
Brazil | 51 | 8.4%
Guadeloupe | 48 | 7.9%
Australia | 47 | 7.7%
South Africa | 41 | 6.7%
Madagascar | 40 | 6.6%
Puerto Rico | 37 | 6.1%
Dominican Republic | 16 | 2.6%
Panama | 15 | 2.5%
Argentina | 14 | 2.3%
Singapore | 10 | 1.6%
Cayman Islands | 10 | 1.6%
Antigua and Barbuda | 10 | 1.6%
China | 10 | 1.6%
Virgin Islands (U.S.) | 8 | 1.3%
Kenya | 8 | 1.3%
Hong Kong | 6 | 1.0%
Costa Rica | 5 | 0.8%
Sint Maarten (Dutch part) | 4 | 0.7%
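
The country treatment (one-hot encode the top values, bucket the long tail into 'Other') can be sketched without any encoding library. `bucket_rare`, `one_hot`, and the cut-off `k` are hypothetical choices; k would be picked from the top-values table, for example keeping the eight countries with 37 or more records.

```python
from collections import Counter


def bucket_rare(values, k, other="Other"):
    """Keep the k most frequent values and replace everything else with `other`.

    Ties at the cut-off follow Counter.most_common order (first occurrence wins).
    """
    keep = {value for value, _ in Counter(values).most_common(k)}
    return [value if value in keep else other for value in values]


def one_hot(values):
    """Return (sorted category list, list of 0/1 rows) for a list of labels."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


countries = ["United States of America", "Mexico", "Brazil", "Kenya", "Mexico", "United States of America"]
bucketed = bucket_rare(countries, k=2)
categories, encoded = one_hot(bucketed)
print(bucketed)
print(categories)
print(encoded)
```

If a column already contains a literal 'Other' value (stateProvince does, in 11 rows), pass a distinct `other` label so the bucket does not merge with it.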

stateProvince categorical feature

Holds state or province names for 610 records, spanning 108 distinct values across multiple countries (Texas, Florida, Nayarit, Queensland, KwaZulu-Natal). The mix is uneven: Texas alone covers 13.3% of rows, and the values blend US states, Mexican and Brazilian states, Puerto Rican municipalities (Cabo Rojo, Fajardo), and a commune in Guadeloupe ('Pointe-à-Pitre'), which points to inconsistent administrative granularity. 30 rows carry an empty string, which the 0% null rate does not flag, and an explicit 'Other' value appears 11 times.

Treatment: Normalise empty strings to null and group rare levels before one-hot or target encoding.

anthropic:claude-opus-4-7 · confidence high
Out[34]:

saturn.columns["stateProvince"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 108
top_value | Texas
top_rate | 0.1328
cardinality | 108
entropy | 5.53
entropy_ratio | 0.8187
Fig 15.
Top values for stateProvince.
Top values for stateProvince (20 unique shown, of 108 total).
value | count | share
Texas | 81 | 13.3%
Florida | 46 | 7.5%
Pointe-à-Pitre | 35 | 5.7%
Nayarit | 33 | 5.4%
(empty string) | 30 | 4.9%
Sinaloa | 26 | 4.3%
Queensland | 24 | 3.9%
KwaZulu-Natal | 17 | 2.8%
Santa Catarina | 16 | 2.6%
Other | 11 | 1.8%
Rio Grande do Sul | 11 | 1.8%
Mpumalanga | 10 | 1.6%
Mahajanga | 10 | 1.6%
Fujian | 10 | 1.6%
Cabo Rojo | 9 | 1.5%
Limpopo | 9 | 1.5%
Toliara | 9 | 1.5%
New South Wales | 8 | 1.3%
Fajardo | 8 | 1.3%
Paraná | 8 | 1.3%
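
stateProvince (30 rows) and locality (563 rows) both store missing values as empty strings, which their 0% null rates do not capture. A minimal sketch of the normalisation step the treatments call for, assuming records are plain dicts; the helper names `blank_to_none` and `null_rate` are hypothetical.

```python
def blank_to_none(records, columns):
    """Return copies of `records` with empty or whitespace-only strings set to None."""
    cleaned = []
    for record in records:
        row = dict(record)  # shallow copy so the input is not mutated
        for column in columns:
            value = row.get(column)
            if isinstance(value, str) and not value.strip():
                row[column] = None
        cleaned.append(row)
    return cleaned


def null_rate(records, column):
    """Share of records whose `column` is None (missing keys also count as None)."""
    return sum(1 for r in records if r.get(column) is None) / len(records)


rows = [
    {"stateProvince": "Texas", "locality": ""},
    {"stateProvince": "", "locality": "Alto Benedito."},
    {"stateProvince": "  ", "locality": ""},
    {"stateProvince": "Florida", "locality": ""},
]
cleaned = blank_to_none(rows, ["stateProvince", "locality"])
print(null_rate(cleaned, "stateProvince"))  # 0.5
print(null_rate(cleaned, "locality"))       # 0.75
```

On the full export this would be expected to raise stateProvince's null rate from 0% to at least 4.9% (30/610) and locality's to at least 92.3% (563/610).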

locality categorical free_text

Free-text locality descriptions for occurrence records. The largest group of non-empty entries is French descriptions of Malagasy sites (districts, communes, fokontany in Madagascar), followed by Portuguese-language descriptions, one Spanish-language description, and one UTM-style code. 563 of 610 rows (top_rate 0.923) are empty strings, so the field is effectively blank for most records; the other 47 rows share 28 distinct values, many of them sentence-length descriptions rather than a controlled vocabulary. The entropy ratio of 0.154 confirms that the empty value dominates.

Treatment: Treat empty string as missing and parse remaining entries with NER or regex to extract administrative units before use.

anthropic:claude-opus-4-7 · confidence high
Out[37]:

saturn.columns["locality"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 29
top_value | (empty string)
top_rate | 0.923
cardinality | 29
entropy | 0.7475
entropy_ratio | 0.1539
alert: long_tail | 17 singleton categories
Fig 16.
Top values for locality.
Top values for locality (20 unique shown, of 29 total).
value | count | share
(empty string) | 563 | 92.3%
District de Soanierana Ivongo, Commune de Manompana, Fokontany de Vohijiny, Village d'Ambohitsara. Forêt littorale de Sahavalanina, au Sud-Est d'Ambohitsara. | 3 | 0.5%
District Mahabo, Commune Analamisandy,Fokontany Soazato, Forêt d'Azohy. Collectés avec: Ando, Tefy, Cécile, Jean Michel. Échantillon préservé en l'alcool. | 3 | 0.5%
Région Vatovavy, Kianjavato, Ambodifandramanana, Ankarabo, vestige de forêt au sud du Mt Vatovavy. Echantillons préservés en alcool, récoltés avec équipe polisinala (Auguste, Jean Frédéric). | 3 | 0.5%
Antsiranana, SAVA, District de Vohémar, Commune rurale d'Antsirabe-nord, Fokontany d'Andravinambo, foret d'Antsolatra Marojala Sokitra. Plantes préservées en alcool, récoltées avec Bezanaka Jean Honoré. | 3 | 0.5%
Région Sofia, District de Mandritsara, commune rurale Marotandrano, fokontany Antsiatsiaka. Foret de Bezavona à 2 km à l'Est du village d'Antsiatsiaka, foret humide sempervirente de moyenne altitude sur latérite. Avec Raharimanana Théo, Ranaivoson Ernest, Marojery Réné chef FKT, Traravola, Rabemalaza Justin, Risy guides locaux. | 3 | 0.5%
Distrit Sakaraha Commune Rurale Amboronabo Fokontany Mitia village Belambo Collecté avec Mamomjy, Tariha, Rehary | 3 | 0.5%
District Sakaraha, Commune rurale Amboronabo, Fokotany Mitia-Est. Forêt de Herea, au Nord d'Analavelona, sur sable. Hameau le plus proche Belambo. | 3 | 0.5%
District Vaingaindrano, Commnune Tsianofana, Fokontany Abaronga, localité Andasibe . Forêt humide de la nouvelle aire protégée d' Agnakatrika. Collecté avec Iakily Armand. | 3 | 0.5%
Serra da Farinha-seca, encosta do Morro Sete. | 2 | 0.3%
Serra da Graciosa. Encosta próxima ao Recanto Bela Vista. | 2 | 0.3%
Estância do Meio. | 2 | 0.3%
UTM25_32T_0600_5150 | 1 | 0.2%
Parque Estadual da Serra da Baitaca, proximidades da Cachoeira do Samambaia. | 1 | 0.2%
Parque Estadual da Serra da Baitaca, | 1 | 0.2%
Ca. 700 m al sur de San Francisco de San Isidro, costado sur (del parqueo sur) de la escuela Golden Valley. Remanentes de bosque muy húmedo, en cafetales, casas, potreros y finca de Hammel y Pérez por el Río Tures. | 1 | 0.2%
Estação de Tratamento de Água Piraí (ETA Piraí) | 1 | 0.2%
Alto Benedito. | 1 | 0.2%
Comfloresta. | 1 | 0.2%
Sítio Barcelos. Área de PRAD. Propriedade de Vilmar de Lima Barcelos. | 1 | 0.2%
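
The locality treatment suggests regex extraction of administrative units. The sketch below targets only the French-language Malagasy entries listed above and assumes each unit name follows its keyword (District, Commune, Fokontany) and ends at the next comma or period; the pattern set and helper name are hypothetical. Misspellings in the source (e.g. "Commnune", "Fokotany") will not match, and entries without punctuation will over-capture.

```python
import re

# Hypothetical patterns: capture the name after each unit keyword,
# skipping an optional "rurale" and an optional "de " / "d'" prefix,
# and stopping at the next comma or period.
UNIT_PATTERNS = {
    "district": re.compile(r"\bDistrict\s+(?:de\s+|d')?([^,.]+)", re.IGNORECASE),
    "commune": re.compile(r"\bCommune\s+(?:rurale\s+)?(?:de\s+|d')?([^,.]+)", re.IGNORECASE),
    "fokontany": re.compile(r"\bFokontany\s+(?:de\s+|d')?([^,.]+)", re.IGNORECASE),
}


def extract_admin_units(locality):
    """Return {unit: name} for each administrative unit found in `locality`."""
    if not locality or not locality.strip():
        return {}
    units = {}
    for unit, pattern in UNIT_PATTERNS.items():
        match = pattern.search(locality)
        if match:
            units[unit] = match.group(1).strip()
    return units


text = ("District de Soanierana Ivongo, Commune de Manompana, Fokontany de Vohijiny, "
        "Village d'Ambohitsara.")
print(extract_admin_units(text))
```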

basisOfRecord categorical metadata

Categorical provenance flag from a biodiversity occurrence record (GBIF-style basisOfRecord), with only two values present out of the wider controlled vocabulary. HUMAN_OBSERVATION dominates at 550/610 (90.2%), with PRESERVED_SPECIMEN making up the remaining 60; no nulls. Entropy ratio 0.46 confirms the heavy imbalance.

Treatment: Keep as a binary indicator (e.g., is_specimen) for stratification or filtering.

anthropic:claude-opus-4-7 · confidence high
Out[40]:

saturn.columns["basisOfRecord"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 2
top_value | HUMAN_OBSERVATION
top_rate | 0.9016
cardinality | 2
entropy | 0.4638
entropy_ratio | 0.4638
Fig 17.
Top values for basisOfRecord.
Top values for basisOfRecord (2 unique shown, of 2 total).
value | count | share
HUMAN_OBSERVATION | 550 | 90.2%
PRESERVED_SPECIMEN | 60 | 9.8%

year numeric timestamp

Calendar year of the record, spanning only 2021 to 2026 across 610 rows with 6 distinct values. The distribution is left-skewed (skew -0.80) and concentrated at the recent end: 308 rows (50.5%) are from 2026, so the median and Q3 both sit at 2026, with Q1 at 2024.

Treatment: Treat as an ordinal time bucket; consider one-hot or year-since-min rather than raw integer.

anthropic:claude-opus-4-7 · confidence high
Out[43]:

saturn.columns["year"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 6
min | 2021
max | 2026
mean | 2025
median | 2026
std | 1.503
q1 | 2024
q3 | 2026
iqr | 2
skew | -0.7969
kurtosis | -0.7929
n_outliers | 0
outlier_rate | 0
zero_rate | 0
Fig 18.
Distribution of year. Vertical dash marks the median.
Histogram bins for year (median: 2026.0). The 24 bin edges round to whole years, so the bins are shown here as one row per year; the 18 empty bins are omitted.
year | count
2021 | 7
2022 | 70
2023 | 71
2024 | 79
2025 | 75
2026 | 308
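
The year summary can be cross-checked against the per-year counts. This sketch expands the counts from the Fig 18 histogram into individual values and recomputes the quartiles with linear interpolation (`statistics.quantiles(..., method="inclusive")`, assumed here to match saturn's quantile convention). It reproduces Q1 = 2024, median = Q3 = 2026, and std ≈ 1.503, and shows that the unrounded mean is about 2024.75 rather than the displayed 2025.

```python
import statistics

# Records per year, read off the Fig 18 histogram (the bin that contains each year).
year_counts = {2021: 7, 2022: 70, 2023: 71, 2024: 79, 2025: 75, 2026: 308}
years = [year for year, count in sorted(year_counts.items()) for _ in range(count)]

q1, median, q3 = statistics.quantiles(years, n=4, method="inclusive")
print(len(years))                         # 610
print(q1, median, q3)                     # 2024.0 2026.0 2026.0
print(round(statistics.mean(years), 2))   # 2024.75
print(round(statistics.stdev(years), 3))  # 1.503
```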

month numeric feature

Integer values between 1 and 12 with 12 unique levels strongly indicate a calendar month index. The distribution is heavily front-loaded: 342 of the 609 non-null values (56%) are January, so the median is 1 and Q3 is only 7, and the positive skew (1.00) reflects the thinner spread of records across the rest of the year. Nulls are negligible (1 row, 0.16%) and no outliers are flagged.

Treatment: Treat as a cyclical categorical (one-hot or sin/cos encode) rather than a raw numeric.

anthropic:claude-opus-4-7 · confidence high
Out[46]:

saturn.columns["month"].stats

stat | value
n | 610
nulls | 1 (0.2%)
unique | 12
min | 1
max | 12
mean | 3.752
median | 1
std | 3.75
q1 | 1
q3 | 7
iqr | 6
skew | 1.002
kurtosis | -0.5078
n_outliers | 0
outlier_rate | 0
zero_rate | 0
Fig 19.
Distribution of month. Vertical dash marks the median.
Histogram bins for month (median: 1.0).
bin | count
1 – 1.458 | 342
1.458 – 1.917 | 0
1.917 – 2.375 | 18
2.375 – 2.833 | 0
2.833 – 3.292 | 24
3.292 – 3.75 | 0
3.75 – 4.208 | 24
4.208 – 4.667 | 0
4.667 – 5.125 | 21
5.125 – 5.583 | 0
5.583 – 6.042 | 17
6.042 – 6.5 | 0
6.5 – 6.958 | 0
6.958 – 7.417 | 42
7.417 – 7.875 | 0
7.875 – 8.333 | 23
8.333 – 8.792 | 0
8.792 – 9.25 | 15
9.25 – 9.708 | 0
9.708 – 10.17 | 24
10.17 – 10.62 | 0
10.62 – 11.08 | 29
11.08 – 11.54 | 0
11.54 – 12 | 30
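
The month treatment suggests a sin/cos encoding. A minimal sketch, assuming months are integers 1–12 and that the single null month is handled separately: each month becomes a point on the unit circle, so December and January end up as close together as any other pair of adjacent months instead of 11 units apart. The helper names are hypothetical.

```python
import math


def encode_month(month: int) -> tuple[float, float]:
    """Map a calendar month (1-12) to (sin, cos) on the unit circle."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month!r}")
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)


def circular_distance(a: int, b: int) -> float:
    """Euclidean distance between two encoded months."""
    return math.dist(encode_month(a), encode_month(b))


print(round(circular_distance(12, 1), 3))  # 0.518, same as any adjacent pair
print(round(circular_distance(6, 7), 3))   # 0.518
print(round(circular_distance(1, 7), 3))   # 2.0, opposite sides of the year
```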

coordinateUncertainty numeric feature

Numeric coordinate uncertainty, almost certainly meters of GPS or locality error attached to each occurrence record. The distribution is severely right-skewed (skew 17.3, kurtosis 335.7): the median is 35 m, but the mean is 6,463 m and the maximum reaches 766,917 m, and 91 values (19.3% of the 472 non-null values) are flagged as outliers. 138 rows (22.6%) are null, so coverage is partial.

Treatment: Log-transform before use as a feature; when using it as a quality filter, drop or flag rows with unknown uncertainty rather than imputing a value.

anthropic:claude-opus-4-7 · confidence high
Out[49]:

saturn.columns["coordinateUncertainty"].stats

stat | value
n | 610
nulls | 138 (22.6%)
unique | 151
min | 1
max | 766,917
mean | 6463
median | 35
std | 3.814e+04
q1 | 5
q3 | 466.8
iqr | 461.8
skew | 17.3
kurtosis | 335.7
n_outliers | 91
outlier_rate | 0.1928
zero_rate | 0
alert: null_rate | 22.6% null
alert: high_skew | skew=+17.30
alert: outliers | 19.3% rows beyond 1.5 IQR
Fig 20.
Distribution of coordinateUncertainty. Vertical dash marks the median.
Histogram bins for coordinateUncertainty (median: 35.0).
bin | count
1 – 3.652e+04 | 467
3.652e+04 – 7.304e+04 | 2
7.304e+04 – 1.096e+05 | 1
1.096e+05 – 1.461e+05 | 0
1.461e+05 – 1.826e+05 | 0
1.826e+05 – 2.191e+05 | 0
2.191e+05 – 2.556e+05 | 1
2.556e+05 – 2.922e+05 | 0
2.922e+05 – 3.287e+05 | 0
3.287e+05 – 3.652e+05 | 0
3.652e+05 – 4.017e+05 | 0
4.017e+05 – 4.382e+05 | 0
4.382e+05 – 4.748e+05 | 0
4.748e+05 – 5.113e+05 | 0
5.113e+05 – 5.478e+05 | 0
5.478e+05 – 5.843e+05 | 0
5.843e+05 – 6.208e+05 | 0
6.208e+05 – 6.574e+05 | 0
6.574e+05 – 6.939e+05 | 0
6.939e+05 – 7.304e+05 | 0
7.304e+05 – 7.669e+05 | 1
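
The outlier alert uses the 1.5 × IQR rule. Plugging in the rounded quartiles from the stat table (Q1 = 5, Q3 = 466.8) puts the upper fence near 1,160 m, so anything coarser than roughly a kilometre is flagged. The sketch below recomputes that fence and applies a hypothetical precision filter before mapping; the 1,000 m default, the helper names, and the choice to drop null-uncertainty rows rather than impute them are assumptions to adjust, not saturn defaults.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Tukey fences: values outside [q1 - k*IQR, q3 + k*IQR] count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def precise_enough(records, max_uncertainty_m=1000.0, keep_unknown=False):
    """Keep records whose coordinateUncertainty (meters) is at or below the cap.

    Records with a missing value are dropped unless keep_unknown is True.
    """
    kept = []
    for record in records:
        value = record.get("coordinateUncertainty")
        if value is None:
            if keep_unknown:
                kept.append(record)
        elif value <= max_uncertainty_m:
            kept.append(record)
    return kept


low, high = iqr_fences(q1=5, q3=466.8)
print(round(low, 1), round(high, 1))  # -687.7 1159.5

# Hypothetical records; the uncertainty values echo the median and max above.
sample = [
    {"gbifID": "a", "coordinateUncertainty": 35},
    {"gbifID": "b", "coordinateUncertainty": 766917},
    {"gbifID": "c", "coordinateUncertainty": None},
]
print([r["gbifID"] for r in precise_enough(sample)])  # ['a']
```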

gbifID categorical identifier

This is the GBIF occurrence identifier: every one of the 610 rows carries a unique numeric ID (n_unique=610, top_rate=0.0016, entropy_ratio≈1.0) with no nulls. Because each ID occurs exactly once, the top-values list below is just 20 of 610 tied values. Those 20 fall between 5937748304 and 5937748369, which is consistent with a single GBIF ingestion batch but cannot show whether the full set of IDs is contiguous.

Treatment: Keep as a primary key for joins back to GBIF; drop from any model features.

anthropic:claude-opus-4-7 · confidence high
Out[52]:

saturn.columns["gbifID"].stats

stat | value
n | 610
nulls | 0 (0.0%)
unique | 610
top_value | 5937748304
top_rate | 0.001639
cardinality | 610
entropy | 9.253
entropy_ratio | 1
alert: long_tail | 610 singleton categories
Fig 21.
Top values for gbifID.
Top values for gbifID (20 unique shown, of 610 total).
value | count | share
5937748304 | 1 | 0.2%
5937748308 | 1 | 0.2%
5937748309 | 1 | 0.2%
5937748312 | 1 | 0.2%
5937748316 | 1 | 0.2%
5937748322 | 1 | 0.2%
5937748325 | 1 | 0.2%
5937748327 | 1 | 0.2%
5937748329 | 1 | 0.2%
5937748333 | 1 | 0.2%
5937748335 | 1 | 0.2%
5937748336 | 1 | 0.2%
5937748338 | 1 | 0.2%
5937748342 | 1 | 0.2%
5937748344 | 1 | 0.2%
5937748350 | 1 | 0.2%
5937748352 | 1 | 0.2%
5937748353 | 1 | 0.2%
5937748363 | 1 | 0.2%
5937748369 | 1 | 0.2%
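
Whether the IDs form one contiguous block can only be settled on the full column, not the top-20 list. A minimal sketch, assuming the IDs parse as integers: a set of n unique IDs is contiguous exactly when max - min + 1 == n. For joins back to GBIF it also builds the public occurrence API URL (`https://api.gbif.org/v1/occurrence/{key}`); both helper names are hypothetical.

```python
def id_span_report(ids):
    """Summarise integer IDs: count, range, and whether they form one unbroken run."""
    unique_ids = sorted({int(i) for i in ids})
    if not unique_ids:
        raise ValueError("no IDs given")
    lo, hi = unique_ids[0], unique_ids[-1]
    return {
        "n_unique": len(unique_ids),
        "min": lo,
        "max": hi,
        "contiguous": hi - lo + 1 == len(unique_ids),
    }


def gbif_occurrence_url(gbif_id) -> str:
    """Build the GBIF occurrence API URL for one record."""
    return f"https://api.gbif.org/v1/occurrence/{int(gbif_id)}"


# The 20 IDs listed in Fig 21: unique, but not consecutive among themselves.
shown = [5937748304, 5937748308, 5937748309, 5937748312, 5937748316,
         5937748322, 5937748325, 5937748327, 5937748329, 5937748333,
         5937748335, 5937748336, 5937748338, 5937748342, 5937748344,
         5937748350, 5937748352, 5937748353, 5937748363, 5937748369]
print(id_span_report(shown))
print(gbif_occurrence_url(shown[0]))
```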

How to cite


BibTeX
@misc{saturn-quirky-carnivorous-plants-real-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: quirky carnivorous plants real},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/quirky-carnivorous_plants_real}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: quirky carnivorous plants real. Source: /home/coolhand/html/datavis/data_trove/data/quirky/carnivorous_plants_real.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/quirky-carnivorous_plants_real