saturn

quirky peppers

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/peppers.json

Saturn profiled 175 rows across 11 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
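# Install the profiler (IPython shell escape).
!pip install saturn-dissect
import subprocess

# Run the Saturn CLI on the source file with the options used for this report.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/peppers.json",
    "--findings", "quirky-peppers.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # raise CalledProcessError if the CLI exits non-zero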

Summary confidence: high

This dataset catalogs 175 pepper varieties in 11 fields: name, origin, flavor, heat category, biological type, intended use, source URL, and Scoville heat measurements (min, median, max, plus a jalapeño-relative score, jalRP). The Scoville and jalRP columns are extremely right-skewed (skew 9.4 to 10.3, kurtosis above 100): scoville_max reaches 16,000,000 against a median of just 30,000, so a handful of super-hot peppers dominate the tail, and 24.6% of scoville_max rows flag as outliers. On the categorical side, 'Medium' heat accounts for 40% of peppers and 'Culinary' use for 80.6%, while origin leans toward the United States (26.3%) and Mexico (14.9%). Start with the Scoville distribution (consider a log scale) and the type column, whose casing inconsistencies ('annuum' vs 'Annuum', 'chinense' vs 'Chinense') should be cleaned before any grouping.

citing: scoville_max · scoville_median · jalRP · heat · use · origin · type · flavor
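
The log-scale suggestion in the summary can be sketched as follows. This is a minimal illustration using only the standard library: `log_bins` is a hypothetical helper (not part of Saturn), and the sample values are made up to mimic a heavy right tail rather than taken from peppers.json.

```python
import math

def log_bins(values, n_bins):
    """Count values in bins that are equal-width on a log10(1 + x) scale."""
    # Shift by 1 so that 0 SHU peppers still land in the first bin.
    logs = [math.log10(1 + v) for v in values]
    top = max(logs)
    counts = [0] * n_bins
    for lv in logs:
        # Clamp the maximum value into the last bin.
        idx = min(int(lv / top * n_bins), n_bins - 1) if top > 0 else 0
        counts[idx] += 1
    # Convert the bin edges back to SHU for labelling.
    edges = [10 ** (top * i / n_bins) - 1 for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical SHU values, chosen only for illustration.
sample_shu = [0, 500, 2_500, 8_000, 30_000, 100_000, 350_000, 2_200_000, 16_000_000]
edges, counts = log_bins(sample_shu, n_bins=4)
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo:>12,.0f} to {hi:>12,.0f}  {c}")
```

With linear bins, as in the histograms of this report, almost every row falls into the first bin; log-spaced bins spread the same values across the whole axis.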

Out[4]:

saturn.schema() · 11 columns

column           kind         n    null%  unique  alerts
name             categorical  175  0.0%   175     long_tail
heat             categorical  175  0.0%   5
scoville_min     numeric      175  0.0%   44      high_skew, outliers
scoville_max     numeric      175  0.0%   59      high_skew, outliers
scoville_median  numeric      175  0.0%   80      high_skew, outliers
jalRP            numeric      175  0.0%   81      high_skew, outliers
type             categorical  175  0.0%   8
origin           categorical  175  0.0%   34
use              categorical  175  0.0%   4
flavor           categorical  175  0.0%   73      long_tail
url              categorical  175  0.0%   175     long_tail
Fig 1.
scoville_max · Heavy right tail: 75% of peppers sit at or below 100k SHU, while at least 20 varieties exceed 1.2 million.
Histogram bins for scoville_max (median: 30000.0).
bin                      count
0 – 1.231e+06            155
1.231e+06 – 2.462e+06    16
2.462e+06 – 3.692e+06    3
3.692e+06 – 4.923e+06    0
4.923e+06 – 6.154e+06    0
6.154e+06 – 7.385e+06    0
7.385e+06 – 8.615e+06    0
8.615e+06 – 9.846e+06    0
9.846e+06 – 1.108e+07    0
1.108e+07 – 1.231e+07    0
1.231e+07 – 1.354e+07    0
1.354e+07 – 1.477e+07    0
1.477e+07 – 1.6e+07      1
Fig 2.
heat · Medium dominates at 40%; Extra Hot and Hot are the smallest buckets.
Top values for heat (5 unique shown, of 5 total).
value      count  share
Medium     70     40.0%
Mild       45     25.7%
Super Hot  30     17.1%
Hot        17     9.7%
Extra Hot  13     7.4%
Fig 3.
type · Annuum and chinense cover most peppers, but watch for duplicate casing ('Annuum', 'Chinense') needing cleanup.
Top values for type (8 unique shown, of 8 total).
value       count  share
annuum      104    59.4%
chinense    46     26.3%
baccatum    12     6.9%
Annuum      4      2.3%
frutescens  4      2.3%
pubescens   2      1.1%
Chinense    2      1.1%
N/A         1      0.6%
Fig 4.
use · About 80% are culinary, with ornamental as the only meaningful secondary use.
Top values for use (4 unique shown, of 4 total).
value                 count  share
Culinary              141    80.6%
Ornamental            31     17.7%
Culinary, Ornamental  2      1.1%
(empty string)        1      0.6%
Fig 5.
origin · United States and Mexico together account for over 40% of varieties; the rest is a long tail of 30+ regions.
Top values for origin (20 unique shown, of 34 total).
value             count  share
United States     46     26.3%
Mexico            26     14.9%
South America     11     6.3%
Peru              11     6.3%
Italy             8      4.6%
Unknown           7      4.0%
United Kingdom    7      4.0%
Trinidad          7      4.0%
Caribbean         6      3.4%
India             6      3.4%
Brazil            5      2.9%
Spain             4      2.3%
Hungary           4      2.3%
Japan             3      1.7%
Africa            3      1.7%
China             2      1.1%
Thailand          2      1.1%
Balkan Peninsula  1      0.6%
France            1      0.6%
Chile             1      0.6%
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column           kind         null %
name             categorical  0.0%
heat             categorical  0.0%
scoville_min     numeric      0.0%
scoville_max     numeric      0.0%
scoville_median  numeric      0.0%
jalRP            numeric      0.0%
type             categorical  0.0%
origin           categorical  0.0%
use              categorical  0.0%
flavor           categorical  0.0%
url              categorical  0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
                 scoville_min  scoville_max  scoville_median  jalRP
scoville_min     +1.00         +0.99         +1.00            +1.00
scoville_max     +0.99         +1.00         +1.00            +1.00
scoville_median  +1.00         +1.00         +1.00            +1.00
jalRP            +1.00         +1.00         +1.00            +1.00
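
One candidate explanation for the +1.00 between jalRP and scoville_median is that jalRP is scoville_median divided by a constant. The sketch below checks this using only the summary statistics reported for the two columns in this notebook. The constant 5,250 is inferred from those ratios and is not documented by the source, so treat it as an assumption.

```python
# Summary statistics as reported for the two columns in this notebook.
scoville_median_stats = {"max": 15_500_000, "mean": 339_805, "median": 22_500,
                         "q1": 2_000, "q3": 90_000}
jalrp_stats = {"max": 2952.38, "mean": 64.72, "median": 4.29,
               "q1": 0.38, "q3": 17.14}

# If jalRP = scoville_median / k for a constant k, every ratio equals k.
ratios = {name: scoville_median_stats[name] / jalrp_stats[name]
          for name in scoville_median_stats}
for name, ratio in ratios.items():
    print(f"{name:>6}: {ratio:,.0f}")

# Assumed constant, inferred from the ratios (not stated in the source).
ASSUMED_K = 5_250
max_rel_error = max(abs(r - ASSUMED_K) / ASSUMED_K for r in ratios.values())
print(f"largest relative deviation from {ASSUMED_K}: {max_rel_error:.2%}")
```

A constant rescaling leaves Pearson correlation, skew, and kurtosis unchanged, which would also explain why the two columns report matching skew, kurtosis, outlier rate, and zero rate. If the relationship holds on the raw rows, jalRP carries no information beyond scoville_median.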

name categorical identifier

The `name` column holds 175 unique strings across 175 rows (cardinality 175, entropy_ratio ~1.0), making it a perfect per-row identifier. Sample values like "Bell Pepper", "Gypsy Pepper", and "Peperone di Senise" suggest this is a catalog of pepper varieties rather than a categorical feature. With every value occurring exactly once (top_rate 0.0057), there is no useful frequency signal to model on.

Treatment: Use as a row label or join key; drop from the feature matrix, since every value is unique.

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["name"].stats

stat value
n 175
nulls 0 (0.0%)
unique 175
top_value Bell Pepper
top_rate 0.005714
cardinality 175
entropy 7.451
entropy_ratio 1
alert: long_tail · 175 singleton categories
Fig 8.
Top values for name.
Top values for name (20 unique shown, of 175 total).
value                     count  share
Bell Pepper               1      0.6%
Gypsy Pepper              1      0.6%
Purple Beauty Pepper      1      0.6%
Melrose Pepper            1      0.6%
Carmen Pepper             1      0.6%
California Wonder Pepper  1      0.6%
Peperone di Senise        1      0.6%
Fushimi Pepper            1      0.6%
Elephant Ears Pepper      1      0.6%
Habanada Pepper           1      0.6%
Tangerine Dream Pepper    1      0.6%
Chilly Chili              1      0.6%
Shishito Pepper           1      0.6%
Trinidad Perfume          1      0.6%
Banana Pepper             1      0.6%
Pepperoncini              1      0.6%
Pimento Pepper            1      0.6%
Jimmy Nardello Pepper     1      0.6%
Mariachi Pepper           1      0.6%
Santa Fe Grande Pepper    1      0.6%

heat categorical feature

This is a categorical heat rating with 5 ordinal tiers and no nulls across 175 rows. Medium dominates at 40% (70 rows), followed by Mild (45). Hot (17) and Extra Hot (13) are the rarest tiers, while Super Hot (30) is more common than either, so counts do not fall steadily toward the hot end of the scale.

Treatment: Encode as an ordered categorical before modelling. The likely order is Mild < Medium < Hot < Extra Hot < Super Hot, with Super Hot as the top tier; confirm this against the source, since the labels alone do not fix the order of the last two tiers.

anthropic:claude-opus-4-7 · confidence high
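
An ordered encoding of the heat tiers could look like the sketch below. The tier order is the part to verify: the code assumes Super Hot is the hottest tier, so swap the last two entries of `HEAT_ORDER` if the source defines the scale otherwise. `encode_heat` is a hypothetical helper and the input labels are illustrative.

```python
# Assumed tier order from mildest to hottest; verify against the data source.
HEAT_ORDER = ["Mild", "Medium", "Hot", "Extra Hot", "Super Hot"]
HEAT_RANK = {label: rank for rank, label in enumerate(HEAT_ORDER)}

def encode_heat(label):
    """Map a heat label to its integer rank, or None for unexpected input."""
    if not isinstance(label, str):
        return None
    return HEAT_RANK.get(label.strip())

# Illustrative labels, including one that is not a known tier.
labels = ["Medium", "Mild", "Super Hot", "Hot", "Extra Hot", "unknown"]
encoded = [encode_heat(x) for x in labels]
print(encoded)
```

Returning None for unknown labels, rather than raising, keeps unexpected values visible as missing data instead of silently assigning them a rank.
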
Out[16]:

saturn.columns["heat"].stats

stat value
n 175
nulls 0 (0.0%)
unique 5
top_value Medium
top_rate 0.4
cardinality 5
entropy 2.074
entropy_ratio 0.8933
Fig 9.
Top values for heat.
Top values for heat (5 unique shown, of 5 total).
value      count  share
Medium     70     40.0%
Mild       45     25.7%
Super Hot  30     17.1%
Hot        17     9.7%
Extra Hot  13     7.4%
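
The reported entropy and entropy_ratio for heat can be reproduced from the counts above. The sketch assumes Shannon entropy in bits, normalized by log2 of the cardinality; Saturn's exact definition is not stated in this report, but the numbers are consistent with it.

```python
import math

# Counts from the heat value table in this section.
heat_counts = {"Medium": 70, "Mild": 45, "Super Hot": 30, "Hot": 17, "Extra Hot": 13}

def shannon_entropy_bits(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

entropy = shannon_entropy_bits(heat_counts.values())
# Normalize by the maximum entropy for this many categories.
entropy_ratio = entropy / math.log2(len(heat_counts))
print(round(entropy, 3), round(entropy_ratio, 4))
```

A ratio of 1 means every category is equally frequent (as for name and url); values well below 1, such as the 0.40 reported for use, indicate that a few categories dominate.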

scoville_min numeric feature

Minimum Scoville heat ratings for 175 peppers, spanning 0 to 15,000,000 with a median of 15,000. The distribution is extremely right-skewed (skew 10.31, kurtosis 120.13): the mean of 289,208 is roughly 19 times the median, 29 rows (16.6%) are outliers, and 9.7% are zeros. A std of 1,218,458 against an IQR of just 74,000 confirms a long, heavy tail.

Treatment: Apply log1p transform before any modelling to tame the extreme skew and outliers.

anthropic:claude-opus-4-7 · confidence high
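
A minimal sketch of the log1p treatment, applied to the five summary values reported for scoville_min (min, q1, median, q3, max). The helper names are hypothetical. log1p is used instead of a plain log because 9.7% of rows are zero, and log(0) is undefined.

```python
import math

def log1p_transform(values):
    """Apply log(1 + x), which is defined at 0 unlike a plain log."""
    return [math.log1p(v) for v in values]

def inverse_log1p(values):
    """Undo the transform with exp(x) - 1."""
    return [math.expm1(v) for v in values]

# min, q1, median, q3, max as reported for scoville_min.
reported = [0, 1_000, 15_000, 75_000, 15_000_000]
transformed = log1p_transform(reported)
print([round(t, 2) for t in transformed])

# The transform is invertible, so predictions can be mapped back to SHU.
restored = inverse_log1p(transformed)
```

After the transform the column spans roughly 0 to 16.5 instead of 0 to 15,000,000, and zeros stay at 0.
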
Out[19]:

saturn.columns["scoville_min"].stats

stat value
n 175
nulls 0 (0.0%)
unique 44
min 0
max 1.5e+07
mean 2.892e+05
median 15,000
std 1.218e+06
q1 1,000
q3 75,000
iqr 74,000
skew 10.31
kurtosis 120.1
n_outliers 29
outlier_rate 0.1657
zero_rate 0.09714
alert: high_skew · skew=+10.31
alert: outliers · 16.6% rows beyond 1.5 IQR
Fig 10.
Distribution of scoville_min. Vertical dash marks the median.
Histogram bins for scoville_min (median: 15000.0).
bin                      count
0 – 1.154e+06            164
1.154e+06 – 2.308e+06    7
2.308e+06 – 3.462e+06    3
3.462e+06 – 4.615e+06    0
4.615e+06 – 5.769e+06    0
5.769e+06 – 6.923e+06    0
6.923e+06 – 8.077e+06    0
8.077e+06 – 9.231e+06    0
9.231e+06 – 1.038e+07    0
1.038e+07 – 1.154e+07    0
1.154e+07 – 1.269e+07    0
1.269e+07 – 1.385e+07    0
1.385e+07 – 1.5e+07      1

scoville_max numeric feature

Maximum Scoville heat ratings for 175 peppers, ranging from 0 to 16,000,000 with a median of 30,000 but a mean of 384,835. The distribution is extremely right-skewed (skew 9.45, kurtosis 106.1), with 24.6% of values (43 rows) flagged as outliers and 5.7% zeros. The middle half of the data spans only 2,750 to 100,000 (IQR 97,250), a range dwarfed by the max, consistent with a few extreme superhot varieties dominating the tail.

Treatment: log1p-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
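
The outlier alert for this column reads "rows beyond 1.5 IQR", which suggests the usual Tukey fences. Assuming that rule (Saturn's exact definition is not given here), the cut-offs follow directly from the reported quartiles; `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles as reported for scoville_max.
lower, upper = tukey_fences(q1=2_750, q3=100_000)
print(lower, upper)
```

The lower fence is negative while the column minimum is 0, so under this rule every flagged row lies on the high side, above about 246,000 SHU. A rule calibrated for roughly symmetric data flags a large share of a heavily skewed column, which is one reason to transform before judging outliers.
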
Out[22]:

saturn.columns["scoville_max"].stats

stat value
n 175
nulls 0 (0.0%)
unique 59
min 0
max 1.6e+07
mean 3.848e+05
median 30,000
std 1.333e+06
q1 2,750
q3 100,000
iqr 97,250
skew 9.45
kurtosis 106.1
n_outliers 43
outlier_rate 0.2457
zero_rate 0.05714
alert: high_skew · skew=+9.45
alert: outliers · 24.6% rows beyond 1.5 IQR
Fig 11.
Distribution of scoville_max. Vertical dash marks the median.
Histogram bins for scoville_max (median: 30000.0).
bin                      count
0 – 1.231e+06            155
1.231e+06 – 2.462e+06    16
2.462e+06 – 3.692e+06    3
3.692e+06 – 4.923e+06    0
4.923e+06 – 6.154e+06    0
6.154e+06 – 7.385e+06    0
7.385e+06 – 8.615e+06    0
8.615e+06 – 9.846e+06    0
9.846e+06 – 1.108e+07    0
1.108e+07 – 1.231e+07    0
1.231e+07 – 1.354e+07    0
1.354e+07 – 1.477e+07    0
1.477e+07 – 1.6e+07      1

scoville_median numeric feature

Numeric column capturing the median Scoville heat rating across 175 entries with no nulls and 80 unique values. The distribution is extremely right-skewed (skew 9.79, kurtosis 111.5): the median is 22,500 while the mean is 339,805 and the max reaches 15,500,000, with 41 outliers (23.4%) and 5.7% zeros. The IQR (2,000 to 90,000) is tiny relative to the std of 1,278,965, confirming a heavy upper tail.

Treatment: Apply a log1p transform before modelling to compress the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["scoville_median"].stats

stat value
n 175
nulls 0 (0.0%)
unique 80
min 0
max 1.55e+07
mean 3.398e+05
median 22,500
std 1.279e+06
q1 2,000
q3 90,000
iqr 88,000
skew 9.794
kurtosis 111.5
n_outliers 41
outlier_rate 0.2343
zero_rate 0.05714
alert: high_skew · skew=+9.79
alert: outliers · 23.4% rows beyond 1.5 IQR
Fig 12.
Distribution of scoville_median. Vertical dash marks the median.
Histogram bins for scoville_median (median: 22500.0).
bin                      count
0 – 1.192e+06            161
1.192e+06 – 2.385e+06    10
2.385e+06 – 3.577e+06    3
3.577e+06 – 4.769e+06    0
4.769e+06 – 5.962e+06    0
5.962e+06 – 7.154e+06    0
7.154e+06 – 8.346e+06    0
8.346e+06 – 9.538e+06    0
9.538e+06 – 1.073e+07    0
1.073e+07 – 1.192e+07    0
1.192e+07 – 1.312e+07    0
1.312e+07 – 1.431e+07    0
1.431e+07 – 1.55e+07     1

jalRP numeric feature

Numeric feature 'jalRP' is extremely right-skewed: the median is 4.29 with Q3 at 17.14, yet the max reaches 2952.38 and the mean (64.72) sits far above the median. Skew of 9.79 and kurtosis of 111.48 confirm a heavy tail, and 23.4% of values flag as outliers with 5.7% exact zeros. Only 81 unique values across 175 rows suggests repeated discrete magnitudes rather than a smooth continuum.

Treatment: log1p-transform (or winsorize) before modelling to tame the heavy tail.

anthropic:claude-opus-4-7 · confidence high
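
The winsorizing alternative mentioned in the treatment can be sketched as below. This is a one-sided, nearest-rank version written for illustration; `winsorize_upper` is a hypothetical helper, and the sample values are made up apart from the reported maximum of 2952.38.

```python
import math

def winsorize_upper(values, upper_pct=95):
    """Cap values above the upper_pct percentile (nearest-rank method)."""
    ordered = sorted(values)
    # Nearest-rank position of the cap within the sorted values.
    rank = max(1, math.ceil(upper_pct * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values], cap

# Illustrative jalRP-like values (made up, apart from the reported max).
sample = [0, 0.4, 1.0, 2.0, 4.3, 8.0, 17.1, 30.0, 65.0, 2952.38]
capped, cap = winsorize_upper(sample, upper_pct=90)
print(cap, max(capped))
```

Unlike log1p, winsorizing keeps the original units, but it discards how extreme the capped rows were, so the choice depends on whether the size of the tail matters to the model.
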
Out[28]:

saturn.columns["jalRP"].stats

stat value
n 175
nulls 0 (0.0%)
unique 81
min 0
max 2952
mean 64.72
median 4.29
std 243.6
q1 0.38
q3 17.14
iqr 16.76
skew 9.795
kurtosis 111.5
n_outliers 41
outlier_rate 0.2343
zero_rate 0.05714
alert: high_skew · skew=+9.79
alert: outliers · 23.4% rows beyond 1.5 IQR
Fig 13.
Distribution of jalRP. Vertical dash marks the median.
Histogram bins for jalRP (median: 4.29).
bin              count
0 – 227.1        161
227.1 – 454.2    10
454.2 – 681.3    3
681.3 – 908.4    0
908.4 – 1136     0
1136 – 1363      0
1363 – 1590      0
1590 – 1817      0
1817 – 2044      0
2044 – 2271      0
2271 – 2498      0
2498 – 2725      0
2725 – 2952      1

type categorical label

This column records the Capsicum species (type), dominated by 'annuum' at 59.4% (104 of 175 rows), with 'chinense' second at 26.3% (46 rows). Watch for case-inconsistent duplicates ('Annuum' in 4 rows and 'Chinense' in 2, alongside their lowercase forms) and a literal 'N/A' string that is not counted as null (the reported null rate is 0.0%).

Treatment: Lowercase-normalize and convert 'N/A' to null before using as a categorical label.

anthropic:claude-opus-4-7 · confidence high
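
The cleanup described in the treatment can be sketched on the reported value counts. `normalize_type` is a hypothetical helper, and the set of strings treated as missing is an assumption that may need extending for other columns.

```python
from collections import Counter

# Counts from the type value table in this section.
type_counts = {"annuum": 104, "chinense": 46, "baccatum": 12, "Annuum": 4,
               "frutescens": 4, "pubescens": 2, "Chinense": 2, "N/A": 1}

# Assumed spellings of "missing", compared after lowercasing.
MISSING_TOKENS = {"", "n/a"}

def normalize_type(raw):
    """Lowercase and trim a species label; map missing tokens to None."""
    cleaned = raw.strip().lower()
    return None if cleaned in MISSING_TOKENS else cleaned

# Re-aggregate the counts under the normalized labels.
merged = Counter()
for label, count in type_counts.items():
    merged[normalize_type(label)] += count
print(merged.most_common())
```

After normalization the 8 raw values collapse to 5 species plus one missing entry, with annuum at 108 rows and chinense at 48.
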
Out[31]:

saturn.columns["type"].stats

stat value
n 175
nulls 0 (0.0%)
unique 8
top_value annuum
top_rate 0.5943
cardinality 8
entropy 1.657
entropy_ratio 0.5524
Fig 14.
Top values for type.
Top values for type (8 unique shown, of 8 total).
value       count  share
annuum      104    59.4%
chinense    46     26.3%
baccatum    12     6.9%
Annuum      4      2.3%
frutescens  4      2.3%
pubescens   2      1.1%
Chinense    2      1.1%
N/A         1      0.6%

origin categorical feature

This is a categorical origin field with 34 distinct values across 175 rows and no nulls. The distribution is moderately concentrated: United States leads at 26.3% (46 rows), followed by Mexico (26) and then South America and Peru (11 each), and an entropy ratio of 0.78 indicates a fairly broad spread across the long tail. Notable quirks include a mix of country-level (United States, Italy, India) and region-level (South America, Caribbean, Africa) labels, so a country such as Peru is counted separately from the region that contains it, plus 7 explicit 'Unknown' entries.

Treatment: Normalise country-vs-region granularity and treat 'Unknown' as missing before one-hot or target encoding.

anthropic:claude-opus-4-7 · confidence high
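
One way to approach the granularity problem is to roll countries up to the regions that already appear in the column. The sketch below is illustrative only: `REGION_OF` is a hypothetical, partial lookup covering a few values from the origin table, and a real mapping would need a decision for each of the 34 origins.

```python
# Hypothetical, partial country-to-region lookup for illustration.
REGION_OF = {
    "Peru": "South America",
    "Brazil": "South America",
    "Chile": "South America",
    "Trinidad": "Caribbean",
}

# Assumed spellings of "missing", compared after lowercasing.
MISSING_ORIGINS = {"", "unknown"}

def normalize_origin(raw):
    """Return None for missing origins, else the region-level label if known."""
    if raw is None or raw.strip().lower() in MISSING_ORIGINS:
        return None
    label = raw.strip()
    # Labels without a mapping are kept unchanged.
    return REGION_OF.get(label, label)

print([normalize_origin(x) for x in ["Peru", "South America", "Unknown", "Italy"]])
```

Rolling up loses country detail, so an alternative design is to keep the original column and add the region as a second feature.
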
Out[34]:

saturn.columns["origin"].stats

stat value
n 175
nulls 0 (0.0%)
unique 34
top_value United States
top_rate 0.2629
cardinality 34
entropy 3.98
entropy_ratio 0.7823
Fig 15.
Top values for origin.
Top values for origin (20 unique shown, of 34 total).
value             count  share
United States     46     26.3%
Mexico            26     14.9%
South America     11     6.3%
Peru              11     6.3%
Italy             8      4.6%
Unknown           7      4.0%
United Kingdom    7      4.0%
Trinidad          7      4.0%
Caribbean         6      3.4%
India             6      3.4%
Brazil            5      2.9%
Spain             4      2.3%
Hungary           4      2.3%
Japan             3      1.7%
Africa            3      1.7%
China             2      1.1%
Thailand          2      1.1%
Balkan Peninsula  1      0.6%
France            1      0.6%
Chile             1      0.6%

use categorical feature

This is a low-cardinality categorical describing each pepper's intended use, with 4 distinct values across 175 rows and no nulls. The distribution is heavily skewed: 'Culinary' accounts for 80.6% (141 rows) and 'Ornamental' for 17.7% (31 rows), leaving 2 rows with a combined 'Culinary, Ornamental' label and 1 empty string that should be treated as missing. An entropy ratio of 0.40 confirms the imbalance.

Treatment: Normalize the empty string to null and split the comma-delimited value into multi-hot flags before encoding.

anthropic:claude-opus-4-7 · confidence high
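
A minimal sketch of the treatment for this column, assuming the only two uses are the ones seen in the value table. `use_to_flags` is a hypothetical helper.

```python
# The two uses observed in the value table for this column.
USE_FLAGS = ("Culinary", "Ornamental")

def use_to_flags(raw):
    """Return None for missing input, else a dict of 0/1 flags per use."""
    if raw is None or not raw.strip():
        return None
    parts = {p.strip() for p in raw.split(",")}
    return {flag: int(flag in parts) for flag in USE_FLAGS}

for value in ["Culinary", "Ornamental", "Culinary, Ornamental", ""]:
    print(repr(value), use_to_flags(value))
```

Two binary flags represent the combined label without adding a third category, and the empty string becomes an explicit missing value rather than a category of its own.
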
Out[37]:

saturn.columns["use"].stats

stat value
n 175
nulls 0 (0.0%)
unique 4
top_value Culinary
top_rate 0.8057
cardinality 4
entropy 0.8097
entropy_ratio 0.4049
Fig 16.
Top values for use.
Top values for use (4 unique shown, of 4 total).
value                 count  share
Culinary              141    80.6%
Ornamental            31     17.7%
Culinary, Ornamental  2      1.1%
(empty string)        1      0.6%

flavor categorical feature

This is a categorical flavor descriptor whose values look like comma-separated tag combinations (e.g. 'Sweet, Fruity, Earthy, Smoky') rather than single labels. Cardinality is high, with 73 unique values across only 175 rows, and an entropy_ratio of 0.845 confirms a long tail; the top value 'Sweet' covers just 14.3% of rows. Tag order is not consistent either: 'Sweet, Fruity' (21 rows) and 'Fruity, Sweet' (6 rows) are counted as different values. The compound labels suggest multi-label flavor notes that have been collapsed into one string.

Treatment: Split on commas and multi-hot encode the individual flavor tags before modelling, so that tag order no longer matters.

anthropic:claude-opus-4-7 · confidence high
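
Splitting the flavor strings into sets also merges values that differ only in tag order. The sketch below uses the first four rows of the flavor value table; `flavor_tags` and `multi_hot` are hypothetical helpers, and the vocabulary here is built from those four rows only.

```python
from collections import Counter

def flavor_tags(raw):
    """Split a flavor string into an order-insensitive set of tags."""
    return frozenset(tag.strip() for tag in raw.split(",") if tag.strip())

# First four rows of the flavor value table in this section.
flavor_counts = {"Sweet": 25, "Sweet, Fruity": 21, "Neutral": 19, "Fruity, Sweet": 6}

# Merge values that differ only in tag order.
by_tagset = Counter()
for raw, count in flavor_counts.items():
    by_tagset[flavor_tags(raw)] += count

# Build the tag vocabulary, then encode one value as a multi-hot row.
vocabulary = sorted({tag for tags in by_tagset for tag in tags})

def multi_hot(raw):
    """Return a 0/1 list aligned with the vocabulary."""
    tags = flavor_tags(raw)
    return [int(tag in tags) for tag in vocabulary]

print(vocabulary)
print(multi_hot("Fruity, Sweet"))
```

In this subset, 'Sweet, Fruity' and 'Fruity, Sweet' combine into a single tag set covering 27 rows.
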
Out[40]:

saturn.columns["flavor"].stats

stat value
n 175
nulls 0 (0.0%)
unique 73
top_value Sweet
top_rate 0.1429
cardinality 73
entropy 5.232
entropy_ratio 0.8453
alert: long_tail · 49 singleton categories
Fig 17.
Top values for flavor.
Top values for flavor (20 unique shown, of 73 total).
value                            count  share
Sweet                            25     14.3%
Sweet, Fruity                    21     12.0%
Neutral                          19     10.9%
Fruity, Sweet                    6      3.4%
Bright, Sweet                    4      2.3%
Sweet, Tangy                     4      2.3%
Sweet, Fruity, Smoky             4      2.3%
Sweet, Fruity, Citrusy           4      2.3%
Sweet, Fruity, Earthy, Smoky     4      2.3%
Sweet, Fruity, Floral            3      1.7%
Sweet, Fruity, Citrusy, Floral   3      1.7%
Sweet, Fruity, Earthy            3      1.7%
Sweet, Tropical                  3      1.7%
Bright, Grassy                   3      1.7%
Sweet, Floral                    2      1.1%
Sweet, Smoky                     2      1.1%
Earthy                           2      1.1%
Smoky, Sweet, Earthy             2      1.1%
Smoky, Earthy                    2      1.1%
Sweet, Citrusy                   2      1.1%

url categorical identifier

This is a URL column serving as a per-row identifier, with all 175 values unique and zero nulls. The sampled entries are all pepperscale.com pepper pages (e.g., bell-pepper, gypsy-pepper, habanada-pepper), so the column is effectively a primary key for pepper varieties. At least one URL (jimmy-nardello-pepper) omits the www. prefix, so normalize the host before joining on this column. An entropy ratio of 1 confirms no repetition.

Treatment: Drop from modelling; retain as a row key or source link.

anthropic:claude-opus-4-7 · confidence high
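
If the URL is retained as a row key, the host variation visible in the value table (one entry without the www. prefix) is worth normalizing first. A minimal sketch with the standard library follows; `url_key` is a hypothetical helper.

```python
from urllib.parse import urlsplit

def url_key(url):
    """Reduce a URL to 'host/path' with any leading 'www.' removed."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop the trailing slash so both spellings of a page match.
    return f"{host}{parts.path.rstrip('/')}"

print(url_key("https://www.pepperscale.com/bell-pepper/"))
print(url_key("https://pepperscale.com/jimmy-nardello-pepper/"))
```

Both forms of the host reduce to the same prefix, so a join on the key is not affected by whether a URL was recorded with or without www.
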
Out[43]:

saturn.columns["url"].stats

stat value
n 175
nulls 0 (0.0%)
unique 175
top_value https://www.pepperscale.com/bell-pepper/
top_rate 0.005714
cardinality 175
entropy 7.451
entropy_ratio 1
alert: long_tail · 175 singleton categories
Fig 18.
Top values for url.
Top values for url (20 unique shown, of 175 total).
value                                                  count  share
https://www.pepperscale.com/bell-pepper/               1      0.6%
https://www.pepperscale.com/gypsy-pepper/              1      0.6%
https://www.pepperscale.com/purple-beauty-pepper/      1      0.6%
https://www.pepperscale.com/melrose-pepper/            1      0.6%
https://www.pepperscale.com/carmen-pepper/             1      0.6%
https://www.pepperscale.com/california-wonder-pepper/  1      0.6%
https://www.pepperscale.com/peperone-di-senise/        1      0.6%
https://www.pepperscale.com/fushimi-pepper/            1      0.6%
https://www.pepperscale.com/elephant-ears-pepper/      1      0.6%
https://www.pepperscale.com/habanada-pepper/           1      0.6%
https://www.pepperscale.com/tangerine-dream-pepper/    1      0.6%
https://www.pepperscale.com/chilly-chili/              1      0.6%
https://www.pepperscale.com/shishito-pepper/           1      0.6%
https://www.pepperscale.com/trinidad-perfume/          1      0.6%
https://www.pepperscale.com/banana-pepper/             1      0.6%
https://www.pepperscale.com/pepperoncini/              1      0.6%
https://www.pepperscale.com/pimento-pepper/            1      0.6%
https://pepperscale.com/jimmy-nardello-pepper/         1      0.6%
https://www.pepperscale.com/mariachi-pepper/           1      0.6%
https://www.pepperscale.com/santa-fe-grande-pepper/    1      0.6%

How to cite

BibTeX
@misc{saturn-quirky-peppers-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: quirky peppers},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/quirky-peppers}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: quirky peppers. Source: /home/coolhand/html/datavis/data_trove/data/quirky/peppers.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/quirky-peppers