api auth

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/data/api_auth.db

Saturn profiled 502 rows across 11 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
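# Install the profiler (IPython shell escape; run inside a notebook).
!pip install saturn-dissect
import subprocess

# Profile the database file, write the findings to api_auth.json,
# and request the opt-in language-model interpretation.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/data/api_auth.db",
    "--findings", "api_auth.json",
    "--llm", "anthropic:claude-opus-4-7",
])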

Summary confidence: high

This dataset contains 502 API request logs across 11 columns, capturing usage telemetry such as response time, status code, endpoint, and user agent. Traffic is dominated by a single API ('linguistic-api' at 99.6%) and a single method (GET), with all requests coming from one IP (127.0.0.1), so the interesting variation lives in endpoint, response_time_ms, status_code, and user_agent. Response times are heavily skewed: the median is just 3 ms but the mean is 163 ms with a max of 1238 ms and 78 outliers (~24% of non-null values), plus a 34% null rate worth investigating. Status codes split between 200 (329 rows) and 429 (173 rows), hinting at rate-limiting behavior. The endpoint column has a long tail of 209 distinct paths, with /api/languages and /api/search leading.

citing: response_time_ms · api_name · endpoint · status_code · user_agent · method · ip_address · timestamp

Out[4]:

saturn.schema() · 11 columns

column            kind         n    null%  unique  alerts
usage_id          numeric      502  0.0%   502
key_id            unknown      502  0.0%           skipped
api_name          categorical  502  0.0%   2       imbalance
endpoint          categorical  502  0.0%   209     long_tail
method            categorical  502  0.0%   1       imbalance
status_code       numeric      502  0.0%   2
response_time_ms  numeric      502  34.5%  83      null_rate, outliers
cache_hit         numeric      502  0.0%   1       constant
ip_address        categorical  502  0.0%   1       imbalance
user_agent        categorical  502  0.0%   12
timestamp         categorical  502  0.0%   299     long_tail
Fig 1.
response_time_ms · Look for the heavy right tail and outliers — most responses are tiny but a few exceed 1000ms.
Histogram bins for response_time_ms (median: 3.0).
bin              count
2 – 70.67        253
70.67 – 139.3    20
139.3 – 208      5
208 – 276.7      0
276.7 – 345.3    0
345.3 – 414      0
414 – 482.7      0
482.7 – 551.3    0
551.3 – 620      0
620 – 688.7      1
688.7 – 757.3    0
757.3 – 826      0
826 – 894.7      0
894.7 – 963.3    36
963.3 – 1032     9
1032 – 1101      2
1101 – 1169      1
1169 – 1238      2
Fig 2.
status_code · Compare the share of 200 vs 429 responses to gauge how often the API is rate-limiting clients.
Histogram bins for status_code (median: 200.0).
bin              count
200 – 210.4      329
210.4 – 220.8    0
220.8 – 231.2    0
231.2 – 241.6    0
241.6 – 252      0
252 – 262.5      0
262.5 – 272.9    0
272.9 – 283.3    0
283.3 – 293.7    0
293.7 – 304.1    0
304.1 – 314.5    0
314.5 – 324.9    0
324.9 – 335.3    0
335.3 – 345.7    0
345.7 – 356.1    0
356.1 – 366.5    0
366.5 – 377      0
377 – 387.4      0
387.4 – 397.8    0
397.8 – 408.2    0
408.2 – 418.6    0
418.6 – 429      173
Fig 3.
endpoint · See which endpoints dominate traffic; /api/languages and /api/search lead a long tail of 209 paths.
Top values for endpoint (20 unique shown, of 209 total).
value                                  count  share
/api/languages                         58     11.6%
/api/search                            36     7.2%
/api/stats                             24     4.8%
/api/languages/macroareas              18     3.6%
/api/languages/by_macroarea/Africa     16     3.2%
/api/languages/eng                     13     2.6%
/api/languages/spa/phonemes            12     2.4%
/api/families/indo1319                 12     2.4%
/api/typology/wals/parameters          12     2.4%
/api/languages/NOPE                    11     2.2%
/api/languages/eng/features            11     2.2%
/api/languages/cmn                     10     2.0%
/api/languages/eng/phonemes            10     2.0%
/api/languages/NOPE/phonemes           10     2.0%
/api/languages/NOPE/features           10     2.0%
/api/families/NOPE                     10     2.0%
/api/typology/wals/map/81A             8      1.6%
/api/typology/phonology/summary        7      1.4%
/api/typology/phonology/inventory/eng  7      1.4%
/api/typology/types/by_family          6      1.2%
Fig 4.
user_agent · Check the client mix — Werkzeug and a GoogleOther mobile crawler together account for most traffic.
Top values for user_agent (12 unique shown, of 12 total).
value  count  share
Werkzeug/3.1.4  215  42.8%
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.7680.177 Mobile Safari/537.36 (compatible; GoogleOther)  196  39.0%
Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/)  26  5.2%
curl/7.81.0  24  4.8%
Mozilla/5.0 (iPhone; CPU iPhone OS 26_5_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/147.0.7727.99 Mobile/15E148 Safari/604.1  14  2.8%
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)  14  2.8%
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.7680.177 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)  7  1.4%
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/136.0.0.0 Safari/537.36  2  0.4%
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:146.0) Gecko/20100101 Firefox/146.0  1  0.2%
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)  1  0.2%
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.7727.116 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)  1  0.2%
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.7727.137 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)  1  0.2%
Fig 5.
api_name · Confirm the extreme imbalance: nearly all requests hit linguistic-api, with only 2 going to blissAPI.
Top values for api_name (2 unique shown, of 2 total).
value           count  share
linguistic-api  500    99.6%
blissAPI        2      0.4%
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column            kind         null %
usage_id          numeric      0.0%
key_id            unknown      0.0%
api_name          categorical  0.0%
endpoint          categorical  0.0%
method            categorical  0.0%
status_code       numeric      0.0%
response_time_ms  numeric      34.5%
cache_hit         numeric      0.0%
ip_address        categorical  0.0%
user_agent        categorical  0.0%
timestamp         categorical  0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
                  usage_id  status_code  response_time_ms  cache_hit
usage_id          +1.00     -0.50        -0.26             nan
status_code       -0.50     +1.00        +0.01             nan
response_time_ms  -0.26     +0.01        +1.00             nan
cache_hit         nan       nan          nan               nan
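
The nan entries in the cache_hit row and column are what Pearson correlation gives for a column with zero variance: the denominator of the coefficient is zero, so the value is undefined. The sketch below is a plain-Python illustration of that behaviour, not Saturn's implementation; the function name and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson r for two equal-length sequences.

    Returns NaN when either sequence has zero variance, because the
    coefficient's denominator is then zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only (not rows from api_auth.db).
usage_id = [1, 2, 3, 4, 5]
cache_hit = [0, 0, 0, 0, 0]          # constant, like the profiled column
r_constant = pearson(usage_id, cache_hit)
r_linear = pearson(usage_id, [2, 4, 6, 8, 10])
```

Dropping constant columns before computing the matrix avoids the nan row and column altogether.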

usage_id numeric identifier

A sequential surrogate key: 502 unique values across 502 rows with no nulls, ranging from 1 to 502, with an identical mean and median of 251.5. Skew is 0.0 and there are no outliers, consistent with a sequential row identifier rather than a measured quantity.

Treatment: Drop from modelling; retain as a join key.

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["usage_id"].stats

stat value
n 502
nulls 0 (0.0%)
unique 502
min 1
max 502
mean 251.5
median 251.5
std 145.1
q1 126.2
q3 376.8
iqr 250.5
skew 0
kurtosis -1.2
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 8.
Distribution of usage_id. Vertical dash marks the median.
Histogram bins for usage_id (median: 251.5).
bin              count
1 – 23.77        23
23.77 – 46.55    23
46.55 – 69.32    23
69.32 – 92.09    23
92.09 – 114.9    22
114.9 – 137.6    23
137.6 – 160.4    23
160.4 – 183.2    23
183.2 – 206      22
206 – 228.7      23
228.7 – 251.5    23
251.5 – 274.3    23
274.3 – 297      23
297 – 319.8      22
319.8 – 342.6    23
342.6 – 365.4    23
365.4 – 388.1    23
388.1 – 410.9    22
410.9 – 433.7    23
433.7 – 456.5    23
456.5 – 479.2    23
479.2 – 502      23

key_id unknown identifier

The column 'key_id' was skipped by the profiler, so no type, uniqueness, or distributional statistics are available beyond a row count of 502 and a null rate of 0.0. The name suggests an identifier, but without n_unique or sample values this cannot be confirmed from the evidence.

Treatment: Re-profile with type inference enabled before deciding; if confirmed unique, use as a join key and exclude from modelling.

anthropic:claude-opus-4-7 · confidence low
Out[16]:

saturn.columns["key_id"].stats

stat value
n 502
nulls 0 (0.0%)
unique
alert: skipped no profiler for kind=unknown

api_name categorical metadata

This is a categorical API identifier with only 2 distinct values across 502 rows. It is overwhelmingly dominated by 'linguistic-api' (500 rows, 99.6%), with 'blissAPI' appearing just twice, yielding near-zero entropy (0.0375). The column is effectively a constant with two anomalous records.

Treatment: Drop as a near-constant feature, or isolate the 2 'blissAPI' rows for inspection.

anthropic:claude-opus-4-7 · confidence high
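
The entropy figure can be reproduced from the two counts alone. The sketch below assumes Shannon entropy in bits (log base 2) and an entropy ratio equal to entropy divided by log2 of the cardinality; that reading is inferred from the reported numbers (it also matches the endpoint, user_agent, and timestamp stats), not taken from Saturn's source. The function names are hypothetical.

```python
import math


def entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical column given value counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of distinct values)."""
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0  # a constant column has no spread to normalise
    return entropy_bits(counts) / math.log2(k)


api_name_counts = [500, 2]            # linguistic-api, blissAPI
h = entropy_bits(api_name_counts)
ratio = entropy_ratio(api_name_counts)
```

With two categories the maximum entropy is 1 bit, which is why entropy and entropy_ratio coincide for this column.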
Out[18]:

saturn.columns["api_name"].stats

stat value
n 502
nulls 0 (0.0%)
unique 2
top_value linguistic-api
top_rate 0.996
cardinality 2
entropy 0.0375
entropy_ratio 0.0375
alert: imbalance top value is 99.6% of rows
Fig 9.
Top values for api_name.

endpoint categorical feature

This column records API endpoint paths, with 209 unique routes across 502 requests and no nulls. Traffic is spread fairly evenly (entropy ratio 0.828): /api/languages leads with 58 hits (11.6%), followed by /api/search (36, 7.2%) and /api/stats (24, 4.8%), and a long tail of 180 routes hit only once triggers the long_tail alert. Notably, four of the top 20 routes contain the placeholder NOPE (41 hits combined, led by /api/languages/NOPE with 11), suggesting either a probing client or a broken reference worth investigating.

Treatment: Group rare endpoints into an 'other' bucket before encoding, and inspect the /NOPE hits separately.

anthropic:claude-opus-4-7 · confidence high
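
One way to apply the treatment is sketched below: flag paths that contain a NOPE segment first, then collapse rarely-hit routes into a single bucket. The threshold, the bucket label, the helper names, and the sample list are all illustrative assumptions, not values prescribed by the report.

```python
from collections import Counter


def bucket_rare(values, min_count=2, other_label="other"):
    """Replace values seen fewer than `min_count` times with `other_label`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def is_placeholder_path(path, marker="NOPE"):
    """True when one of the path segments is exactly the placeholder marker."""
    return marker in path.split("/")


# Illustrative sample only (not the real 502 rows).
sample = (
    ["/api/languages"] * 3
    + ["/api/search"] * 2
    + ["/api/languages/NOPE/phonemes", "/api/families/NOPE"]
)

# Flag before bucketing, otherwise rare NOPE paths disappear into "other".
flagged = [p for p in sample if is_placeholder_path(p)]
bucketed = bucket_rare(sample)
```

With min_count=2 the 180 singleton routes reported by the long_tail alert would all land in the "other" bucket.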
Out[21]:

saturn.columns["endpoint"].stats

stat value
n 502
nulls 0 (0.0%)
unique 209
top_value /api/languages
top_rate 0.1155
cardinality 209
entropy 6.383
entropy_ratio 0.8282
alert: long_tail 180 singleton categories
Fig 10.
Top values for endpoint.

method categorical metadata

This column records the HTTP method, but every one of the 502 rows is "GET" — cardinality is 1 and entropy is 0. It carries no information for any downstream model or segmentation.

Treatment: Drop; constant column with a single value.

anthropic:claude-opus-4-7 · confidence high
Out[24]:

saturn.columns["method"].stats

stat value
n 502
nulls 0 (0.0%)
unique 1
top_value GET
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 11.
Top values for method.
Top values for method (1 unique shown, of 1 total).
value  count  share
GET    502    100.0%

status_code numeric feature

This column holds HTTP status codes, taking only 2 distinct values across 502 rows: 200 (success) and 429 (rate-limited), with 200 as the median and 429 as Q3. The histogram shows 329 responses with status 200 and 173 with 429, so 34.5% of requests were throttled, consistent with the mean of 278.9; that is a notable failure rate worth investigating. There are no nulls or outliers, and the two-point shape is reflected in the negative kurtosis (-1.57).

Treatment: Recode as a binary success/throttled flag before modelling.

anthropic:claude-opus-4-7 · confidence high
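
Because the column holds only two values, the throttled share follows directly from the mean: mean = 200 + p × (429 − 200). The sketch below works that arithmetic through and shows one possible binary recode; the function names are hypothetical and the three-element list is illustrative.

```python
def share_of_upper(mean, lower, upper):
    """Share of rows equal to `upper` in a column holding only two values."""
    return (mean - lower) / (upper - lower)


N_ROWS = 502
p_throttled = share_of_upper(mean=278.9, lower=200, upper=429)
n_throttled = round(p_throttled * N_ROWS)


def to_throttled_flag(status_codes):
    """Recode status codes as 1 (429, throttled) or 0 (anything else)."""
    return [1 if code == 429 else 0 for code in status_codes]


flags = to_throttled_flag([200, 429, 200])   # illustrative input
```

The recovered count agrees with the last histogram bin (418.6 – 429), which holds the 429 responses.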
Out[27]:

saturn.columns["status_code"].stats

stat value
n 502
nulls 0 (0.0%)
unique 2
min 200
max 429
mean 278.9
median 200
std 108.9
q1 200
q3 429
iqr 229
skew 0.6539
kurtosis -1.572
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 12.
Distribution of status_code. Vertical dash marks the median.

response_time_ms numeric feature

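This column captures response times in milliseconds; 329 of the 502 records have a value. The distribution is severely right-skewed (skew 1.90, kurtosis 1.74): the median is just 3 ms and Q3 is 21 ms, yet the mean is 162.84 ms and the max reaches 1238 ms, with std at 345.34. The histogram shows a secondary cluster of 45 values between roughly 895 and 1032 ms, which pulls the mean far above the median. Two analyst-relevant flags: 34.46% of rows (173) are null, and 78 values, 23.71% of the non-null ones, fall outside the 1.5 × IQR fence.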

Treatment: Log-transform and impute the 34% nulls before modelling.

anthropic:claude-opus-4-7 · confidence high
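
The outlier flag can be checked by hand from the quartiles. The sketch below assumes the usual Tukey fences at 1.5 × IQR beyond Q1 and Q3, which is how the alert text ("beyond 1.5 IQR") reads; whether Saturn computes them exactly this way is an assumption, and the function name is hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Lower and upper Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = iqr_fences(q1=3, q3=21)

# The reported outlier_rate is relative to non-null values, not all rows.
n_rows, n_null, n_outliers = 502, 173, 78
n_non_null = n_rows - n_null
outlier_rate = n_outliers / n_non_null
```

Under this reading, any response slower than 48 ms counts as an outlier, which is why almost every value outside the first histogram bin is flagged.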
Out[30]:

saturn.columns["response_time_ms"].stats

stat value
n 502
nulls 173 (34.5%)
unique 83
min 2
max 1,238
mean 162.8
median 3
std 345.3
q1 3
q3 21
iqr 18
skew 1.903
kurtosis 1.744
n_outliers 78
outlier_rate 0.2371
zero_rate 0
alert: null_rate 34.5% null
alert: outliers 23.7% rows beyond 1.5 IQR
Fig 13.
Distribution of response_time_ms. Vertical dash marks the median.

cache_hit numeric feature

This is a numeric flag named cache_hit, presumably a 0/1 indicator of whether a cache lookup succeeded. Across all 502 rows it is constant at 0 (zero_rate 1.0, n_unique 1, std 0.0), meaning no cache hit was ever recorded. That is either a broken instrumentation path or a workload where caching is disabled.

Treatment: Drop; provides no signal while constant.

anthropic:claude-opus-4-7 · confidence high
Out[33]:

saturn.columns["cache_hit"].stats

stat value
n 502
nulls 0 (0.0%)
unique 1
min 0
max 0
mean 0
median 0
std 0
q1 0
q3 0
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 1
alert: constant only one distinct value
Fig 14.
Distribution of cache_hit. Vertical dash marks the median.
Histogram bins for cache_hit (median: 0.0).
bin                    count
-0.5 – -0.4545         0
-0.4545 – -0.4091      0
-0.4091 – -0.3636      0
-0.3636 – -0.3182      0
-0.3182 – -0.2727      0
-0.2727 – -0.2273      0
-0.2273 – -0.1818      0
-0.1818 – -0.1364      0
-0.1364 – -0.09091     0
-0.09091 – -0.04545    0
-0.04545 – 0           0
0 – 0.04545            502
0.04545 – 0.09091      0
0.09091 – 0.1364       0
0.1364 – 0.1818        0
0.1818 – 0.2273        0
0.2273 – 0.2727        0
0.2727 – 0.3182        0
0.3182 – 0.3636        0
0.3636 – 0.4091        0
0.4091 – 0.4545        0
0.4545 – 0.5           0

ip_address categorical metadata

This column records an IP address but holds the loopback value 127.0.0.1 for all 502 rows, with zero nulls and cardinality of 1. Entropy is 0.0, so the field carries no information and looks like a placeholder or a logging artefact rather than a real client IP.

Treatment: Drop; constant column with no signal.

anthropic:claude-opus-4-7 · confidence high
Out[36]:

saturn.columns["ip_address"].stats

stat value
n 502
nulls 0 (0.0%)
unique 1
top_value 127.0.0.1
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 15.
Top values for ip_address.
Top values for ip_address (1 unique shown, of 1 total).
value      count  share
127.0.0.1  502    100.0%

user_agent categorical metadata

HTTP User-Agent strings from request logs, with only 12 distinct values across 502 rows and no nulls. Traffic is dominated by non-browser clients: Werkzeug/3.1.4 (most likely Flask's test client, which sends a Werkzeug/<version> User-Agent by default) at 215 hits and GoogleOther at 196, together covering ~82% of requests, with curl and assorted bots (Bytespider, Applebot, Googlebot, bingbot, GPTBot) making up most of the rest. Genuine human browser traffic appears negligible: a single desktop Firefox hit and 14 Chrome-on-iOS (CriOS) hits.

Treatment: Parse into client-family/bot-flag features rather than using the raw string.

anthropic:claude-opus-4-7 · confidence high
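
A minimal sketch of that parsing step is shown below, using only the client names that appear in this column. The pattern lists, the function name, and the three category labels (crawler, tool, browser) are illustrative assumptions; a production pipeline would more likely rely on a maintained user-agent parser.

```python
import re

# Crawler tokens seen in this dataset's user_agent values.
CRAWLER_PATTERN = re.compile(
    r"(GoogleOther|Googlebot|bingbot|Bytespider|Applebot|GPTBot)"
)
# Non-browser tools that identify themselves at the start of the string.
TOOL_PATTERN = re.compile(r"^(Werkzeug|curl)/")


def classify_user_agent(user_agent):
    """Return (family, kind) where kind is 'crawler', 'tool', or 'browser'."""
    crawler = CRAWLER_PATTERN.search(user_agent)
    if crawler:
        return crawler.group(1), "crawler"
    tool = TOOL_PATTERN.match(user_agent)
    if tool:
        return tool.group(1), "tool"
    return "browser", "browser"


examples = [
    "Werkzeug/3.1.4",
    "curl/7.81.0",
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "GPTBot/1.3; +https://openai.com/gptbot)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:146.0) "
    "Gecko/20100101 Firefox/146.0",
]
classified = [classify_user_agent(ua) for ua in examples]
```

Checking crawler tokens before anything else matters here, because the crawler strings also carry ordinary browser tokens such as Chrome and Safari.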
Out[39]:

saturn.columns["user_agent"].stats

stat value
n 502
nulls 0 (0.0%)
unique 12
top_value Werkzeug/3.1.4
top_rate 0.4283
cardinality 12
entropy 1.962
entropy_ratio 0.5473
Fig 16.
Top values for user_agent.

timestamp categorical timestamp

This is a timestamp column stored as strings, with 299 unique values across 502 rows and no nulls. The distribution is unusually clumpy for a timestamp: six values on 2026-04-17 each repeat 14 to 38 times, accounting for 189 rows (37.6%), while 281 timestamps appear only once. That burst pattern suggests batched events or a logging artifact rather than free-flowing event time.

Treatment: Parse to datetime and derive features (hour, day, gap-to-previous); investigate the 2026-04-17 burst before using as an index.

anthropic:claude-opus-4-7 · confidence high
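
The parsing and feature derivation could look like the sketch below. It assumes every value follows the "YYYY-MM-DD HH:MM:SS" layout seen in the top values and that timestamps carry no timezone; both are assumptions, as is the function name. The sample list reuses four values from the table for illustration.

```python
from datetime import datetime


def timestamp_features(raw_values, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse string timestamps, sort them, and derive simple features.

    gap_seconds is the time since the previous event in sorted order
    (None for the first event).
    """
    parsed = sorted(datetime.strptime(value, fmt) for value in raw_values)
    features = []
    previous = None
    for ts in parsed:
        gap = (ts - previous).total_seconds() if previous is not None else None
        features.append(
            {
                "timestamp": ts,
                "hour": ts.hour,
                "weekday": ts.weekday(),  # Monday is 0
                "gap_seconds": gap,
            }
        )
        previous = ts
    return features


sample = [
    "2026-04-17 10:07:09",
    "2026-04-17 10:06:57",
    "2026-04-17 10:06:57",
    "2026-04-23 00:15:04",
]
rows = timestamp_features(sample)
```

A gap of zero seconds marks events that share a timestamp, which is a quick way to surface the 2026-04-17 burst.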
Out[42]:

saturn.columns["timestamp"].stats

stat value
n 502
nulls 0 (0.0%)
unique 299
top_value 2026-04-17 10:06:57
top_rate 0.0757
cardinality 299
entropy 6.98
entropy_ratio 0.8488
alert: long_tail 281 singleton categories
Fig 17.
Top values for timestamp.
Top values for timestamp (20 unique shown, of 299 total).
value                count  share
2026-04-17 10:06:57  38     7.6%
2026-04-17 10:07:57  38     7.6%
2026-04-17 10:32:51  37     7.4%
2026-04-17 10:07:09  31     6.2%
2026-04-17 10:26:55  31     6.2%
2026-04-17 10:06:29  14     2.8%
2026-04-23 00:15:04  4      0.8%
2026-04-24 01:08:10  4      0.8%
2026-04-24 22:11:19  4      0.8%
2026-01-06 05:06:58  3      0.6%
2026-04-19 04:06:23  3      0.6%
2026-01-06 05:07:53  2      0.4%
2026-04-17 10:26:53  2      0.4%
2026-04-17 10:27:44  2      0.4%
2026-04-19 04:33:00  2      0.4%
2026-04-19 07:17:09  2      0.4%
2026-04-29 21:16:59  2      0.4%
2026-04-30 03:37:45  2      0.4%
2026-01-06 05:39:57  1      0.2%
2026-01-06 09:44:58  1      0.2%

How to cite

BibTeX
@misc{saturn-api-auth-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: api auth},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/api_auth}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: api auth. Source: /home/coolhand/data/api_auth.db. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/api_auth