Runs, Pipeline, and Health
Queueing, fetch/render monitoring, live run visibility, replay/rematerialization, and system health live in one operational loop.
One route map for collection, scrape operations, normalization, pipeline replay, health, versions, and vocabulary ownership. Legacy extract/trainer/errors/admin deep links now open the in-console Normalization workspace.
One deterministic control surface for crawl operations, semantic mapping, preview, versioning, replay, and audit-safe change management.
Queueing, fetch/render monitoring, live run visibility, replay/rematerialization, and system health live in one operational loop.
Pattern detection, candidate review, schema rules, preview, validate, publish, rollback, and provenance stay deterministic and inspectable.
Immutable versions, signed audit verification, trace-aware workflows, and remediation controls keep the lane accountable.
Use this rail when onboarding or repairing a site: sample deterministic snapshots first, inspect pattern reasons, verify candidate evidence, map to Tag Studio-owned vocabulary, preview, review normalized profile output, publish, replay, and monitor.
Mapping decisions should link back to Tag Studio instead of becoming hidden scraper-only vocabulary.
Canonical runtime/operator surface for scrape launches, live monitoring, per-site health, diagnostics, and queue/replay handoff. Snapshot browsing now hands off to Snapshot Center.
No queue URLs yet. Add one or import them below.
Paste/import is only a convenience bridge into the structured queue URL rows above.
Structured queue URLs are the primary owner path here. Paste/import remains available below as a secondary bridge.
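The paste/import bridge pattern recurs across these cards; the idea is that pasted text only feeds the structured rows, which remain the owner path. A minimal sketch, assuming rows are plain strings and the bridge drops blanks and duplicates without reordering existing rows:

```python
from typing import List

def import_bridge(raw_text: str, structured_rows: List[str]) -> List[str]:
    """Fold pasted text into the structured rows, which stay the owner path.
    Blank lines and duplicates are dropped; existing rows keep their order."""
    rows = list(structured_rows)
    for line in raw_text.splitlines():
        candidate = line.strip()
        if candidate and candidate not in rows:
            rows.append(candidate)
    return rows
```

The bridge never becomes the source of truth: its only output is an updated structured-row list.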
No dead-letter payload fields yet. Add one or import JSON below.
Paste/import remains a convenience bridge into the structured dead-letter payload rows above.
Structured dead-letter payload rows are the primary owner path here. Paste/import remains available below as a secondary bridge.
Lease and batch-process controls stay site-scoped so worker follow-through remains RBAC-safe.
Use the toolbar Trace ID to correlate run, queue, replay, audit, and health activity across the control center.
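Trace-ID correlation can be pictured as grouping events from every subsystem under one identifier. A hypothetical sketch (the event shape and subsystem names are assumptions, not a real API):

```python
from typing import Iterable

def correlate_by_trace(events: Iterable[dict], trace_id: str) -> dict:
    """Group matching events by subsystem so one Trace ID ties run, queue,
    replay, audit, and health activity together."""
    grouped = {}
    for event in events:
        if event.get("trace_id") == trace_id:
            grouped.setdefault(event.get("subsystem", "unknown"), []).append(event)
    return grouped

# Illustrative event stream; field names are assumed for this sketch.
events = [
    {"trace_id": "t-1", "subsystem": "run", "msg": "started"},
    {"trace_id": "t-1", "subsystem": "queue", "msg": "enqueued"},
    {"trace_id": "t-2", "subsystem": "audit", "msg": "signed"},
]
by_trace = correlate_by_trace(events, "t-1")
```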
Pattern discovery and candidate evidence now live under Normalization; replay, audit, and remediation controls live under Pipeline.

No requested sites yet. Add one or import them below.
Paste/import is only a convenience bridge into the structured site rows above.
Structured site rows are the primary owner path here. Paste/import remains available below as a secondary bridge.
No payload rows yet. Add one or import JSON below.
Paste/import is only a convenience bridge into the structured payload rows above.
Structured payload rows are the primary owner path here. Paste/import remains available below as a secondary bridge.
Inline payloads only. No server-side file paths or uploads. Preview shows deterministic site grouping and skipped payload counts; queue mode hands resolved URLs into the deterministic crawl queue; ingest mode writes snapshots and listings directly.
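The deterministic site grouping and skipped-payload counting described above can be sketched as follows. The row shape (`{"site": ..., "url": ...}`) is an illustrative assumption; sorting sites and URLs keeps the preview stable across reruns:

```python
def preview_payloads(rows):
    """Deterministically group inline payload rows by site; count skipped rows
    (rows missing a site or a URL)."""
    grouped, skipped = {}, 0
    for row in rows:
        site, url = row.get("site"), row.get("url")
        if not site or not url:
            skipped += 1
            continue
        grouped.setdefault(site, []).append(url)
    # Sorted output makes the preview deterministic regardless of input order.
    return {s: sorted(urls) for s, urls in sorted(grouped.items())}, skipped
```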
Waiting for scraper log output...
| Site | Runs | Avg Results | Error Rate | Last Run | Status | Detail |
|---|---|---|---|---|---|---|
| Load runtime monitoring to populate site health. | ||||||
| Started | Duration | Sites | Mode | Results | Status | Detail |
|---|---|---|---|---|---|---|
| Load runtime monitoring to populate recent runs. | ||||||
If no site telemetry is available yet, this panel explains whether the gap comes from no completed run, runtime restart, or missing site attribution in stored history.
Load runs to populate recent activity, status mix, and trace-aware summary details. Raw JSON remains available underneath for diagnostics.
Site records, rollout settings, rate-limit defaults, and secret references for scraper workers.
Load a site from the summary table or enter a Site ID to edit an existing record.
Upsert credential refs here, then confirm them through canonical readback.
Load refs for a site to manage row-level secret refs here.
Load sites or secret refs to inspect field-driven inventory summaries here. Raw JSON remains available underneath for diagnostics.
Load sites or policy payloads to inspect Compliance / Crawl Policy controls here. Challenge detection, access resilience, and network routing remain human-approved policy workflows.
Deterministic pattern discovery, sample-page browsing, and candidate-evidence drill-in for site-specific extraction workflows.
No discovery URLs yet. Add one or import them below.
Paste/import is only a convenience bridge into the structured discovery URL rows above.
Structured discovery URLs are the primary owner path here. Paste/import remains available below as a secondary bridge.
No candidate page IDs yet. Add one or import them below.
Paste/import is only a convenience bridge into the structured page-ID rows above. Candidate loads use the first valid structured page ID.
Structured page IDs are the primary owner path here. Candidate loads use the first valid row.
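"First valid structured page ID" can be made concrete with a small sketch. Here "valid" is assumed to mean non-empty after trimming; real validation may be stricter:

```python
def first_valid_page_id(rows):
    """Candidate loads use the first valid structured page-ID row."""
    for row in rows:
        page_id = row.strip()
        if page_id:
            return page_id
    return None  # no valid row: nothing to load
```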
Pattern discovery output should expose suggested pattern drafts plus the reasons an operator needs to trust or reject them: structured-data hints, repeated-region signatures, content-density signals, route/page-class clues, and forbidden selector candidates.
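One hypothetical shape for a pattern draft plus its trust reasons, with a simple review heuristic (the two-reason threshold is an illustrative assumption, not a product rule):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PatternDraft:
    """A suggested pattern and the evidence an operator uses to judge it."""
    selector: str
    reasons: List[str] = field(default_factory=list)  # e.g. "repeated-region signature"
    forbidden: bool = False  # matched a forbidden-selector candidate

def operator_can_trust(draft, min_reasons=2):
    """Never trust a forbidden selector; otherwise require at least
    min_reasons independent signals before accepting the draft."""
    return not draft.forbidden and len(draft.reasons) >= min_reasons
```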
Load patterns or candidates to inspect field-driven extract summaries here. Raw JSON remains available underneath for diagnostics.
Inspect extracted fields with page-level evidence before you bind rules or publish a deterministic mapping version. This uses the first valid structured page-ID row from the Sample page browser card.
Before binding, confirm each candidate has inspectable evidence and a clear owner path.
Bulk/manual rule authoring, transforms, and deterministic binding thresholds in one mapping workbench.
Add structured bindings here. Leave the list empty only when you intentionally want a threshold-only publish.
Structured bindings are the primary owner path here. Paste/import remains available below as a secondary bridge.
Paste/import is only a convenience bridge into the structured binding list above. Publishing uses the structured rows, not the raw textarea.
Add transform rows here. Publishing uses the structured list first and keeps comma import as a secondary bridge.
Structured transform rows are the primary owner path here. Paste/import remains available below as a secondary bridge.
Paste/import is only a convenience bridge into the structured transform rows above. Publishing uses the row list, not the raw input.
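Assembling a publish request from the structured rows (never the raw textarea) might look like this sketch. Field names are illustrative assumptions; the key invariant is that an empty binding list is only legal as an intentional threshold-only publish:

```python
def build_publish_payload(bindings, transforms, threshold=None):
    """Build a publish request from structured binding and transform rows."""
    if not bindings and threshold is None:
        raise ValueError("empty bindings require an explicit threshold-only publish")
    return {
        "bindings": list(bindings),
        "transforms": list(transforms),
        "threshold": threshold,
    }
```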
Publish bulk or manual rules to inspect field-driven mapping summary details here. Raw JSON remains available underneath for diagnostics.
Preview mapped output, validate deterministic rules, and keep live fetch explicitly opt-in after snapshot-first review.
No sample page IDs yet. Add one or import them below.
Paste/import is only a convenience bridge into the structured page-ID rows above.
No sample URLs yet. Add one or import them below.
Paste/import is only a convenience bridge into the structured URL rows above.
Structured preview sources are the primary owner path here. Paste/import remains available below as a secondary bridge.
Queueing now lives under Runs so preview remains a semantic verification surface instead of a second crawl launcher.
Publish only after snapshot-first preview has a usable fill rate, validation warnings are explained or resolved, dead letters are triaged, and replay freshness shows the new mapping can rematerialize recent evidence.
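The publish checklist above can be expressed as a gate. A minimal sketch; the 0.8 minimum fill rate is an illustrative assumption, not a product default:

```python
def ready_to_publish(fill_rate, unresolved_warnings, open_dead_letters,
                     replay_is_fresh, min_fill_rate=0.8):
    """Mirror the checklist: usable fill rate, warnings explained or resolved,
    dead letters triaged, and replay freshness confirmed."""
    return (fill_rate >= min_fill_rate
            and unresolved_warnings == 0
            and open_dead_letters == 0
            and replay_is_fresh)
```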
Run preview or validation to inspect field-driven preview counts, trace/run identifiers, and validation totals here. Raw JSON remains available underneath for diagnostics.
Run preview to load field-level evidence when the adapter payload provides it. If no evidence ledger appears, the adapter has not supplied field provenance yet.
Run preview or validation to inspect field-level confidence dimensions here. Raw JSON remains available underneath for diagnostics.
Work the canonical normalized profile lane here: inventory recent profiles, materialize from a listing, inspect effective values, publish structured overrides, and rollback by version without mutating raw scrape facts.
This inventory stays on the scraper-mapping owner path. Listings remains a projection surface and does not write normalized profile state directly.
Leave field keys empty to materialize all currently supported fields.
Materialization seeds or refreshes the normalized profile from the listing's canonical data without changing the raw listing payload. Paste/import is only a convenience bridge into the structured field-key list above.
Rollback is append-only. The selected target version becomes the next active override version instead of rewriting history.
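Append-only rollback means the target version's overrides are copied into a new active version while history stays untouched. A sketch under an assumed version-record shape:

```python
def rollback_to(versions, target_version):
    """Append-only rollback: the target's overrides become a new active
    version; earlier versions are never rewritten."""
    target = next(v for v in versions if v["version"] == target_version)
    new_version = {
        "version": versions[-1]["version"] + 1,
        "overrides": dict(target["overrides"]),  # copy, never alias history
        "rollback_of": target_version,
    }
    return versions + [new_version]  # the original list is not mutated
```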
Load or materialize a profile to inspect effective values, override state, source listing context, and audit-safe metadata.
Load a profile to edit override rows.
Structured scalar rows are the primary owner path here. Load a profile to begin.
Structured scalar rows are the primary owner path here. Complex nested values remain available through the advanced JSON editor below and are preserved unless you replace the same key in the structured editor.
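The preservation rule for nested values can be shown with a shallow merge: structured scalar rows win only on the keys they set, while JSON-authored nested values survive untouched. A minimal sketch:

```python
def apply_scalar_rows(existing_overrides, scalar_rows):
    """Merge structured scalar rows over the existing override document.
    Nested values from the advanced JSON editor are preserved unless the
    structured editor replaces the same key."""
    merged = dict(existing_overrides)  # shallow copy; source dict untouched
    merged.update(scalar_rows)
    return merged
```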
Overrides are the user-authored layer. They never rewrite raw snapshots, source rows, or raw scrape facts.
Load profiles to see recent normalized records for the selected site/pattern scope.
Load a profile to inspect append-only override versions, active state, authored-by metadata, and rollback targets.
Load a profile to compare effective values with their current source layer and override ownership metadata.
Immutable publish history, compare summaries, and rollback controls. Replay has its own dedicated tab.
Load or compare versions to inspect field-driven publish history and replay-safe diff context here. Raw JSON remains available underneath for diagnostics.
Replay, queue handoff, replay jobs, and downstream remediation stay correlated through the toolbar Trace ID.
No replay listing IDs yet. Add one or import them below.
Paste/import is only a convenience bridge into the structured replay listing-ID rows above.
No replay page IDs yet. Add one or import them below.
Paste/import is only a convenience bridge into the structured replay page-ID rows above.
Structured replay scope IDs are the primary owner path here. Paste/import remains available below as a secondary bridge.
Run launches, live log, snapshots, diagnostics, and site-health telemetry stay under Runs. Use Pipeline for replay/rematerialization, queue handoff, and downstream job visibility. Pre-scan policy is dynamic and follows Scan (Exclude - Always Filters) from the current runtime/Tag Studio filter contract.
Samples -> Patterns -> Candidates -> Map -> Preview -> Normalize -> Publish -> Replay -> Monitor. Use this order when onboarding or repairing a site, then confirm replay jobs and health telemetry before closing.
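The rail order above can be encoded directly, which also makes "what comes next" mechanical during onboarding or repair:

```python
ONBOARDING_RAIL = [
    "samples", "patterns", "candidates", "map", "preview",
    "normalize", "publish", "replay", "monitor",
]

def next_stage(current):
    """Return the stage after `current`, or None once monitoring closes
    the loop."""
    index = ONBOARDING_RAIL.index(current)
    return ONBOARDING_RAIL[index + 1] if index + 1 < len(ONBOARDING_RAIL) else None
```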
Pipeline operators often need quick access to audit, review, and dead-letter queues while replaying recent work.
Signed audit history remains canonical evidence, but it now sits inside Pipeline so replay/rematerialization and remediation stay together.
Load the review queue to resolve items from live row actions instead of copying IDs by hand.
Load dead letters to resolve live rows from this card instead of copying IDs by hand.
Resolve review and dead-letter items here, then confirm their canonical readback state before treating them as final.
Load replay, audit, review, or dead-letter payloads to inspect field-driven pipeline summaries here. Raw JSON remains available underneath for diagnostics.
Run replay or load replay jobs to inspect field-driven job summaries here. Raw JSON remains available underneath for diagnostics.
Run replay or load replay jobs to inspect trace artifacts here. Raw JSON remains available underneath for diagnostics.
Site health, provenance lookup, validation/dedup truth, trace correlation, and listings-level runtime visibility.
Waiting for runtime supervisor state…
Maintenance: none
Waiting for watcher data…
Monitoring filters use the toolbar Trace ID. Provenance lookup now returns provenance, validation, dedup, and latest materialization audit context for one listing.
Load site health, listings, provenance, or trace data to inspect field-driven monitoring summaries here. Raw JSON remains available underneath for diagnostics.
Load provenance or trace correlation to inspect data lineage here. Raw JSON remains available underneath for diagnostics.
Load health or SLO metrics to inspect reliability target and error-budget burn here. Raw JSON remains available underneath for diagnostics.
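Error-budget burn is conventionally the observed error rate divided by the budget the SLO allows; a burn rate above 1.0 spends budget faster than the window permits. A minimal sketch (field semantics assumed, since the payload shape is adapter-provided):

```python
def error_budget_burn(slo_target, observed_error_rate):
    """Burn rate = observed error rate / allowed error budget (1 - SLO)."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("SLO target must be strictly below 1.0")
    return observed_error_rate / budget
```

For example, a 99% availability target with a 2% observed error rate burns budget at twice the sustainable rate.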
Load health or trace data to inspect payload-provided drift and anomaly signals here. Raw JSON remains available underneath for diagnostics.