finit.dev · La Cartita

La Cartita Categorized News Dataset

A licensed corpus for ML training, research, and journalism analytics.

Thousands of articles from lacartita.com, exported with stable identifiers, publication dates, full body text, slugs, and a clean category column (overlapping English and Spanish category labels already reconciled). Refreshed periodically. One payment of $10,000 unlocks the entire corpus, plus every future refresh, for one legal entity: no recurring fees, no row caps.

License tiers

Enterprise (Commercial)

$10,000 one-time, no recurring fees

One payment. The whole dataset. Forever.

  • Full corpus — every article, every column, no row cap, no per-record fee
  • All future refreshes included — pull the latest CSV whenever you want
  • Perpetual, non-exclusive, worldwide rights for one legal entity
  • Train, fine-tune, evaluate, or benchmark models
  • Ship products that incorporate derived insights or weights
  • Attribution required (see ATTRIBUTION.md)

Not included: redistribution rights, exclusivity, multi-entity coverage, or custom extractions — email lezama@finit.dev for those.

Individuals & Non-profits (Free)

$0 by request

  • Personal, academic, and journalistic use
  • Registered non-profits doing non-commercial work
  • Same attribution requirements as the Enterprise tier

Email lezama@finit.dev with:

  • a brief description of your intended use
  • who will hold the data
  • an estimated duration

Approval is normally granted within five business days.

Attribution

All licensees credit La Cartita / finit.dev. Software repositories carry a README block; written publications carry the citation. Templates live in ATTRIBUTION.md.

What's in the file

A single UTF-8 CSV with seven columns: id, date, title, content, excerpt, slug, categories. Re-running the exporter (dataset_export.py) refreshes the file in place, overwriting it under the same filename.
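
As a rough illustration of reading a file with this layout, here is a minimal Python sketch that uses only the standard library. The seven column names come from the list above; everything else is an assumption made for the example: the helper names read_articles and open_export are hypothetical, the sample row is invented rather than taken from the dataset, and the date format and the way multiple categories are encoded are not specified here, so both are left as plain strings.

```python
import csv
import io

# Column names as listed above; order is not assumed, only presence.
EXPECTED_COLUMNS = ["id", "date", "title", "content", "excerpt", "slug", "categories"]


def read_articles(stream):
    """Yield one dict per article row after checking that the header is complete."""
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    yield from reader


def open_export(path):
    """Open the export as UTF-8 text (hypothetical helper; the path is up to you)."""
    # newline="" lets the csv module handle line breaks inside quoted body text.
    return open(path, encoding="utf-8", newline="")


# Invented sample row, used only to exercise the reader.
SAMPLE = (
    "id,date,title,content,excerpt,slug,categories\r\n"
    '1,2024-01-01,Example title,"First line\nsecond line",Short excerpt,example-title,news\r\n'
)

rows = list(read_articles(io.StringIO(SAMPLE, newline="")))
print(rows[0]["slug"])
```

For the real file, something like `with open_export("your-export.csv") as f: rows = list(read_articles(f))` would follow the same pattern, with the filename replaced by whatever the exporter writes. Checking the header up front is a cheap way to notice if a refresh ever changes the column set.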

Custom terms

Need multi-entity coverage, redistribution rights, exclusivity, or a different price/scope? Email lezama@finit.dev.