iXBRL Parsing

iXBRL Parser API for UK Company Accounts

iXBRL is the filing format behind most UK company accounts. Parsing it yourself means handling multiple taxonomy versions, namespace resolution, and ambiguous context references. Registrum does all of that and returns clean JSON.

What is iXBRL?

iXBRL stands for Inline XBRL. It is a document format that embeds XBRL (eXtensible Business Reporting Language) tags directly inside a human-readable HTML document. The same file serves two purposes: it renders as a normal web page when opened in a browser, and it contains machine-readable financial data via XML tags hidden within the markup.

HMRC made iXBRL mandatory for UK corporation tax returns in 2011, and most annual accounts filed digitally with Companies House since then carry the same structured tagging. Nearly all private limited companies and LLPs file digitally, so iXBRL is the format their accountants produce and Companies House stores.

The XBRL taxonomy defines what each tag means. UK companies use one of several taxonomy versions — the UK GAAP taxonomy, FRS 102, FRS 105, or IFRS — and the tag names and structures vary across them. A turnover figure might be tagged as uk-gaap:Turnover, core:Turnover, or bus:TurnoverRevenue depending on the filing version and the software that produced it.
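One way to cope with this variation is an alias table that maps every known tag name to a single canonical field. The sketch below is illustrative only: the three tag names are the ones quoted above, while the canonical key "turnover" and the helper name canonical_field are assumptions, not Registrum's actual implementation. A production parser would resolve prefixes to namespace URIs first, since prefixes are chosen by the filing software.

```python
# Hypothetical alias table. The tag names come from the paragraph above;
# the canonical key "turnover" is an illustrative choice.
TURNOVER_ALIASES = ("uk-gaap:Turnover", "core:Turnover", "bus:TurnoverRevenue")

CANONICAL_BY_TAG = {tag: "turnover" for tag in TURNOVER_ALIASES}


def canonical_field(tag_name):
    """Return the canonical field for a prefixed tag name, or None if unknown."""
    return CANONICAL_BY_TAG.get(tag_name.strip())


if __name__ == "__main__":
    for tag in ("uk-gaap:Turnover", "bus:TurnoverRevenue", "core:SomethingElse"):
        print(tag, "->", canonical_field(tag))
```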

Why parsing it yourself is harder than it looks

The first challenge is downloading the right file. Companies House stores multiple documents per filing: the iXBRL accounts, sometimes a separate directors' report, and index metadata. You need to identify the correct document type from the filing history and then make several API calls to retrieve the actual file.
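The selection step can be sketched as two small functions that work on already-fetched JSON. The field names used here (category, date, links.document_metadata, and a resources map keyed by content type) are assumptions modelled on the public Companies House filing history and document metadata resources; check them against the current API documentation before relying on them. The URLs in the sample data are placeholders.

```python
def latest_accounts_filing(items):
    """Pick the newest filing-history item in the accounts category
    that links to a stored document. Returns None if there is none."""
    candidates = [
        item for item in items
        if item.get("category") == "accounts"
        and item.get("links", {}).get("document_metadata")
    ]
    # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
    return max(candidates, key=lambda item: item["date"], default=None)


def has_ixbrl(metadata):
    """True if the document metadata advertises an XHTML (iXBRL) rendition.
    Assumes a 'resources' map keyed by content type."""
    return "application/xhtml+xml" in metadata.get("resources", {})


# Made-up sample data for illustration only.
SAMPLE_ITEMS = [
    {"category": "confirmation-statement", "date": "2024-06-01",
     "links": {"document_metadata": "https://example.invalid/document/aaa"}},
    {"category": "accounts", "date": "2023-05-12",
     "links": {"document_metadata": "https://example.invalid/document/bbb"}},
    {"category": "accounts", "date": "2024-05-10",
     "links": {"document_metadata": "https://example.invalid/document/ccc"}},
    {"category": "accounts", "date": "2024-07-01", "links": {}},
]

if __name__ == "__main__":
    print(latest_accounts_filing(SAMPLE_ITEMS)["links"]["document_metadata"])
```

Filings that only have a PDF rendition would fail the has_ixbrl check and need to be skipped or handled separately.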

Once you have the file, the parsing challenges include:

  • Namespace resolution. iXBRL documents declare XML namespaces that map prefixes to taxonomy URIs. These vary between filing software vendors and taxonomy versions — the same concept has different tag names across filings.
  • contextRef matching. Each tagged value references a context element that defines the period (start date, end date, instant) and entity. You must match values to contexts to know whether a number is the current year or prior year.
  • Unit handling. Values are declared with a unitRef attribute pointing to a unit definition. Most are GBP but not all — and employee counts have no monetary unit. You need to resolve and validate units separately.
  • Scale attributes. Numbers in iXBRL can carry a scale attribute giving a power of ten (for example, 3 means thousands). A raw value of 68190000 with scale 3 is actually £68.19 billion. Ignoring the attribute understates that figure by a factor of 1,000.
  • Taxonomy version differences. FRS 105 micro-entity accounts use a different tag set from FRS 102 small company accounts, which differ again from full IFRS filings. A parser that works for one filing type frequently breaks on another.
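Two of the steps above, contextRef matching and scale handling, can be sketched with the Python standard library. This is a minimal illustration, not Registrum's parser: the sample document is hand-written, its period start dates and the core namespace URI are placeholders, and it assumes the Inline XBRL 1.1 namespace. Real filings also need format transformations, dimensional contexts, unit checks, and support for the older 2008 namespace.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

NS = {
    "ix": "http://www.xbrl.org/2013/inlineXBRL",   # Inline XBRL 1.1
    "xbrli": "http://www.xbrl.org/2003/instance",
}

# Hand-written, heavily trimmed sample. Dates and the core URI are placeholders.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:xbrli="http://www.xbrl.org/2003/instance"
      xmlns:core="http://example.invalid/core">
 <body>
  <ix:header><ix:resources>
   <xbrli:context id="cur"><xbrli:period>
     <xbrli:startDate>2023-02-26</xbrli:startDate>
     <xbrli:endDate>2024-02-24</xbrli:endDate>
   </xbrli:period></xbrli:context>
   <xbrli:context id="prev"><xbrli:period>
     <xbrli:startDate>2022-02-27</xbrli:startDate>
     <xbrli:endDate>2023-02-25</xbrli:endDate>
   </xbrli:period></xbrli:context>
  </ix:resources></ix:header>
  <p><ix:nonFraction name="core:Turnover" contextRef="cur"
      unitRef="GBP" scale="3">68,190,000</ix:nonFraction></p>
  <p><ix:nonFraction name="core:Turnover" contextRef="prev"
      unitRef="GBP" scale="3">65,762,000</ix:nonFraction></p>
 </body>
</html>"""


def period_ends(root):
    """Map each context id to its period end (endDate, else instant)."""
    ends = {}
    for ctx in root.iterfind(".//xbrli:context", NS):
        end = ctx.findtext("xbrli:period/xbrli:endDate", namespaces=NS)
        if end is None:
            end = ctx.findtext("xbrli:period/xbrli:instant", namespaces=NS)
        ends[ctx.get("id")] = end
    return ends


def scaled_value(el):
    """Displayed number times 10**scale, negated when sign="-"."""
    raw = Decimal("".join(el.itertext()).replace(",", "").strip())
    value = raw * (Decimal(10) ** int(el.get("scale", "0")))
    return -value if el.get("sign") == "-" else value


def facts_by_period(xhtml):
    """Return {(tag name, period end): scaled value} for numeric facts."""
    root = ET.fromstring(xhtml)
    ends = period_ends(root)
    return {
        (el.get("name"), ends[el.get("contextRef")]): scaled_value(el)
        for el in root.iterfind(".//ix:nonFraction", NS)
    }


if __name__ == "__main__":
    for key, value in facts_by_period(SAMPLE).items():
        print(key, int(value))
```

Keying facts by period end date rather than by context id is what lets a caller tell the current year from the prior-year comparative, since context ids are arbitrary strings chosen by the filing software.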

What Registrum's parser extracts

The Registrum iXBRL parser handles namespace resolution, context matching, unit parsing, and scale normalisation across all major UK taxonomy versions. The output is a consistent JSON structure regardless of which taxonomy version or filing software produced the source document.

Every response includes both the current period and prior year comparatives where available. The data_quality block documents exactly what was found, what was missing, and whether the parser encountered any ambiguity — so you always know the confidence level of the data you are working with.

Fields extracted include: profit_and_loss.turnover, profit_and_loss.gross_profit, profit_and_loss.operating_profit, profit_and_loss.profit_after_tax, balance_sheet.net_assets, balance_sheet.current_assets, balance_sheet.current_liabilities, and other.employees.

Example API call and response

curl -H "X-API-Key: reg_live_..." \
  "https://api.registrum.co.uk/v1/company/00445790/financials"

# Response (abbreviated):
{
  "status": "ok",
  "data": {
    "company_number": "00445790",
    "company_name": "TESCO PLC",
    "period_end": "2024-02-24",
    "currency": "GBP",
    "profit_and_loss": {
      "turnover":         { "value": 68190000000, "prior_year": 65762000000 },
      "profit_after_tax": { "value": 1000000000,  "prior_year": 852000000   }
    },
    "balance_sheet": {
      "net_assets": { "value": 8730000000, "prior_year": 8102000000 }
    },
    "other": {
      "employees": { "value": 295622, "prior_year": 300000 }
    },
    "data_quality": {
      "fields_found": 12,
      "fields_missing": 2,
      "confidence": "high",
      "taxonomy": "uk-gaap-2009",
      "missing_fields": ["gross_profit", "operating_profit"]
    }
  },
  "meta": {
    "cached": true,
    "cache_ttl_seconds": 604800
  }
}

All monetary values are returned in full currency units with the scale already applied, so 68190000000 means £68.19 billion; nothing is expressed in thousands or in pence. Financial data is cached for 7 days (cache_ttl_seconds: 604800). The data_quality.missing_fields array lists concepts that were absent from the filing itself, not parser failures.
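A client might consume this response along the following lines. This is a sketch under stated assumptions: the endpoint, header name, and field names are taken from the example above, while the helper names fetch_financials, year_on_year, and is_missing are hypothetical. Error handling and retries are omitted.

```python
import json
import urllib.request

BASE_URL = "https://api.registrum.co.uk/v1/company/{number}/financials"


def fetch_financials(company_number, api_key):
    """GET the financials endpoint shown above and return the 'data' block."""
    req = urllib.request.Request(
        BASE_URL.format(number=company_number),
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]


def year_on_year(field):
    """Percentage change from prior_year to value, or None if unavailable."""
    value, prior = field.get("value"), field.get("prior_year")
    if value is None or not prior:
        return None
    return round((value - prior) / prior * 100, 2)


def is_missing(data, name):
    """True if data_quality reports the concept as absent from the filing."""
    return name in data.get("data_quality", {}).get("missing_fields", [])


# Subset of the sample response above, so the helpers run without a network call.
SAMPLE_DATA = {
    "profit_and_loss": {
        "turnover": {"value": 68190000000, "prior_year": 65762000000},
    },
    "data_quality": {"missing_fields": ["gross_profit", "operating_profit"]},
}

if __name__ == "__main__":
    print(year_on_year(SAMPLE_DATA["profit_and_loss"]["turnover"]))
    print(is_missing(SAMPLE_DATA, "gross_profit"))
```

Checking missing_fields before reading a value avoids treating an absent concept as zero.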

Get your free API key

50 free calls per month. No credit card required. No XBRL expertise needed.