Data Classes
Differences between original, normalized and ledgerized data
Bridge's WealthTech API offers three classes of data:
- Original data is information as the source represents it, without any restructuring and generally with very little parsing of fields. The source provider determines the structure of the data. The only transformations applied are conversion to JSON and lightweight parsing of integers, numbers and boolean values.
- Normalized data applies a common structure and classification to the original data, making it simple to process data across multiple custodians.
- Ledgerized data takes only normalized transactions as input and reproduces the full suite of portfolio data (lots, realized and unrealized gain/loss, positions, balances). It also classifies the transactions (as buy/sell, income/expense, or transfers). Our ledgerization system is called Luca.
Original Data
GA in Q1 2023
This is the raw data received from each source provider. It is represented as JSON but is otherwise a pure pass-through with little processing. Notably, we opt for minimal parsing of fields, preferring to keep original data represented as string-based Unicode, apart from non-printable characters.
This data is high-dimensional and full-fidelity. The breadth of the data isn't truncated: if a source provides 200 fields, we'll pass all 200 fields through the API. Additionally, some sources use certain fields in high-dimensional ways. For example, a transaction code might have 50+ enumerable values.
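The pass-through behaviour described above can be illustrated with a small Python sketch. It assumes that "non-printable" means ASCII and C1 control characters, and it applies the lightweight parsing only to unambiguous literals, so a value such as an account number with leading zeros stays a string. The rules and function names are hypothetical, not part of the WealthTech API.

```python
import json
import re

# Assumption: "non-printable" = ASCII and C1 control characters.
_CONTROL = re.compile(r"[\x00-\x1f\x7f-\x9f]")
# Only unambiguous literals are parsed; "00123" (leading zeros) stays a string.
_INT = re.compile(r"[+-]?(0|[1-9]\d*)")
_DECIMAL = re.compile(r"[+-]?(0|[1-9]\d*)\.\d+")


def parse_value(raw: str):
    """Lightweight parsing of one source field (hypothetical rules)."""
    text = _CONTROL.sub("", raw)
    stripped = text.strip()
    if stripped.lower() in ("true", "false"):
        return stripped.lower() == "true"
    if _INT.fullmatch(stripped):
        return int(stripped)
    if _DECIMAL.fullmatch(stripped):
        return float(stripped)
    return text  # everything else passes through as a Unicode string


def to_original_json(record: dict[str, str]) -> str:
    """Pass every source field through: no fields are dropped or renamed."""
    return json.dumps({key: parse_value(value) for key, value in record.items()})


record = {"qty": "100", "price": "12.50", "is_cash": "true",
          "acct": "00123", "desc": "BUY\x07 ABC"}
print(to_original_json(record))
# {"qty": 100, "price": 12.5, "is_cash": true, "acct": "00123", "desc": "BUY ABC"}
```

Keeping identifiers such as `acct` as strings is the design choice that preserves full fidelity: parsing them as integers would silently lose the leading zeros.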
Normalized Data
At this stage, the WealthTech API has applied an opinionated data model in order to simplify the data and represent it uniformly across several sources. Additionally, high-dimensional fields such as transaction code, which can have many enumerable values, are collapsed to a few values, while the critical source values are still passed through to the normalized data output. Finally, text data at this stage is parsed into integers and floating-point values.
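A minimal sketch of that collapse, assuming a hypothetical source whose transaction codes map onto a handful of hypothetical categories. The low-dimensional `category` sits alongside the untouched `source_code`, and text quantities and amounts are parsed into floats. The codes, field names (`tran_code`, `qty`, `amt`) and categories are illustrative only; they are not Bridge's normalized schema.

```python
from dataclasses import dataclass

# Hypothetical source codes; a real custodian may use 50+ values.
SOURCE_CODE_MAP = {
    "BY": "buy",
    "BYC": "buy",
    "SL": "sell",
    "DV": "income",
    "INT": "income",
    "FEE": "expense",
    "JNL": "transfer",
}


@dataclass(frozen=True)
class NormalizedTransaction:
    category: str     # collapsed, low-dimensional value
    source_code: str  # original high-dimensional code, passed through unchanged
    quantity: float
    amount: float


def normalize(raw: dict[str, str]) -> NormalizedTransaction:
    """Map one original-data transaction (hypothetical field names) to a common shape."""
    code = raw["tran_code"]
    return NormalizedTransaction(
        category=SOURCE_CODE_MAP.get(code, "other"),  # unknown codes are kept, not dropped
        source_code=code,
        quantity=float(raw["qty"]),  # text parsed to floating point
        amount=float(raw["amt"]),
    )
```

For example, `normalize({"tran_code": "BYC", "qty": "10", "amt": "-1500.25"})` yields a `buy` transaction that still carries `source_code="BYC"`, so consumers can fall back to the source value when the collapsed category is too coarse.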
The WealthTech API calculates performance and aggregates balances from this data. However, it cannot derive lots, positions or realized gain/loss data. These must be provided by the source, and they are subject to delayed delivery. Some sources deliver lots (which carry cost basis information) in a separate, delayed data transmission from the other data types.
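Balance aggregation across custodians is straightforward once positions share one normalized shape. The sketch below sums market values per account and overall; the `Position` fields and custodian names are assumptions, and it deliberately does not attempt lots or realized gain/loss, which must come from the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    custodian: str
    account_id: str
    symbol: str
    market_value: float


def aggregate_balances(
    positions: list[Position],
) -> tuple[dict[tuple[str, str], float], float]:
    """Sum market value per (custodian, account) and across all accounts."""
    by_account: dict[tuple[str, str], float] = defaultdict(float)
    for p in positions:
        by_account[(p.custodian, p.account_id)] += p.market_value
    return dict(by_account), sum(by_account.values())


positions = [
    Position("CustodianA", "A-1", "ABC", 100.0),
    Position("CustodianA", "A-1", "XYZ", 250.5),
    Position("CustodianB", "B-7", "ABC", 49.5),
]
per_account, total = aggregate_balances(positions)
```

Here `per_account` holds 350.5 for `("CustodianA", "A-1")` and 49.5 for `("CustodianB", "B-7")`, and `total` is 400.0.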
The major limitation of this data is that it doesn't convey the full history of lots, including closed lots: on any given day, only open lots and outstanding positions are available. Most applications can tolerate this missing closed-lot history, and we generally recommend that applications consume normalized data first.
Normalized data has the following limitations:
- It's typically not available for manual accounts and positions; availability depends on the provider of the manually tracked data.
- Closed lots generally aren't available (only point-in-time open lots are), so the full history of lots isn't normalized.
- Corporate actions may or may not be explained. For example, a source may report that "ABC" changed its symbol to "XYZ" without a corresponding securities transfer or corporate action line item.
- Realized gain/loss data must be provided by the source, since it cannot be derived. Most data sources provide realized gain/loss data through their API and start-of-day (SOD) batch drops.
Ledgerized Data
Ledgerized data is built purely from transactions in a ledger-based portfolio accounting system. If a transaction can't be added to the ledger during ledgerization, that transaction is quarantined and the account is frozen at that point in time until an internal reconciliation process updates the data. Reconciliation can take anywhere from the same day to a full business week, depending on the nature of the problem.
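The quarantine-and-freeze behaviour can be modelled as a ledger that stops posting after the first transaction it cannot accept. This is a minimal sketch under stated assumptions: the rejection rule (a sale larger than the current holding) is hypothetical, and the internal reconciliation process itself is not modelled.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    trade_date: date
    symbol: str
    quantity: float  # positive = buy, negative = sell


@dataclass
class AccountLedger:
    holdings: dict[str, float] = field(default_factory=dict)
    frozen_as_of: Optional[date] = None
    quarantine: list[Transaction] = field(default_factory=list)

    def post(self, txn: Transaction) -> bool:
        """Add txn to the ledger; return False if it was quarantined instead."""
        if self.frozen_as_of is not None:
            # Frozen account: later transactions queue behind the quarantined one.
            self.quarantine.append(txn)
            return False
        new_quantity = self.holdings.get(txn.symbol, 0.0) + txn.quantity
        if new_quantity < 0:
            # Hypothetical rule: selling more than is held cannot be ledgerized.
            self.quarantine.append(txn)
            self.frozen_as_of = txn.trade_date
            return False
        self.holdings[txn.symbol] = new_quantity
        return True
```

Once `frozen_as_of` is set, every subsequent transaction is held back, which mirrors why the whole account, not just the offending transaction, stays stale until reconciliation completes.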
Ledgerized data has the following limitations:
- Most sources do not provide their lot selection methodology, so our system assumes FIFO lot selection. An account with lot-selected closing fills, or any method other than FIFO, will require extensive, hard-to-resolve reconciliation and will be slow to become available on the API.
- Sources typically don't provide a transaction ID, which makes it difficult to link related transactions together. Linking is required to handle cancelled transactions. We use an ad-hoc matching algorithm for cancellations, but some cancellations still end up in reconciliation.
- Bridge's internal security master specification is limited, which restricts how known corporate action events can be handled.
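To make the FIFO assumption concrete, here is a minimal sketch of FIFO lot relief that closes the oldest open lots first and returns the realized gain/loss. It uses `Decimal` so that fully consumed lots reach exactly zero. The `Lot` type and `sell_fifo` function are hypothetical illustrations, not Luca's implementation.

```python
from collections import deque
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Lot:
    quantity: Decimal
    cost_per_share: Decimal


def sell_fifo(open_lots: deque[Lot], quantity: Decimal, price: Decimal) -> Decimal:
    """Close open lots oldest-first and return the realized gain/loss."""
    if quantity > sum((lot.quantity for lot in open_lots), Decimal("0")):
        # Not enough open shares: the kind of break that needs reconciliation.
        raise ValueError("sell quantity exceeds open lots")
    realized = Decimal("0")
    remaining = quantity
    while remaining > 0:
        lot = open_lots[0]  # oldest open lot
        taken = min(lot.quantity, remaining)
        realized += taken * (price - lot.cost_per_share)
        lot.quantity -= taken
        remaining -= taken
        if lot.quantity == 0:
            open_lots.popleft()  # lot is fully closed
    return realized


lots = deque([Lot(Decimal("10"), Decimal("100")), Lot(Decimal("10"), Decimal("120"))])
gain = sell_fifo(lots, Decimal("15"), Decimal("130"))
```

With 10 shares bought at 100 and then 10 at 120, selling 15 at 130 realizes 350 under FIFO. If the broker had instead closed the 120 lot first by specific identification, it would report 10 × 10 + 5 × 30 = 250; that disagreement is what pushes non-FIFO accounts into reconciliation.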
Ledgerized vs. Normalized Data
We generally recommend using normalized data over ledgerized data. Our internal billing and reporting systems are migrating to normalized source data, and we recommend the same for most applications.
The main use cases for ledgerized data are (1) a full understanding of lot history, including point-in-time closed lots, and (2) manual accounts and positions, or data sources that don't provide pre-computed realized gain/loss (since realized gain/loss can't be calculated without ledgerization).
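The recommendation reduces to a simple decision rule. The function and parameter names below are hypothetical and not part of the API; the sketch only restates the criteria for choosing ledgerized over normalized data.

```python
def recommended_data_class(
    needs_closed_lot_history: bool,
    has_manual_accounts: bool,
    source_provides_realized_gain_loss: bool,
) -> str:
    """Default to normalized data; fall back to ledgerized only when required."""
    if needs_closed_lot_history or has_manual_accounts:
        return "ledgerized"
    if not source_provides_realized_gain_loss:
        # Realized gain/loss can't be derived from normalized data.
        return "ledgerized"
    return "normalized"
```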