The modern economy runs on data, but no one knows what data is worth. In 2014, global GDP was $78 trillion and spending on information and communications technology (ICT) was $4 trillion; some portion of the remaining $74 trillion is attributable to data. But how much?
According to a recent Capgemini (2014) report, global organizational spending on big data exceeded $31 billion in 2013 and is expected to reach $114 billion by 2018. Based on the Capgemini survey, the three key challenges for big data implementation are: (1) scattered data lying in silos across various teams (46% of respondents); (2) an absence of clear business cases for funding and implementation (39%); and (3) ineffective coordination of big data and analytics teams across the organization (35%). None of these challenges can be adequately addressed without an objective measure of corporate data as an asset.
Corporate holdings of data and other intangible assets (e.g., patents, trademarks, and copyrights) have been estimated to be worth more than $8 trillion, roughly equivalent to the combined GDP of Germany, France, and Italy.1 However, generally accepted accounting principles (GAAP) prohibit companies from treating data as an asset or counting funds spent on collecting and analyzing data as an investment rather than an expense. This important convention reduces the ability to amortize the costs associated with big data, despite the fact that data exhibit many features of a capital good.2 For example, the value of a large customer database is obvious to anyone who has shopped online, but the information content for any given consumer loses value over time if unrefreshed, because the consumer may drop out of the database or change his or her tastes and purchasing patterns. In other words, data depreciate in value unless they are maintained; in this respect, data share many of the same properties as perishable physical assets.
Without a well-accepted measure of the economic return to data and its applications, deciding how much to invest in digital initiatives, ICT capabilities, and other data assets is a guessing game. Companies need clearly defined, objective metrics for gauging the financial value of their data. We propose to construct such metrics. The framework will be an econometric analysis of the financial value created by a company's data assets, i.e., a data capital index (DCI). This DCI will capture three distinct aspects: (1) data assets: how much data does the company have?; (2) data usage: how and how much does the company use its data?; and (3) data monetization: how does the company generate commercial value, i.e., earnings, from its use of the data?
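As a minimal illustration, these three components can be thought of as a per-firm record; the sketch below is a placeholder structure of our own devising, not part of the proposed index itself:

```python
from dataclasses import dataclass

@dataclass
class DataCapitalInputs:
    """Hypothetical per-firm inputs to a DCI; names and units are placeholders."""
    data_assets: float        # e.g., petabytes stored or key/value pairs held
    data_usage: float         # e.g., queries executed over the analysis period
    data_monetization: float  # e.g., estimated earnings attributable to data use
```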
Many attempts have been made to value information, but all fall short. Market prices for commercial data such as customer profiles cover only a small portion of the data in use. Consulting firms such as Gartner and McKinsey have applied traditional asset-valuation and IT-productivity measures, but these frameworks rest on a number of unsatisfactory assumptions and do not distinguish among different types of "digital capital" such as hardware, software, patents, and data. In the 1990s, IT consultants such as Paul Strassmann proposed using IT spending as the basis for calculating "information productivity," but this is also a poor proxy because of the circularity of interpreting spending as productivity.
Resource-based theory, the idea that durable competitive advantage emerges from unique combinations of resources that are economically valuable and difficult to replicate, is the leading academic framework for understanding the sources of firms' competitive advantages. It convincingly addresses the complementarity of information technology (IT) and organizational processes, practices, routines, and activities. The dynamic capabilities framework extends the resource-based view to incorporate environmental and technological change, underscoring the importance of tangible and intangible "specific asset positions" in shaping firm resources.
However, few studies disaggregate IT investments by type, though several taxonomies for distinguishing different functional aspects of IT have been proposed. For example, building on earlier work by Orlikowski and Iacono (2001), Melville, Kraemer, and Gurbaxani (2004) consider five views of IT that characterize research studies in the management of information systems (MIS) literature:
1. Tool view: IT is an engineered tool that does what its designers intended, e.g., productivity enhancement and reshaping social relations;
2. Proxy view: IT is conceptualized by its essential characteristics, which are defined by individual perceptions of its usefulness or value, the diffusion of a particular type of system within a specific context, and its investment or capital stock denominated in financial units;
3. Ensemble view: IT is the interaction of people and technology in both the development and use of technology (this view is typically used in case studies examining IT business value within a specific organization);
4. Computational view: IT is algorithm and systems development, testing, and data modeling and simulation;
5. Nominal view: IT is invoked in name only, as an abstract factor of production, not in any specific form.
Aral and Weill (2007) provide a more practical disaggregation of IT assets into four distinct categories, each implemented to achieve particular management objectives: (1) IT infrastructure (the foundation of shared IT services); (2) transactional investments (automation to reduce costs or increase volume); (3) information investments (information provision for managing, accounting, reporting, and communicating internally and externally); and (4) strategic investments (repositioning the firm for new products or entry into new markets, services, or business processes).
Given the heterogeneous nature of IT assets, it is no surprise that valuing their components is a challenge. In fact, Melville et al. (2004) conclude that "IT is valuable, but the extent and dimensions are dependent upon internal and external factors, including complementary organizational resources of the firm and its trading partners, as well as the competitive and macro environment" [emphasis added by A. Lo]. This suggests that, unlike gold, oil, or plant and equipment, data assets cannot be assigned a single value that is independent of the company or its context.
The most direct approach to date for valuing IT assets has been to rely on financial accounting measures. Masli, Richardson, Sanchez, and Smith (2011) propose return on assets (ROA) and return on sales to gauge the value of IT. These measures suffer from several limitations: (1) they include confounding economic and competitive factors that cannot easily be controlled for; (2) they are affected by economy-wide and competitive constraints; and (3) they are backward-looking. In fact, Hitt and Brynjolfsson (1996) fail to find any relation between IT capital and accounting-based measures of profitability, and Brynjolfsson, Hitt, and Kim (2011) are unable to draw any inferences about whether the profit relationship is causal.
More recently, stock market performance has also been considered, including variants of Tobin's q ratio, which is meant to capture the business performance of intangible assets, technology assets, and brand equity. In a recent study, Saunders and Brynjolfsson (2015) replicate the result of an earlier study by Brynjolfsson, Hitt, and Yang (2002) that $1 of computer hardware is correlated with more than $10 of market value (as measured by stock market capitalization). They account for the "missing $9" by broadening the definition of IT to include all capitalized software, and then all purchased and internally developed software, other internal IT services, IT consulting, and IT-related training.
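For reference, Tobin's q is conventionally defined as the ratio of a firm's market value to the replacement cost of its assets,

$$ q \;=\; \frac{\text{market value of the firm}}{\text{replacement cost of the firm's assets}} , $$

so that $q > 1$ is often interpreted as evidence of value, such as intangibles, not reflected in the measured asset base.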
In addition to quantifying the magnitude of IT intangibles, they also examine how these intangibles are distributed within their sample, using the measure of organizational IT capabilities developed in Aral and Weill (2007). For a balanced panel of 127 firms over the period 2003–2006, which provides broader and more recent IT spending estimates, they find that the "invisible" IT not accounted for on the balance sheet is priced into the market value of firms, suggesting that IT assets as defined by accounting standards capture only a fraction of the business value of IT. Specifically, they conclude that firms with the highest organizational IT capabilities (based on separate measures of human-resource practices, management practices, internal and external IT use, and Internet capabilities) command a 45% to 76% premium in market value compared to those with the lowest organizational capabilities.
None of the papers in the current IT or MIS literature has focused on data as an asset or attempted to construct an index to measure its value.
The proposed framework is an econometric model from which company-specific estimates of the value of data can be derived. The starting point is the principle that the total value of financial claims on the firm should be equal to the sum of the firm's physical, financial, and intangible assets:

$$ V_i \;=\; A_i^{\text{physical}} + A_i^{\text{financial}} + A_i^{\text{intangible}} , $$

where $V_i$ denotes the total value of financial claims on firm $i$.
Financial markets provide an important forward-looking measure of the value of intangible assets beyond book value and other accounting variables.
By using both commercially available financial accounting data and proprietary data from Oracle and a few key clients on firm-specific data assets and usage, the fraction of total stock market capitalization due to each type of asset (physical, financial, and intangible) can be inferred using a standard linear factor model:

$$ V_i \;=\; \alpha + \sum_{k=1}^{K} \beta_k F_{ik} + \gamma_1 D_{i1} + \gamma_2 D_{i2} + \gamma_3 D_{i3} + \epsilon_i , $$

where the factors $F_{i1},\dots,F_{iK}$ represent various types of assets other than data, and $D_{i1}$, $D_{i2}$, and $D_{i3}$ are three data-related factors constructed with proprietary Oracle and Oracle-client data, capturing the amount of data assets, data usage, and data monetization of each company $i$. Depending on how much data is made available, nonlinear specifications and machine-learning techniques such as support vector machines, random forests, and deep-learning models may also be used to estimate these relations.
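As a concrete sketch, the linear specification above could be estimated by ordinary least squares on a cross-section of firms. The snippet below uses synthetic placeholder data and the statsmodels package; the variable names, dimensions, and coefficients are all illustrative assumptions rather than features of the actual proposal:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic cross-section: market values V_i regressed on K non-data asset
# factors F_i1..F_iK plus three data factors D_i1, D_i2, D_i3 (assets, usage,
# monetization). All figures below are invented for illustration.
rng = np.random.default_rng(0)
n_firms, n_factors = 200, 4

F = rng.normal(size=(n_firms, n_factors))   # non-data asset factors
D = rng.normal(size=(n_firms, 3))           # data-related factors
V = F @ np.array([1.0, 0.8, 1.2, 0.9]) + D @ np.array([0.5, 0.3, 0.7]) \
    + rng.normal(scale=0.5, size=n_firms)   # synthetic market values

X = sm.add_constant(np.hstack([F, D]))      # intercept plus all regressors
model = sm.OLS(V, X).fit()                  # beta and gamma estimates
print(model.summary())                      # the D columns carry the gammas
```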
Data assets include data in all data-management technologies (Hadoop, NoSQL, relational databases), in the cloud or on-premise, regardless of purpose. This is a snapshot of what is, not what should be. A simple measure treats data as an amorphous block, measured in petabytes stored on disk. More sophisticated measures would view data as an accumulation of discrete observations, measured in the number of key/value pairs stored.
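A toy version of the simple measure, assuming only a per-repository inventory of bytes stored (repository names and sizes below are invented):

```python
# Hypothetical inventory of bytes stored across data-management technologies.
repositories = {
    "hadoop_cluster": 3.2e15,    # bytes
    "nosql_store": 0.8e15,
    "relational_dbs": 1.5e15,
}

total_petabytes = sum(repositories.values()) / 1e15
print(f"Data assets (simple measure): {total_petabytes:.1f} PB")
```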
Data usage includes all queries executed in the data-management tier across all types of repositories. A simple measure is the number of queries executed during the period under analysis. More sophisticated measures would distinguish between analytical queries (automated and ad hoc, presumably invoked by people) and simple reads invoked by applications.
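A sketch of both usage measures, assuming query-log entries tagged with a type field (the log schema and the tags are assumptions made for illustration):

```python
from collections import Counter

# Hypothetical query-log entries from the data-management tier.
query_log = [
    {"type": "analytical", "origin": "ad hoc"},
    {"type": "analytical", "origin": "automated"},
    {"type": "read", "origin": "application"},
    {"type": "read", "origin": "application"},
]

counts = Counter(entry["type"] for entry in query_log)
print(f"Simple measure (total queries): {sum(counts.values())}")
print(f"Analytical: {counts['analytical']}; application reads: {counts['read']}")
```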
Data monetization relates data to various measures of economic value added. A simple measure is the current market value of comparable commercial data, or the net present value of the licensing fees the company could charge for access to its data. More sophisticated measures would make use of forecasts of commercialization opportunities for the data in various business lines.
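A minimal sketch of the net-present-value measure, assuming a flat annual licensing fee and a constant discount rate (both figures are placeholders):

```python
def licensing_npv(annual_fee: float, discount_rate: float, years: int) -> float:
    """Net present value of a hypothetical stream of data-licensing fees."""
    return sum(annual_fee / (1 + discount_rate) ** t for t in range(1, years + 1))

# Placeholder assumptions: $2M/year fee, 8% discount rate, 10-year horizon.
print(f"Data monetization (NPV of fees): ${licensing_npv(2e6, 0.08, 10):,.0f}")
```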
Using this framework, the null hypothesis that $1 of an asset should contribute $1 to a firm's market value can be tested by constructing multiple types of intangible assets (patents, R&D, advertising, and data) and including them in a market-valuation equation. By adding physical and financial assets to the estimating equation, and controlling for industry and year effects, we can construct a firm-specific DCI that varies with market and business conditions, and then attribute market value to all of the firm's assets.
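Continuing the illustrative statsmodels fit sketched earlier, this null hypothesis amounts to testing whether a given asset's coefficient equals one (the column name x1 below refers to the first factor in that synthetic design matrix):

```python
# Test H0: the coefficient on the first asset factor equals 1.
# 'model' comes from the earlier illustrative OLS fit, whose unnamed
# regressors statsmodels labels const, x1, x2, ...
print(model.t_test("x1 = 1"))   # rejection suggests over- or under-valuation
```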
Constructing the proprietary data factors requires access to current and historical firm-specific data-usage measurements taken within a firm's own computing environment, and can therefore only be done with the firm's permission. Accordingly, several key clients should be invited to participate in this study as strategic partners. The benefits of participation include: (1) early access to a new method of valuing data usage based on direct measurement; and (2) benchmarking against other participating peers.
Initial participants will be five to ten companies in each of three industries, tentatively retail banking, retail insurance (life, auto, and property & casualty), and industrial manufacturing. Companies should be among the largest in their industries by revenue, or among the fastest growing, and should be Oracle customers.
Several Oracle products already capture some of the data needed for this proposal; for other purposes, we may wish to access data from Enterprise Manager (an on-premise application that monitors database and hardware performance for Oracle and non-Oracle technology) and Cloud Management Services (a new cloud-based set of offerings that incorporates Enterprise Manager capabilities and adds monitoring features for Oracle and non-Oracle cloud offerings). Oracle Public Cloud should be used to house the data and conduct the analyses. It is not necessary to use only Oracle products; where necessary, open-source tools may be used as well.