Data Update 1 for 2024: The data speaks, but what does it say?

In January 1993, I was valuing a retail company and needed to determine a reasonable margin for a firm in the retail sector. To find an answer, I turned to Value Line, one of the pioneers in the investment data business, to calculate an industry average using company-specific data. The results were enlightening, revealing the distribution of margins and how understanding high, low, and typical values could enhance the valuation process. At that time, such information was scarce.

That year, I calculated industry-level statistics for five key variables frequently used in my valuations. Realizing there was no reason to keep these insights private, especially since I had no intention of becoming a data service, I shared them with my students. As the internet grew in prominence, I began sharing this data more broadly through my website.

Over the years, this practice has evolved into an annual ritual. With the increase in accessible data and more advanced analysis tools, those initial five variables have expanded to over two hundred, now encompassing all publicly traded companies worldwide, beyond the US stocks initially covered by Value Line. This wider scope has attracted more users than I ever anticipated.

While I still do not wish to become a data service, I recognize the importance of being transparent about how I analyze the data. For the past decade, I have dedicated much of January to examining what the data reveals, and obscures, about the investment, financing, and dividend decisions that companies made in the most recent year.

In this first data post of the year, I will outline my data in terms of geographic spread and industry breakdown, the variables I estimate and report, the choices I make in my analysis, and provide caveats on the best uses and potential misuses of the data.

The Sample

There are many services, both free and paid, that report data statistics broken down by geography and industry. However, many of these services focus on subsamples, such as companies in widely used indices, large market cap companies, or companies in liquid markets only. This approach is often justified on the grounds that these companies carry the most market weight and have the most reliable information. Early in my career, I saw the sense in this rationale, but I also recognized that such sampling, however well-intentioned, introduces sampling bias. For example, analyzing only companies in the S&P 500 might yield more reliable data with fewer missing observations, but it primarily reflects the behavior of large market cap companies within any given sector or industry, rather than the typical behavior of the industry as a whole.

Fortunately, I have access to comprehensive databases that include data on all publicly traded stocks. Therefore, I use the entire population of publicly traded companies with a market price greater than zero to compute all statistics. In January 2024, this universe included 47,698 companies, distributed across all sectors in the numbers and market capitalizations shown below:

[Figure: number of companies and market capitalization by sector, January 2024 (Source: Data on Damodaran Online)]

Geographical Distribution

These companies are incorporated across 134 countries. You can download a dataset listing the number of companies by country at the end of this post. For analytical purposes, I categorize these companies into six broad regional groupings:

United States
Europe (including both EU and non-EU countries, with a few East European countries excluded)
Asia excluding Japan
Japan
Australia & Canada (combined as a single group)
Emerging Markets (encompassing all countries not included in the other groupings)

The pie chart below illustrates the distribution of firms and their market capitalizations within each of these groupings:

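[Figure: pie chart of the number of firms and market capitalization by regional grouping (Source: Data on Damodaran Online)]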

Geographical Categorization

Since some readers will disagree with these categorizations, a few points, part apology and part explanation, are in order. Firstly, these categorizations were established nearly two decades ago, when I began working with global data. Many countries that were considered emerging markets then have since matured, and some have been moved accordingly: for instance, Eastern European countries that have adopted the Euro or experienced robust economic growth have been reclassified into the Europe grouping.

Secondly, these groupings serve only as a framework for computing industry and global averages, and you should use whichever average best fits your own assessment. For example, if you are from Malaysia and believe that Malaysia should not be classified as an emerging market, you can use the global averages rather than the emerging market averages.

Thirdly, the emerging market grouping has expanded significantly over time, and it now encompasses most of Asia (excluding Japan), Africa, the Middle East, parts of Eastern Europe including Russia, and Latin America. Given its breadth, I also report separate industry averages for the two largest and fastest-growing emerging markets, India and China.

The Variables

As previously mentioned, the entire process of collecting and analyzing data is driven by my own needs in corporate financial analysis, valuation, and investment assessment. As a result, my methodology has its own quirks, even when computing widely used statistics such as accounting returns on capital or debt ratios. For instance, I have treated leases as debt in computing debt ratios for decades, even though accounting standards adopted this practice only in 2019. Similarly, I capitalize R&D expenses, even though this is not yet a universally accepted accounting practice.
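The two adjustments above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the spreadsheets behind the published data: lease commitments are discounted at the pre-tax cost of debt (a common convention), R&D is amortized straight-line over an assumed life, and the function names (`lease_debt`, `debt_ratio`, `capitalize_rd`) are hypothetical.

```python
from typing import Sequence, Tuple


def lease_debt(lease_commitments: Sequence[float], pretax_cost_of_debt: float) -> float:
    """Present value of future lease payments (assumption: discounted at the
    pre-tax cost of debt). lease_commitments[0] is the payment due in one year."""
    return sum(
        payment / (1 + pretax_cost_of_debt) ** (year + 1)
        for year, payment in enumerate(lease_commitments)
    )


def debt_ratio(conventional_debt: float, lease_liability: float, market_equity: float) -> float:
    """Debt to capital, with the lease liability counted as debt."""
    total_debt = conventional_debt + lease_liability
    return total_debt / (total_debt + market_equity)


def capitalize_rd(rd_history: Sequence[float], life: int) -> Tuple[float, float]:
    """Convert R&D expenses into an asset (hypothetical helper).

    rd_history[0] is the current year's R&D, rd_history[1] last year's, and so on.
    Assumption: straight-line amortization over `life` years.
    Returns (unamortized research asset, this year's amortization).
    """
    if len(rd_history) < life + 1:
        raise ValueError("need the current year plus `life` prior years of R&D")
    # The expense from t years ago still has (life - t) / life of its value left.
    research_asset = sum(rd_history[t] * (life - t) / life for t in range(life))
    # Each of the prior `life` years' expenses is amortized by 1/life this year.
    amortization = sum(rd_history[t] / life for t in range(1, life + 1))
    return research_asset, amortization
```

Under these assumptions, adjusted operating income would be reported operating income plus the current year's R&D minus the amortization. With R&D flat at 100 a year and a three-year life, `capitalize_rd([100, 100, 100, 100], 3)` gives a research asset of about 200 and amortization of about 100, so operating income is unchanged in that steady state.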

In my corporate finance classes, I categorize all corporate decisions into three buckets: investing decisions, financing decisions, and dividend decisions. My data analysis follows this framework, and here are some of the key variables for which I compute industry averages on my website:

Corporate Governance & Descriptive
1. Insider, CEO & Institutional holdings
2. Aggregate operating numbers
3. Employee Count & Compensation

| Investing Principle: Hurdle Rate | Investing Principle: Project Returns | Financing Principle: Financing Mix | Financing Principle: Financing Type | Dividend Principle: Cash Return | Dividend Principle: Dividends/Buybacks |
|---|---|---|---|---|---|
| 1. Beta & Risk | 1. Return on Equity | 1. Debt Ratios & Fundamentals | 1. Debt Details | 1. Dividends and Potential Dividends (FCFE) | 1. Buybacks |
| 2. Equity Risk Premiums | 2. Return on (invested) capital | 2. Ratings & Spreads | 2. Lease Effect | 2. Dividend yield & payout | |
| 3. Default Spreads | 3. Margins & ROC | 3. Tax rates | | | |
| 4. Costs of equity & capital | 4. Excess Returns on investments | 4. Financing Flows | | | |
| | 5. Market alpha | | | | |

Many of these corporate finance variables, such as costs of equity and capital, debt ratios, and accounting returns, are also essential inputs into my valuations. In addition, I compute variables that are more specific to my valuation and pricing needs:

| Valuation: Growth & Reinvestment | Valuation: Profitability | Valuation: Risk | Pricing: Multiples |
|---|---|---|---|
| 1. Historical Growth in Revenues & Earnings | 1. Profit Margins | 1. Costs of equity & capital | 1. Earnings Multiples |
| 2. Fundamental Growth in Equity Earnings | 2. Return on Equity | 2. Standard Deviation in Equity/Firm Value | 2. Book Value Multiples |
| 3. Fundamental Growth in Operating Earnings | | | 3. Revenue Multiples |
| 4. Long term Reinvestment (Cap Ex & Acquisitions) | | | 4. EBIT & EBITDA multiples |
| 5. R&D | | | |
| 6. Working capital needs | | | |

On the pricing side, I calculate multiples based on revenues (EV to Sales, Price to Sales), earnings (PE, PEG), book value (PBV, EV to Invested Capital), and cash flow proxies (EV to EBITDA). In recent years, I have also added employee statistics (number of employees and stock-based compensation) and measures of goodwill, the latter not because goodwill adds much information on its own but because it can affect other measures I report, such as book values and accounting returns on capital.
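To make these definitions concrete, here is a minimal sketch of how the multiples could be computed for a single company. The field names, the enterprise value definition (market capitalization plus debt minus cash), PEG measured as PE divided by expected growth in percent, and the choice to return None for non-positive denominators are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Company:
    """Hypothetical per-company inputs, all in the same currency."""
    market_cap: float
    total_debt: float           # assumption: includes lease liabilities
    cash: float
    revenues: float
    net_income: float
    book_equity: float
    invested_capital: float
    ebitda: float
    expected_growth_pct: float  # expected annual EPS growth, in percent


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return None for zero or negative denominators, where most multiples are not meaningful."""
    return numerator / denominator if denominator > 0 else None


def pricing_multiples(c: Company) -> Dict[str, Optional[float]]:
    ev = c.market_cap + c.total_debt - c.cash  # enterprise value
    pe = _ratio(c.market_cap, c.net_income)
    return {
        "EV/Sales": _ratio(ev, c.revenues),
        "Price/Sales": _ratio(c.market_cap, c.revenues),
        "PE": pe,
        "PEG": _ratio(pe, c.expected_growth_pct) if pe is not None else None,
        "PBV": _ratio(c.market_cap, c.book_equity),
        "EV/Invested Capital": _ratio(ev, c.invested_capital),
        "EV/EBITDA": _ratio(ev, c.ebitda),
    }
```

Returning None for a money-losing company's PE, rather than a negative number, mirrors the common practice of treating such multiples as not meaningful.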

My data is primarily micro-focused, since other services do a better job of providing macroeconomic data (inflation, interest rates, exchange rates, etc.). While I value resources like the St. Louis Federal Reserve's FRED database for macro data, I estimate a few macroeconomic variables myself, mainly because they are less readily available or involve subjective estimation choices. For example, I report annual historical returns on several asset classes (stocks, bonds, real estate, gold) going back to 1928, computed so that the methodology is consistent across asset classes.
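One way to keep the methodology consistent across asset classes is to compute every annual return the same way, as income plus price change over the starting value, and to report both arithmetic and geometric averages. The sketch below is hypothetical, including its treatment of bonds, which assumes a par bond bought at the start-of-year yield with annual coupons and repriced at the end-of-year yield; it is not presented as the exact procedure behind the published series.

```python
import math
from typing import Sequence


def annual_return(start_value: float, end_value: float, income: float = 0.0) -> float:
    """Total return for one year: price change plus income (dividends, coupons, rents)."""
    return (end_value + income - start_value) / start_value


def bond_return_from_yields(start_yield: float, end_yield: float, maturity: int = 10) -> float:
    """Return on a par bond (face value 1) bought at the start-of-year yield, sold a year later.

    Assumptions: annual coupons equal to the start-of-year yield; at year end the bond
    has maturity - 1 years left and is priced at the end-of-year yield.
    """
    remaining = maturity - 1
    coupon = start_yield
    end_price = sum(coupon / (1 + end_yield) ** t for t in range(1, remaining + 1))
    end_price += 1 / (1 + end_yield) ** remaining
    return annual_return(1.0, end_price, income=coupon)


def arithmetic_mean(returns: Sequence[float]) -> float:
    return sum(returns) / len(returns)


def geometric_mean(returns: Sequence[float]) -> float:
    """Compounded average annual return."""
    return math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1
```

A year of +10% followed by a year of -10% has an arithmetic average of 0% but a geometric average slightly below zero, which is why reporting both matters when the same calculation is applied to every asset class.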

Additionally, I compute implied equity risk premiums for the S&P 500, annually since 1960 and monthly since 2008. These are forward-looking, dynamic estimates, backed out of current stock prices and expected cash flows, of the premium that investors expect to earn on stocks over the risk-free rate. I also compute equity risk premiums for individual countries.
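The idea behind an implied premium is to find the discount rate that makes the present value of expected cash flows to stockholders equal to the current index level, and then subtract the risk-free rate. The sketch below assumes a simple two-stage model (cash flows growing at a fixed rate for five years, then at the risk-free rate forever) and solves for the rate by bisection; the function name, its inputs, and the model structure are illustrative assumptions, not a reproduction of the published spreadsheets.

```python
def implied_erp(
    index_level: float,
    base_cash_flow: float,
    growth: float,
    riskfree: float,
    years: int = 5,
    tolerance: float = 1e-10,
) -> float:
    """Implied equity risk premium under a two-stage cash flow model (hypothetical sketch).

    base_cash_flow: cash returned to stockholders (dividends plus buybacks) over the last year.
    growth: expected annual growth in that cash flow for the first `years` years.
    After that, cash flows are assumed to grow at the risk-free rate forever.
    """

    def present_value(rate: float) -> float:
        value, cash_flow = 0.0, base_cash_flow
        for t in range(1, years + 1):
            cash_flow *= 1 + growth
            value += cash_flow / (1 + rate) ** t
        # Terminal value at the end of year `years`, from a growing perpetuity.
        terminal = cash_flow * (1 + riskfree) / (rate - riskfree)
        return value + terminal / (1 + rate) ** years

    # The present value falls as the rate rises, so bisect on the rate.
    low, high = riskfree + 1e-9, 1.0
    if present_value(high) > index_level:
        raise ValueError("implied expected return exceeds 100%; check the inputs")
    while high - low > tolerance:
        mid = (low + high) / 2
        if present_value(mid) > index_level:
            low = mid   # value too high: the discount rate must be higher
        else:
            high = mid
    expected_return = (low + high) / 2
    return expected_return - riskfree
```

With made-up inputs, `implied_erp(2080.0, 100.0, 0.04, 0.04)` returns approximately 0.05: when growth equals the risk-free rate throughout, the model collapses to a growing perpetuity, 104 / (0.09 - 0.04) = 2080, so the expected return is 9% and the premium over the 4% risk-free rate is 5%.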

### Industry Groupings

While widely used industry codes such as SIC and NAICS exist, I opted to create my own industry groupings for two reasons. Firstly, I wanted groupings that are intuitive for analysts looking for peer groups when evaluating companies. Secondly, I wanted to strike a balance in the number of groupings: too few would make it hard to differentiate across businesses, while too many would leave some groupings with too few firms, particularly in some regions of the world. I settled on 95 industry groupings, and the table below details the distribution of firms across them in my data set:

### Data Handling and Methodology

When categorizing companies into industry groupings, I face inevitable questions about where individual companies belong. For instance, is Apple a personal computer company, an entertainment company, or a wireless telecom company? While each company could fit into multiple categories, for the purpose of computing industry averages, I assign each company to a single grouping. You can explore the detailed breakdown by clicking on this link, though please be patient as it's a large dataset.

#### Data Timing & Currency Effects

In computing statistics for each variable, my primary objective is to ensure they reflect the most current data available at the time of computation, typically in the first week of January. This approach can lead to timing disparities: metrics based on market data (such as costs of equity and capital, equity risk premiums, and risk-free rates) are updated to reflect values at the close of the previous year (December 31, 2023 for 2024 figures), while metrics using accounting numbers (like revenues and earnings) incorporate the latest quarterly filings. Despite this timing mismatch, this method aligns with my goal of using the most up-to-date data.

Handling multiple currencies poses another challenge when computing statistics across companies in different markets. The global database I use, S&P Capital IQ, allows me to convert all data into US dollars, facilitating cross-country comparisons. Moreover, most statistics I report are ratios rather than absolute values, and since the currency cancels out when the numerator and denominator are measured in the same currency, these ratios can be averaged across regions.
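The point about ratios can be checked directly: converting a company's market capitalization and earnings into US dollars at the same exchange rate leaves its PE ratio unchanged, while aggregating across companies requires a common currency. In the minimal sketch below, the exchange rates and company figures are made up, and converting numerator and denominator at a single rate is a simplifying assumption.

```python
from typing import Iterable, Tuple


def to_usd(amount_local: float, usd_per_local: float) -> float:
    return amount_local * usd_per_local


def aggregate_ratio_usd(companies: Iterable[Tuple[float, float, float]]) -> float:
    """Sum numerators and denominators in US dollars, then divide.

    Each entry is (numerator_local, denominator_local, usd_per_local), e.g.
    (market cap, net income, exchange rate) for an aggregate PE ratio.
    """
    total_numerator = 0.0
    total_denominator = 0.0
    for numerator, denominator, fx in companies:
        total_numerator += to_usd(numerator, fx)
        total_denominator += to_usd(denominator, fx)
    return total_numerator / total_denominator


# Hypothetical companies: one reporting in euros, one in yen.
euro_company = (5000.0, 250.0, 1.10)      # market cap, net income, USD per EUR
yen_company = (300000.0, 20000.0, 0.007)  # market cap, net income, USD per JPY

# A single company's PE is the same in local currency and in dollars ...
pe_local = euro_company[0] / euro_company[1]
pe_usd = to_usd(euro_company[0], euro_company[2]) / to_usd(euro_company[1], euro_company[2])

# ... but summing across companies only makes sense once everything is in dollars.
combined_pe = aggregate_ratio_usd([euro_company, yen_company])
```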

#### Statistical Choices

Transparency is crucial in my methodology, and that includes being open about missing data. Some data items, such as stock-based compensation or employee numbers, are unavailable for many companies because reporting requirements vary across countries. In such cases, I report statistics only for the companies that disclose the item.

In computing industry statistics over the years, I have grappled with how best to represent a typical company. A simple average of company-level ratios can be skewed by outliers, and for ratios like PE it also drops companies with negative earnings, for which the ratio is not meaningful. Instead, I often compute an aggregated ratio: the PE ratio for software companies, for example, is the total market capitalization of all software companies divided by their total earnings, including the losses of money-losing companies. This reduces bias and better reflects industry norms. For comparison, I also report conventional averages and medians for select variables.
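The difference between the aggregated ratio and conventional statistics is easy to see with a small, made-up industry. The sketch below is a minimal illustration, assuming each company is represented as a (market cap, net income) pair with None for unreported earnings: companies that do not report earnings are left out of every statistic, while money-losing companies stay in the aggregate but drop out of the company-level average and median.

```python
import statistics
from typing import Dict, List, Optional, Tuple, Union

Company = Tuple[float, Optional[float]]  # (market cap, net income or None if not reported)


def industry_pe_stats(companies: List[Company]) -> Dict[str, Union[int, float, None]]:
    # Only companies that report the data item enter any statistic.
    reported = [(cap, ni) for cap, ni in companies if ni is not None]

    # Aggregated ratio: total market cap over total earnings, losses included.
    total_cap = sum(cap for cap, _ in reported)
    total_earnings = sum(ni for _, ni in reported)
    aggregate = total_cap / total_earnings if total_earnings > 0 else None

    # Company-level PEs exist only for companies with positive earnings.
    company_pes = [cap / ni for cap, ni in reported if ni > 0]
    return {
        "aggregate_pe": aggregate,
        "average_pe": statistics.mean(company_pes) if company_pes else None,
        "median_pe": statistics.median(company_pes) if company_pes else None,
        "companies_in_aggregate": len(reported),
        "companies_in_average": len(company_pes),
    }


industry = [
    (1000.0, 50.0),   # PE of 20
    (400.0, 40.0),    # PE of 10
    (200.0, 1.0),     # PE of 200: tiny earnings, an outlier
    (600.0, -30.0),   # money-losing company: no meaningful PE
    (300.0, None),    # earnings not reported
]
stats = industry_pe_stats(industry)
```

Here the simple average PE (about 76.7) is pulled up by one company with near-zero earnings, the median (20) ignores the money-losing company entirely, and the aggregate PE (about 36.1) weights each company by its size and lets the loss reduce industry earnings.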

#### Using the Data

The datasets I compile are tailored for real-time corporate financial analysis and valuation. If you're engaged in similar work, the 2024 data available [here](link) should prove beneficial. Archived datasets from previous years are also maintained on my webpage for researchers or appraisers needing historical context.

However, please note that my data is not suited for legal disputes or advocacy efforts, where selective use of statistics can skew interpretations. For company-specific details like Unilever's cost of capital or Apple's return on capital, you'll need to refer to their financial releases or other sources, as my datasets do not provide company-specific data due to restrictions from raw data providers.

### A Request for Sharing

In conclusion, if you find my data useful, feel free to use it responsibly. Take ownership of your analysis, and if you encounter any errors, please notify me so I can promptly rectify them. Sharing knowledge benefits everyone, and your feedback helps maintain the accuracy and utility of the data.

YouTube Video

Sample Breakdown

Country Breakdown

Links to my data

Data Update Posts for 2024