Linking Financial Data Sets:
Possible Problems and
Solutions
Sept. 30, 2014
Kathleen Dreyer, Columbia University
Todd Hines, Princeton University
Using financial data
Very common occurrence
o Financial data comes from multiple sources
Another very common occurrence
o The different sources use different identifiers
In order to link/merge the data together, common identifiers are
required.
Identifiers
Identifiers used to match data sets:
(ranked from less precise to more precise)
1. Name or ticker
2. CUSIP
a. SEDOL or ISIN for non-US companies
3. Unique id, e.g. GVKEY, PERMNO
Examples of financial data
Thom
Compustat data
ThomsonReuters data
Linking by company name
Source: Factiva.com
Linking on ticker symbol
Source: Bloomberg
Linking on CUSIP
Source: CUSIP Global Services (http://www.cusip.com/cusip/about-cgs-identifiers.htm)
Difficulties with CUSIP
Dell changed their name in July 2003 from Dell Computer
Corp to Dell Inc.
The ticker remained DELL but the CUSIP changed
Before name change CUSIP was 24702510
After name change (from 7/22/03) it was 24702R10
CUSIPs have only existed since the late 1960s.
o Can’t use to match for earlier data
Examples of unique identifiers
From CRSP
From Compustat
Let’s try an example!
A researcher would like to examine the
effectiveness of a merger.
What data do we need?
Image from: https://www.flickr.com/photos/hikingartist/
Effectiveness of mergers
To analyze, we’ll need
o Merger data (acquiror, target company, deal size, deal dates, etc.)
o Stock price information
Did the deal have a positive effect on the acquirer's stock price?
o Fundamentals (financial statement) information
Analyze the financials of the acquiror both and after the deal
Matching example
Unfortunately, all these variables come from
different databases. Fun!
Some common databases for:
M&A data -- SDC Platinum (a.k.a. Thomson ONE), Zephyr
Stock price - CRSP (US stock prices)
Fundamentals (financial statement data) - Compustat North
America (US company fundamentals)
Sara Lee (Hillshire Brands) - company history
Sara Lee (Hillshire Brands) selected M&A transactions
Why is it important we did this?
We would have thought the the SDC M&A deals were for
three completely different companies:
o Hillshire Farms
o Sara Lee
o Consolidated Food Corp
All three have different names, different CUSIPs, different
tickers
Will this solve all your problems?
Most companies will “match” if you use permanent
identifiers, but there still can be problems
o There is a lot of financial engineering now -
spinoffs, tracking stocks, complicated merger
deals, etc. Sometimes the researcher will need to
read up on the specifics of a deal in order to
match it correctly.
Matching CRSP to Compustat
A very common problem is to need to match CRSP stock data
to Compustat fundamentals (financial statement) data
Problem
o Compustat has a “unique identifier” GVKEY
o But the other company identifiers (name, CUSIP) are
what is called “header data”
Only the most recent value (name, CUSIP) is available
in Compustat
Matching CRSP to Compustat (cont.)
For Hillshire Farms (f.n.a. Sara Lee) this means that only the
most recent company name (Hillshire Farms) and most
recent CUSIP (the one starting with 432589) is in the
Compustat record.
There is a subscription database called CRSP/Compustat
merged.
o If you have a GVKEY, you can use the database to get the
Permno. And vice versa.
Most institutions don’t have a subscription, so you have to
do the matching yourself
Matching CRSP to Compustat
As when we were matching the SDC data,
having access to the CRSP file, which
details all company names, tickers and
CUSIP numbers used, will allow us to
match.
Sara Lee (Hillshire Brands) - company history
How researchers do matching
Researchers seldom look at one company at a time
Usually are looking at hundreds or thousands of companies
and deals
This means they’ll generally use statistical software such as
Stata, SAS, R, etc. to work with the data
o This software allows researchers to write programs that
will follow the same matching rules that we just
performed for one company, Hillshire Farms (Sara Lee),
but for many companies at once.
Questions?
Contact information:
Kathleen Dreyer: kd2145@columbia.edu
Todd Hines: thines@princeton.edu