Analyzing the Ages of Signatories to the Harper’s Letter and The Objective’s Counter-letter: A Case Study in OSINT Techniques
By Ishaan Jhaveri and Sara Sheridan
In early July, Harper’s Magazine published “A Letter on Justice and Open Debate,” a statement signed by 152 prominent authors decrying a “stifling” intellectual atmosphere and advocating for “the free exchange of information and ideas.” While the Harper’s letter did not mention any examples of the type of speech suppression it condemned, it came after a series of media industry controversies, most notably the resignation of a top New York Times editor over an op-ed advocating for military force on protesters. Predictable waves of support and backlash followed the publication of the piece, with voices from across journalism, academia, and Twitter opining about “the letter.”
Days later, and to less media fanfare, a counter-letter was published in The Objective, a Substack-based newsletter focusing on “how journalism has interacted with historically-ignored communities.” The counter-letter contextualized the concerns of the original letter within the current political climate, accusing its signers of confusing “discomfort in the face of valid criticism” with an attack on the free exchange of ideas.
While much has been written about the content of the two letters, we were curious about the signatories themselves. Age in particular has been a flashpoint in recent debates about free speech and changing editorial standards, so we thought that would be a good place to start. (Age also has the advantage of being an objective and quantifiable metric, unlike more complex demographic markers like race, gender, and class.)
We set out to find out the age of each person who signed the two letters using an open-source intelligence (OSINT) approach. OSINT, the practice of collecting data from publicly available sources, is a technique made popular recently by Bellingcat, a worldwide research collective that investigates anything from war crimes to disinformation campaigns by using information from publicly available sources and social media.
What follows is a description of our methodology and results, as well as a necessary discussion of the legal and privacy implications of such research. Our goal was not to draw any far-reaching conclusions about changing attitudes around free speech (after all, we are analyzing the signatories of two very specific public letters), but rather to use a popular media story as an entry point to ask some questions about the logistics and ethics of an OSINT approach.
We crystallized our findings into a basic statistical analysis and comparative visual representation of the ages of each letter’s signatories:
Harper’s Letter published on July 7th, 2020. Ages accurate as of July 10, 2020.
Harper’s Letter Statistics:
Total Signatories: 152
Named Signatories: 152
Signatories Represented Above: 136 (89.5% of Total)
Mean Age: 59
Median Age: 60
Min Age: 24
Max Age: 91
Mode Age: 70 (8)
(Note: the sample excludes signatories whose ages we could not confirm, so the min, max and mode of the full list of signatories may differ.)
The Objective’s Counter Letter published on July 10th, 2020. Ages accurate as of July 14, 2020.
The Objective’s Letter Statistics:
Total Signatories: 164
Named Signatories: 139
Signatories Represented Above: 111 (79.9% of Named Population, 67.7% of Total)
Mean Age: 33
Median Age: 31
Min Age: 19
Max Age: 56
Mode Age: 27 & 30 (8)
(Note: the sample excludes signatories whose ages we could not confirm, so the min, max and mode of the full list of signatories may differ.)
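As a rough illustration, figures like the ones above can be computed from a list of confirmed ages with Python’s standard library. The sketch below is not the tooling used for this analysis: the `ages` list is made-up sample data and `summarize` is a hypothetical helper. `statistics.multimode` returns every most-common value, which is how a tie such as “27 & 30” would surface.

```python
from collections import Counter
from statistics import mean, median, multimode


def summarize(ages):
    """Return the summary figures reported for each letter."""
    modes = sorted(multimode(ages))
    return {
        "mean": round(mean(ages)),
        "median": median(ages),
        "min": min(ages),
        "max": max(ages),
        "modes": modes,
        # How many people share the most common age(s).
        "mode_count": Counter(ages)[modes[0]],
    }


# Hypothetical ages, for illustration only (not the signatories' data).
ages = [27, 27, 30, 30, 31, 45]
summary = summarize(ages)
print(summary)
```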
Age Comparison Graph
Age Comparison Graph Highlighting Rough Generational Demarcations
Source for Generational Demarcations: Pew
Here’s how we collected the data:
There are a combined total of 291 named signatories to both Harper’s “A Letter on Justice and Open Debate” (152 total and named) and The Objective’s “A More Specific Letter on Justice and Open Debate” (164 total, 139 named). We were able to find the age in years (as of July 10, 2020 for Harper’s letter and as of July 14, 2020 for The Objective’s letter) of 247 of these 291 people using various OSINT techniques. The techniques can be divided into 4 broad categories:
- Full birth date from independent public source
- Birth date from self-reported public source
- Age from public records search
- Age pieced together from a variety of disparate sources
Full Birth Date From Independent Public Source
This is the most straightforward category. We found 102 people’s full birth date down to the day on Wikipedia, 1 person’s birth date down to the month on Wikipedia and 3 people’s full birth dates from encyclopedia.com, peoplepill.com and nonbinary.wiki, respectively. This category accounted for 106 people.
Birth Date From Self-Reported Public Source
Some people self-report their birthdays publicly on Facebook or Twitter, or post about their age on a given birthday. For this category, knowing how to look is key. If the person’s Twitter handle was @ishaan_jhavs, we would search for things like:
15..40 “today” site:twitter.com/ishaan_jhavs [where 15 and 40 represent the lower and upper bounds of numbers being searched for. This might unearth a tweet like, “I’m officially 30 today!”]
We searched for a wide variety of keywords on people’s Twitter handles, Facebook pages, personal websites and personal blogs.
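Search strings of this kind are easy to generate for many handles and keywords. The following is a minimal sketch, assuming the search engine supports the `..` numeric-range and `site:` operators used in the example above; `build_age_query` and the extra keywords are hypothetical, chosen only for illustration.

```python
def build_age_query(handle, low=15, high=40, keyword="today", site="twitter.com"):
    """Compose a search query restricted to one profile.

    `low..high` is the numeric-range operator from the example above;
    the keyword is quoted so that it must appear verbatim.
    """
    return f'{low}..{high} "{keyword}" site:{site}/{handle}'


# Hypothetical keyword variations; the handle is the one from the example above.
for kw in ("today", "birthday", "turning"):
    print(build_age_query("ishaan_jhavs", keyword=kw))
```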
For example, one signatory tweeted about turning a certain age over the upcoming weekend. Because we know the exact date of the tweet and can therefore narrow down the person’s Xth birthday to one of two days (and thereby also their full birth date to one of two days), we can be sure of their age in years in July 2020.
We were careful to only count in this category people whose age in years in July 2020 we could clearly discern from information they self-reported online. If someone wrote something like, “… in 2018, when I turned 40… ” we wouldn’t include their information (because based on this alone, they could be 41 or 42 in July 2020.)
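The difference between these two situations can be checked with a little date arithmetic. The sketch below uses hypothetical inputs (a tweet dated March 14, 2019 about turning 30 “this weekend”) alongside the “turned 40 in 2018” case; the function names are our own, and July 10, 2020 is used as the reference date. The first case yields a single possible age, while the second yields two.

```python
from datetime import date, timedelta


def age_on(birth, ref):
    """Age in whole years on `ref` for someone born on `birth`."""
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)


def ages_from_weekend_tweet(tweet_day, turning, ref):
    """Possible ages on `ref` if the birthday is the Saturday or Sunday after `tweet_day`."""
    saturday = tweet_day + timedelta(days=(5 - tweet_day.weekday()) % 7)
    candidates = [saturday, saturday + timedelta(days=1)]
    # Assumes neither candidate is Feb 29, which date() rejects in non-leap years.
    births = [date(d.year - turning, d.month, d.day) for d in candidates]
    return {age_on(b, ref) for b in births}


def ages_from_turned_in_year(turned, year, ref):
    """Possible ages on `ref` if all we know is "turned `turned` in `year`"."""
    birth_year = year - turned
    # The birthday could fall on any day of the year: try the first and the last.
    return {age_on(date(birth_year, 1, 1), ref), age_on(date(birth_year, 12, 31), ref)}


REF = date(2020, 7, 10)
weekend = ages_from_weekend_tweet(date(2019, 3, 14), turning=30, ref=REF)  # hypothetical tweet
vague = ages_from_turned_in_year(40, 2018, REF)
print(weekend, vague)
```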
In one case, a person reported their age in their Twitter bio, but we wanted to be sure that it was up-to-date as of July 2020. Archived snapshots of their Twitter page showed that they had incremented the age in their bio two months earlier. Though it is possible that they updated their age well after their birthday, this update was enough for us to conclude that the self-reported age in their bio was their current age in July 2020. This case highlights that discretion is critical to open-source data gathering.
We found 7 people’s birth dates on Twitter, 1 on Facebook and 1 on a personal blog. This category accounted for 9 people.
Age From Public Records Search
Many people’s age in years is available through free public records. But using these types of records as a source presents the challenge of verifying that the person returned in a given search is really the person you seek. Think of how many John Smiths, of all ages, exist in different cities across the United States alone. In our case, how would we know which John Smith is the one that signed one of the letters?
To determine this, you need to be able to find pieces of information that pertain to the signatory (“target person”) and pieces of information that pertain to a person returned as a result of a public records search (“searched person”). Only once you ensure that enough of the “target” pieces of information match the “searched” pieces of information, can you trust that they are the same person, and therefore that the age listed for the searched person is the age of the target person.
We used two public records search engines for U.S. public records: searchpublicrecords.com and publicrecords.searchsystems.net. Both return some or all of the following bits of information about a person you search for:
- Full name with middle name or initial
- Other known names
- Cities and states registered in
- Possible relatives
We then used 5 different sites to cross-check the data that we found (unless the public records search result was sufficiently specific). So the 6 ways of finding age information through public records search were:
- Birth Year on Wikipedia, Current Age through Public Records Search
- Past Age Benchmark from Graduation Year, Current Age through Public Records Search
- Past Age Benchmark from Miscellaneous Public Information, Current Age through Public Records Search
- Past Age Benchmark from Article, Current Age through Public Records Search
- Self-Reported Past Age Benchmark, Current Age through Public Records Search
- Very Specific Result from Public Records Search
1. Birth Year on Wikipedia, Current Age through Public Records Search
We found 15 signatories’ birth years (without a specific month or day) on Wikipedia. Knowing only the birth year, we knew each of them could only be X or X-1 years old in July 2020, where X is 2020 minus the birth year. For many of these people, Wikipedia or other online sources listed a city they had lived in or their full name, including middle name. If we found a searched person whose age was either X or X-1 and who either had the same full name or was registered in one or more cities where we knew the target person had spent time, we felt confident that the searched person was the target person, and therefore that the searched person’s age was the target person’s age. We only treated a searched person as the target person when at least one variable other than an age of X or X-1 matched.
For example, one target person was born X years before 2020 according to Wikipedia, and the searched person was listed as being X years old. Two additional variables connected them: the searched person was at one point registered both in the city where the target person’s alma mater is located and in the city where the target person’s current employer is located (both per Wikipedia).
Each match of searched person to target person was unique. In the example discussed above, the age made sense, there was only one searched person with the same name as the target person, and they had two cities in common. With three layers of confirmation, we felt confident in our data gathering on this target person.
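The rule described in this subsection (age of X or X-1, at least one other matching variable, and a unique result) can be written down as a small filter. This is a sketch only: the `Record` fields, the `confirm_match` helper, and every name and city in the sample data are hypothetical and do not reflect the actual output format of any public records site.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One public-records search result (fields are illustrative)."""
    full_name: str
    age: int
    cities: set = field(default_factory=set)


def confirm_match(birth_year, known_full_name, known_cities, results, ref_year=2020):
    """Return the single record consistent with the target person, else None."""
    x = ref_year - birth_year
    matches = [
        r for r in results
        if r.age in (x - 1, x)                   # age consistent with the birth year
        and (r.full_name == known_full_name      # plus at least one other variable
             or r.cities & known_cities)
    ]
    # Only accept the match when exactly one searched person qualifies.
    return matches[0] if len(matches) == 1 else None


# Entirely hypothetical target person and search results.
results = [
    Record("Jane Q. Doe", 45, {"Springfield", "Riverton"}),
    Record("Jane A. Doe", 44, {"Lakeside"}),
    Record("Jane Doe", 62, {"Springfield"}),
]
hit = confirm_match(1975, "Jane Quinn Doe", {"Springfield", "Riverton"}, results)
print(hit)
```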
2. Past Age Benchmark from Graduation Year, Current Age through Public Records Search
There were 97 signatories for whom we followed a similar strategy of narrowing the range of their likely age and then attempting to locate their exact record within public records search results. For these 97, we didn’t have an exact birth year as we did for the 15 above, but we were able to benchmark their age using either their high school or undergraduate graduation year.
For example, we first located a certain target person’s LinkedIn page. From the LinkedIn page we found their undergraduate graduation year to be 2001, meaning they are likely to have been in their early 20s that year. We also found two cities that they have likely spent time in. (For a brief discussion of legal issues related to the publication of LinkedIn information, see the conclusion.)
We then searched for their full name on a public records search in the state that one of these cities is in, and found 4 people with the same name. Their ages were 31, 41, 48, and 50 respectively. It’s far more likely that, of these people, the person who is 41 today, and was approximately 22 in 2001, would have graduated from an undergraduate program in 2001. Moreover, the 41-year-old in the search results was registered in a city we knew the target person worked in (from LinkedIn). Finally, when looking up the same name in public records of the state of the other city the target person was likely to have spent time in, the only result returned matched the 41-year-old in the first search. Given that the searched person is registered in two states that the target person likely spent time in, and that they appear likely to have been at the right age to graduate from an undergraduate program in 2001, we concluded that they are likely to be the target person, and are thus 41 years old as of July 2020.
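The narrowing step in this example amounts to subtracting the years elapsed since graduation from each candidate’s current age. The sketch below uses the four ages from the example; the 21-to-23 window for a typical undergraduate graduation age, the one-year margin, and the `plausible_candidates` helper are assumptions made for illustration.

```python
def plausible_candidates(ages_now, grad_year, ref_year=2020, grad_age_range=(21, 23)):
    """Keep the ages whose implied age in `grad_year` falls near `grad_age_range`."""
    low, high = grad_age_range
    years_since = ref_year - grad_year
    keep = []
    for age in ages_now:
        # Approximate age in the graduation year; allow one year of slack
        # because we do not know when in the year the birthday falls.
        approx_then = age - years_since
        if low - 1 <= approx_then <= high + 1:
            keep.append(age)
    return keep


# Ages returned by the public records search in the example above.
print(plausible_candidates([31, 41, 48, 50], grad_year=2001))
```

A filter like this only narrows the field; the shared cities described above are what actually confirm the match.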
We only confirmed searched persons as target persons when at least one variable other than likely age based on undergraduate or high school graduation year matched (though it was usually two or more). Our source for a person’s undergraduate or high school graduation year was usually either LinkedIn, or a personal bio found on the website of an organization they were employed by or affiliated with.
With all these sources we had to ensure that the LinkedIn page or bio corresponded to the correct signatory. For the example above, we were able to verify that we had the correct LinkedIn page because it lists a book that the person wrote, which they also list in their Twitter bio, which in turn we knew belonged to the correct signatory because the owner of that Twitter account tweeted about signing one of the letters. The LinkedIn page also lists them as a freelancer, which is how they are classified next to their name on the letter they signed.
3. Past Age Benchmark from Article, Current Age through Public Records Search
There were 5 signatories whose ages we were able to benchmark not through their birth year or their high school or undergraduate graduation year, but through articles about them that mentioned their age.
For example, we found an article written about a target person in September 2014 which says they were X at the time of writing, which means in July 2020 they would either be X+5 or X+6. Once we found this age benchmark, the next step to determine their current age was the same as for the above subcategories.
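The X+5 or X+6 range can be verified by brute force: enumerate every birth date consistent with the benchmark and see which ages result on the reference date. In this sketch, the benchmark age of 30 and the exact article date of September 15, 2014 are hypothetical stand-ins, and `possible_ages` is our own illustrative helper.

```python
from datetime import date, timedelta


def age_on(birth, ref):
    """Age in whole years on `ref` for someone born on `birth`."""
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)


def possible_ages(age_then, then, ref):
    """Every age on `ref` consistent with being `age_then` years old on `then`."""
    ages = set()
    # Scan a generous three-year window of birth dates around the expected
    # birth year and keep only those that reproduce the benchmark age.
    start = date(then.year - age_then - 2, then.month, 1)
    for offset in range(3 * 366):
        birth = start + timedelta(days=offset)
        if age_on(birth, then) == age_then:
            ages.add(age_on(birth, ref))
    return ages


# Hypothetical benchmark: aged 30 in an article dated September 15, 2014.
print(possible_ages(30, date(2014, 9, 15), date(2020, 7, 10)))
```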
4. Self-Reported Past Age Benchmark, Current Age through Public Records Search
3 signatories tweeted their age (but not birthday) at a given time, which we used as a benchmark to then find their current age on a public records search.
For these subcategories, we still relied on multiple variables to confirm that the target person and the searched person were the same. The age benchmarks gave us a 2-year likely age range in July 2020. The public records search result confirmed the exact age.
5. Past Age Benchmark from Miscellaneous Public Information, Current Age through Public Records Search
For 2 signatories, we weren’t able to find a graduation year, but we did find evidence that they were undergraduates in a certain year. This gave us a wider age range, so we had to match more variables to ensure that the target and searched person were the same.
For example, a target person listed a university on LinkedIn as their undergraduate alma mater without listing a date of graduation, but listed themselves as a staff writer for that university’s newspaper from 2008 to 2011, which means the earliest they could have graduated is in 2011.
We then found their high school and hometown listed on a university website, as well as tweets written by the target person about being an alumnus from that high school. We found their current city of residence on their LinkedIn page.
We searched for their name on searchpublicrecords.com and found that only one of the searched people was the correct age to have been in their early 20s in 2011 and was also registered in both the hometown (which has a small population) and current city.
6. Very Specific Result from Public Records Search
There were 9 signatories for whom we couldn’t find any benchmark age, but there was only one result from the public records search with the full name matching the target person’s full name. Often these names were distinctive or the full name including middle name or initial matched. In these cases, if we could find other matching variables between the target person and the searched person, even though we had no benchmark to go on, we assumed that the age of the searched person was the age of the target person.
For example, we found one target person’s page on the website of the organization the person is affiliated with according to their signature on one of the letters. From this page, we found that the person attended two universities in two different states. Next, from their LinkedIn page, we found the area they are currently based in.
When we searched their name on searchpublicrecords.com, the only returned person was registered in the target person’s current state of residence and in 2 cities that corresponded with the 2 universities the target person attended. This was enough for us to confirm that this searched person was our target person.
Determining how many variables need to match between target and searched person before we can be sure they are the same is not an exact science. Highly specific matches (like both searched and target person being registered in a small town) are more valuable than weaker matches (like a common first and last name).
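The sketch below is not the procedure used for this project. It is a hypothetical illustration of how the intuition that rarer matches count for more could be made concrete. The weights, the population figures and the `match_score` helper are all invented for the example.

```python
import math


def match_score(shared_cities, city_populations, full_name_matches, name_is_common):
    """Sum evidence weights so that rarer shared attributes count for more."""
    score = 0.0
    for city in shared_cities:
        # Hypothetical weighting: the smaller the shared city, the larger
        # the contribution (population 10**3 adds about 4, 10**6 about 1).
        score += 7.0 - math.log10(city_populations[city])
    if full_name_matches:
        # A distinctive full name is stronger evidence than a common one.
        score += 0.5 if name_is_common else 3.0
    return score


# Invented populations, for illustration only.
populations = {"Smallville": 2_000, "Metropolis": 3_000_000}
strong = match_score({"Smallville"}, populations, True, name_is_common=False)
weak = match_score({"Metropolis"}, populations, True, name_is_common=True)
print(round(strong, 2), round(weak, 2))
```

Any threshold applied to a score like this would still be a judgment call, which is the point of the paragraph above.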
This category accounted for 131 signatories.
We couldn’t find the exact age in years in July 2020 of 45 of the 291 named signatories using any of the above methods. For one of them, we figured out their age using a variety of disparate sources.
The target person is affiliated with a specific publication according to the letter they signed. We found a Twitter account belonging to someone with the same name whose bio mentions the same affiliation. From this account, they tweeted on their birthday, without saying how old they were turning. They also had previously linked to articles they had written for a different publication. When this second publication announced that it was hiring the target person in 2017, it published a small blurb about the person which mentioned a previous role at a third publication. Finally, we found a July 2014 article that interviewed this person when they were writing for this third publication which mentioned their age. Because we knew their age in July 2014 and their exact birthday, we could be sure of their exact age in July 2020. We were sure that the article referred to the same person because we could track how they moved from publication to publication until they came to be affiliated with the publication they are affiliated with on the letter they signed.
This category accounted for just this 1 person.
For the remaining 44, there were some target people for whom we found searched people that were likely to be the same person, but we couldn’t match enough variables. For some, we figured out the year they were born in, but couldn’t get their exact age in years in July 2020. For others, we found nothing. We ended up excluding these 44 (16 from Harper’s letter, 28 from The Objective’s counter letter) because we were unable to confirm their ages with the same level of confidence as for the other 247.
The graphs above present data about these 247 people.
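As a consistency check, the counts reported throughout this section can be reconciled in a few lines. Every figure below comes from the text above; only the variable names are ours.

```python
# People found per category, as reported in each subsection.
categories = {
    "independent public source": 102 + 1 + 3,
    "self-reported public source": 7 + 1 + 1,
    "public records search": 15 + 97 + 5 + 3 + 2 + 9,
    "disparate sources": 1,
}
named = {"Harper's": 152, "The Objective": 139}
represented = {"Harper's": 136, "The Objective": 111}

found = sum(categories.values())
excluded = {k: named[k] - represented[k] for k in named}
coverage = {k: round(100 * represented[k] / named[k], 1) for k in named}

print(found)      # people with a confirmed age
print(excluded)   # named signatories left out, per letter
print(coverage)   # percent of named signatories represented, per letter
```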
This case study in using OSINT methods for a relatively simple data analysis project raises a number of legal and ethical questions.
First, OSINT and privacy are two sides of the same coin: OSINT is often successful only because privacy-compromising information is accessible about people online. In our case, we felt that since these 291 people named themselves as signatories of a public letter, they were choosing to enter the public sphere and open themselves up to a certain level of scrutiny. And as far as demographic information goes, age is not as sensitive as race, gender, or family-related information. Nevertheless, the process described above serves as a reminder about how much data people publish about themselves online, and it’s not difficult to imagine OSINT techniques venturing into much more ethically murky waters.
It may seem far-fetched that simply publishing already publicly available data could be controversial, but often such information can be used to verify other details about people that they may not have intended to make public. Anticipating legal and ethical gray areas like this is vital to conducting OSINT research responsibly.