Ethics in Data-Driven Storytelling

Introduction

Ethics refer to the moral principles or values held or shown by an individual person. The term comes from the Greek “ethos,” which in turn refers to a person’s character.

Ethics are intended to help resolve questions about what is right and what is wrong. Ethics ultimately reside at the individual level; that is, they reflect what an individual considers to be acceptable behavior. However, such moral principles are shaped by one’s societal and cultural norms, religion, and even family environment. In journalism, there are also specific professional codes of ethics that journalists are expected to abide by.

Laws vs. Ethics

Ethics are very different from laws.

Philosophically, the law is typically concerned with what is legal or illegal, while ethics are concerned with what is right and what is wrong. The two differ substantially: something may be legal yet arguably unethical (e.g., imposing the death penalty) or illegal yet arguably ethical (e.g., stealing a loaf of bread to feed a hungry child). Additionally, laws are usually determined by institutions (e.g., a state government) and enforced through institutions (e.g., the police), whereas ethics are typically self-legislated (e.g., by groups or individuals) and self-enforced (e.g., through social pressure or exclusion). Finally, legality rests on statutory boundaries that are supposed to apply equally to all members of a jurisdiction, whereas ethics are more ambiguous and may vary considerably among members of a group.

A simpler way to think about this, however, is that laws set a minimal standard, whereas ethics set a benchmark or ideal behavior to strive toward. Put another way, laws are about what you can do, and ethics are about what you should do.

Journalistic ethics are especially important in the United States because there is no licensing system for U.S. journalists. Anyone can claim to be a journalist, which is very different from professions like medicine and law that require formal credentialing. This is not the case everywhere: some countries require journalists to be licensed by the government in order to publish journalism.

In lieu of licensing, self-regulation becomes important for promoting good journalism — both in terms of products and behaviors. The perception that journalism is both good and intends to do good is important for its recognition as a pillar of democratic society. Put another way, a strong sense of professional ethics is important for gaining the public’s trust.

A Spectrum for Ethics

There are different philosophical approaches for determining what is ethical and what is not. Placed on a spectrum, we’d likely find deontological approaches on one end and teleological approaches on the other.

Deontological approaches focus on the principles that drive the action. Put another way, even if the consequence of an action is bad, the action would be moral if it were driven by good motives and followed best practices. An example of this approach is Immanuel Kant’s categorical imperative, wherein the ethical duty is the same all of the time, in every circumstance, and with little regard for the consequences. Under a deontological approach, for example, a reporter would refuse to go undercover and lie about their profession because lying is wrong, even if it means missing out on an important story about water contamination.

Teleological approaches focus on the result of the action. Put another way, if the goal is “good,” then the action is moral, with little weight placed on how one reached that goal. An example of this approach is Utilitarianism, which holds the ethical act to be the one that brings the greatest good to the greatest number of people. Under a teleological approach, for example, a reporter would agree to go undercover and lie about their profession because more people (presumably, most members of that city) would benefit from the story about water contamination than would be harmed by the lying.

To be clear, there is a vast middle ground between these approaches. Other approaches include situational ethics, multiple duties, and virtue ethics.

Data and Codes of Ethics

The most influential code of ethics in the U.S. is the Society of Professional Journalists’ (SPJ) Code of Ethics, from which many other professional and organizational journalistic codes borrow. The SPJ code is divided into four main ethical principles: seek truth and report it, minimize harm, act independently, and be accountable and transparent. These principles sometimes clash with one another, requiring journalists to decide which principles are most important under their personal ethical philosophies. SPJ’s Code of Ethics includes a series of detailed statements for each principle, which are intended to guide action in specific kinds of dilemmas.

While several of the SPJ Code of Ethics’ principles apply to data journalism, there are also important issues specific to this genre (or approach to journalism) that complicate the ways in which those principles ought to be applied or balanced. In particular, the ethical issues most likely to arise in data-driven stories center on accuracy and balancing the right to privacy against the public interest.

Being Mindful of Accuracy

Numbers, charts, and maps possess an air of authority that other types of information often lack. And yet, such objects can be just as easily manipulated and misunderstood, not to mention that datasets often have significant limitations of their own. Journalists must therefore be careful in how they present information so as not to make it seem more valid (or authoritative) than it actually is.

While journalists should strive to make their stories accessible, they must also be sufficiently precise in their language so as not to introduce confusion (or make it easy for information to be misinterpreted). This requires journalists to be well-versed in numeracy and in the statistical concepts they incorporate in their work. For example, if a journalist intends to use percentages, they should understand the difference between a percentage increase and an increase in percentage points. Similarly, if they intend to include the results of a statistical analysis, they should understand the assumptions and limitations of that analysis. If the journalist does not understand an analysis, they should omit it from their report (or confirm it is correct with someone who has the requisite expertise).
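To make the percentage distinction concrete, here is a minimal Python sketch using a hypothetical rate that rises from 10 percent to 12 percent. The function names percent_change and point_change are illustrative only, not drawn from any library.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change, expressed as a percentage of the old value."""
    return (new - old) / old * 100


def point_change(old_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


# Hypothetical example: a rate rises from 10% to 12%.
old_rate, new_rate = 10.0, 12.0
print(f"{point_change(old_rate, new_rate):.1f} percentage points")  # 2.0 percentage points
print(f"{percent_change(old_rate, new_rate):.1f}% increase")        # 20.0% increase
```

The same change can be described accurately as "up 2 percentage points" or "up 20 percent," but calling it "up 2 percent" would be wrong.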

Ethical journalists must also ask the same questions of a dataset that they would ask of any other source: Does this source have a vested interest? How was this information collected? What (or who, where, or when) might be missing from these data? What is the margin of error for these data (among other potential limitations)? Can I verify this information (or interpretation) against another, independent source? Put simply, journalists must be able to trust the source and validity of the data they are using. If there is any uncertainty, it should be noted.
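For survey data, one common way to estimate the margin of error of a reported proportion is the normal-approximation formula sketched below. This sketch assumes a simple random sample and a 95 percent confidence level (z = 1.96); surveys that use weighting or clustered designs typically report a larger, design-adjusted margin, so the published figure should take precedence. The sample figures are hypothetical.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), which assumes a
    simple random sample; z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical poll: 50% support among 1,000 randomly sampled respondents.
moe = margin_of_error(0.5, 1000)
print(f"±{moe * 100:.1f} percentage points")  # ±3.1 percentage points
```

With a margin of roughly ±3.1 points on each estimate, a 2-point gap between two candidates in such a poll should not be reported as a clear lead.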

When presenting the data, it is important to keep in mind that accuracy is only a precursor to the truth, and not truth itself. Put another way, ethical journalists should contextualize the information in a way that helps the audience understand what it represents. For example, is a large number higher or lower than expected? How does it compare to similar things? What is the trend line? By putting data into context, the ethical journalist makes them more meaningful.
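Normalizing counts by population is one common way to answer the "how does it compare" question. The sketch below uses invented figures for two hypothetical cities to show how a larger raw count can correspond to a lower rate; the function name rate_per_100k is illustrative.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Convert a raw count into a rate per 100,000 residents."""
    return count / population * 100_000


# Hypothetical figures: (incident count, population).
cities = {"City A": (500, 1_000_000), "City B": (150, 100_000)}

for name, (count, pop) in cities.items():
    print(f"{name}: {count} incidents, {rate_per_100k(count, pop):.1f} per 100,000 residents")
# City A: 500 incidents, 50.0 per 100,000 residents
# City B: 150 incidents, 150.0 per 100,000 residents
```

City A has more than three times as many incidents, yet City B's rate is three times higher; reporting only the raw counts would mislead.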

Similarly, journalists must be especially careful with their visuals. As scholars like Alberto Cairo have thoughtfully pointed out, data visualizations are especially subject to misrepresentation. For example, a bar chart whose y-axis does not start at zero may suggest a huge discrepancy between two bars because of their difference in height (e.g., one bar appears twice as tall as the other), even if the numeric difference is actually quite small (e.g., 1 percent). Similarly, a 3D pie chart can distort the apparent proportions of each slice. Thus, ethical journalists must be versed in design principles and seek to promote the most truthful visual representation of data.
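The distortion from a truncated axis can be checked with simple arithmetic. In this sketch, which uses hypothetical values that differ by 1 percent, apparent_ratio compares bar heights measured from wherever the axis starts, while actual_ratio compares the underlying numbers. Both function names are illustrative.

```python
def actual_ratio(a: float, b: float) -> float:
    """How many times larger value b is than value a."""
    return b / a


def apparent_ratio(a: float, b: float, axis_min: float) -> float:
    """How many times taller bar b looks than bar a when the y-axis starts at axis_min."""
    return (b - axis_min) / (a - axis_min)


a, b = 100.0, 101.0  # hypothetical values that differ by 1 percent
print(actual_ratio(a, b))          # 1.01
print(apparent_ratio(a, b, 0.0))   # 1.01 (zero baseline matches the data)
print(apparent_ratio(a, b, 99.0))  # 2.0 (truncated baseline doubles the apparent gap)
```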

Balancing Privacy Against Public Interest

Data about individuals often raise privacy questions. This is especially true in data journalism, where there is a cultural bias toward transparency.

For example, a journalist may gain access to a database of gun permits that lists each gun owner’s name and location. That journalist may then want to create a map showing where each owner lives in order to inform members of that community, especially after a major incident involving gun violence (e.g., the Sandy Hook massacre). This may be seen as journalism that is both transparent (i.e., it shows all the data instead of just telling the journalist’s interpretation) and serves the public interest. However, such a map also violates individuals’ expectations of privacy, as such data are typically collected by state authorities only for the express purpose of helping them solve crimes. Moreover, and especially after a major incident, such data can arguably put the gun owners at risk of becoming victims of crime themselves, regardless of how responsible they have been. This introduces an ethical dilemma.

Ethical journalists should consider their responsibilities for providing context and ensuring that the data are correct, which may even require updating the data after publication. In particular, ethical journalists should consider the level of detail that is actually required to tell the story. In the aforementioned example, it is likely not necessary to pinpoint the exact houses where gun owners reside. Instead, those data could be aggregated to the neighborhood level in order to produce a heatmap or a choropleth map. In fact, aggregate, less personal information can sometimes provide a clearer picture of the broader trends that are of interest to a general audience.
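Aggregating to the neighborhood level can be as simple as counting records per area and discarding names and addresses. The sketch below uses invented records. It also suppresses areas with very few records (the min_count threshold of 3 is a hypothetical choice, not a standard), since a count of 1 on a map can still point to a single household.

```python
from collections import Counter

# Hypothetical point-level records: one row per permit holder
# (names and neighborhoods are invented for illustration).
permits = [
    {"name": "Owner A", "neighborhood": "Northside"},
    {"name": "Owner B", "neighborhood": "Northside"},
    {"name": "Owner C", "neighborhood": "Riverside"},
    {"name": "Owner D", "neighborhood": "Northside"},
    {"name": "Owner E", "neighborhood": "Hillcrest"},
    {"name": "Owner F", "neighborhood": "Riverside"},
    {"name": "Owner G", "neighborhood": "Riverside"},
]


def aggregate_by_area(records, key="neighborhood", min_count=3):
    """Count records per area, keeping only the area and the count.

    Areas with fewer than min_count records are reported as None so that
    very small groups cannot be singled out (hypothetical threshold).
    """
    counts = Counter(r[key] for r in records)
    return {area: (n if n >= min_count else None) for area, n in sorted(counts.items())}


print(aggregate_by_area(permits))
# {'Hillcrest': None, 'Northside': 3, 'Riverside': 3}
```

The resulting per-neighborhood counts are what a choropleth map would shade; no individual owner appears in the output.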

When publishing raw datasets, journalists may not be able to check every data point, especially if the dataset is large. Nevertheless, ethical journalists should try to offer some mechanism that allows users (especially those most affected by an error) to make corrections or easily notify the news organization about mistakes. One easy way to do this is to post the dataset on a platform like GitHub, which lets users propose changes to data files that maintainers can review before accepting, or flag problems by opening an issue.

Additionally, data journalists must be cognizant of the fact that “anonymized” data can sometimes be deanonymized by cross-referencing them against other publicly available data. For example, in the mid-1990s, the Massachusetts Group Insurance Commission (GIC) began disseminating data that showed every single hospital visit by a state employee. The state removed personal identifiers like each person’s name, address, and Social Security number, and mostly shared the data with researchers. However, an enterprising graduate student obtained the entire dataset through a records request and illustrated how easy it was to identify all the hospital visits by then-Governor William Weld. The student knew Weld resided in Cambridge and thus simply had to pay $20 for a copy of the Cambridge voter roll, which included each voter’s name, ZIP code, birth date, and sex. By cross-referencing those general identifiers with the GIC data, the student was able to identify all of the health records pertaining to Weld.
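The cross-referencing step amounts to a join on the fields both datasets share. The sketch below is a simplified illustration with entirely invented records, not a reconstruction of the GIC data: it treats ZIP code, birth date, and sex as quasi-identifiers and flags any "anonymized" record that matches exactly one named voter. The names link and QUASI_IDENTIFIERS are hypothetical.

```python
# Hypothetical "anonymized" records: names removed, but ZIP code, birth date,
# and sex retained (all values invented for illustration).
health_records = [
    {"zip": "00001", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "Condition 1"},
    {"zip": "00001", "birth_date": "1962-05-17", "sex": "F", "diagnosis": "Condition 2"},
    {"zip": "00002", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "Condition 3"},
]

# Hypothetical public voter roll that pairs names with the same fields.
voter_roll = [
    {"name": "Voter X", "zip": "00001", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Y", "zip": "00002", "birth_date": "1971-09-30", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(records, roll, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for records matching exactly one voter."""
    # Index voters by their combination of quasi-identifier values.
    index = {}
    for voter in roll:
        index.setdefault(tuple(voter[k] for k in keys), []).append(voter["name"])
    matches = []
    for rec in records:
        names = index.get(tuple(rec[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], rec["diagnosis"]))
    return matches


print(link(health_records, voter_roll))
# [('Voter X', 'Condition 1')]
```

Removing names alone did not protect the first record, because its combination of ZIP code, birth date, and sex was unique in the voter roll.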

Journalists must also take care to protect their sources when publishing leaked data. In particular, many files contain metadata, such as timestamps, the locations from which files were accessed, and even the accounts used to access or produce the material. Such metadata can be used to track down and identify the source.

Making the Ethical Choice

It can be difficult to ascertain the “right” answer to an ethical conundrum — and, sometimes, there is no single “right” answer. However, ethical journalists should be mindful of recurring issues within their genre or beat, and be aware of the best practices recommended by their profession’s codes of ethics.

When in doubt, consult your professor, supervisor, or trusted colleagues. Sometimes, an outside perspective is exactly what is needed.