by Michelle Jones
How to design terrible graphs
Warning: contains graph violence
Graphs are used to present information in a visual, summary format. They can be used instead of tables. Used successfully, graphs reduce the amount and complexity of data used in sentences. Hopefully this article gives you extra tools for deciding what graphs (not) to use.
The person who has worked hardest and longest in the area of graph design is Edward Tufte. I have included a link to his website under Resources.
Anyone who knows me well also knows two key pieces of information. I hate pie charts and I hate poorly made bar charts. I have taken charts from publicly available reports to illustrate my points. I’ve also pulled the examples from different disciplines, to show that poor chart design is everywhere.
Finally, I have purposely chosen reports where the chart designer is not identified, or there are multiple authors. The purpose of this article is not to name and shame individuals, and the designer does not normally have much of a say in the publication approval process. Managers and/or peer reviewers have decided that these graphics were fine to use.
Pie charts
Simple pie charts
The purpose of a pie chart is to show how mutually exclusive, related categories each contribute to a whole.
Let’s start with a simple example. Below is a pie chart containing just two categories: male and female. Pie charts are often used to show the ratio of sex, for example when reporting the results from surveys.
But why use a pie chart for a binary classification? To reiterate, the categories are mutually exclusive. We could just say 49% of the books reviewed had female authors. That 51% were by male authors is easy to assume, and calculate.
The point of the website is to highlight the lack of reviews for books with female authors. If you go to the link, you’ll see a series of 14 pie charts, one for each newspaper assessed by the Stella Count, for 2013. Even with a large screen, you’ll be scrolling to see all of them. And the pie chart for The Monthly has the colour of the categories reversed — it’s hard to keep track of consistent formatting for so many charts!
I think the information would be better presented in a bar chart. I’ve used R for this. The packages I’ve called are ggplot2 and ggridges. ggridges has been used to cycle the two colours through the bars. I think the colour cycling improves the readability of the graph compared to only having one colour for every bar. There was a hiccup I can’t fix, with the colour cycling towards the bottom, so I have forced a reverse order for two bars using FillValues.
# Percentage of reviewed books with female authors, by publication
FemaleAuthors <- data.frame(
  Publication=c("The Advertiser", "The Age", "Australian Book Review",
                "The Australian Financial Review", "Books+Publishing",
                "The Courier-Mail", "The Daily Telegraph", "Good Reading",
                "The Monthly", "Sunday Age", "Sunday Tasmanian",
                "The Sydney Morning Herald", "The Weekend Australian",
                "The West Australian"),
  PropOfFemales=c(49,42,47,15,61,41,46,49,41,49,49,43,35,58))

# Sort by descending percentage. The original call also passed
# -FemaleAuthors$Publication as a second key, but unary minus is not
# defined for a text column, so that key is dropped here.
FemaleAuthors <- FemaleAuthors[order(-FemaleAuthors$PropOfFemales),]

# Alternate the two fill groups; the last four are set by hand to work
# around the colour cycling hiccup
FemaleAuthors$FillValues <- c(rep(c("A","B"),5),"B","A","A","B")

library("ggplot2")
library("ggridges")

# Note: recent ggplot2 releases prefer linewidth= over size= for lines
ggplot(data=FemaleAuthors,
       aes(x=reorder(Publication, PropOfFemales), y=PropOfFemales, fill=FillValues)) +
  geom_bar(stat="identity", colour="black", width=1) +
  scale_y_continuous(breaks=seq(0, 70, by=5), limits=c(0,70), expand=c(0,0)) +
  scale_fill_cyclical(values=c("plum3","orchid2")) +
  labs(x="Publication", y="Proportion of books reviewed \nwith female authors") +
  coord_flip() +
  theme(panel.grid.minor.y=element_blank(),
        panel.grid.major.x=element_line(color="gray"),
        panel.background=element_blank(),
        axis.line=element_line(color="gray", size=1),
        axis.text=element_text(size=10),
        axis.title=element_text(size=15),
        plot.margin=margin(5,5,5,5),
        legend.position="none")

The important information from those 14 pie charts, the representation of female authors in newspaper book reviews, is now obvious at a glance.
For ease of interpretation, I have colour coded the bars with pinkish shades. (Yes, this is a stereotype, but the pink drives home the point that these are the results for females). The alternating colours make it easier for the eye to trace along each bar. I’ve graphed the data by descending female representation, reinforcing the point of the Stella Count.
While the exact proportions cannot be read from the graph, the grid line at each 5% provides a sense of the number. Important numbers can be mentioned in the text.
More complex pie charts
The pie chart below has a lot of slices, and relates to gene expression. Only three of the slices are large enough to contain text. Each category is tagged with its respective proportion.
One category, “Miscellaneous Function”, contained no altered genes, and is shown adjacent to the pie chart. It’s hovering in space. However, because that function is sitting next to the purple slice, a quick glance suggests that it relates to that slice. The line to “Nucleic Acid Regulation” shows the actual category, but not all the slices have lines linking the category.
Again, I can construct a bar chart because all the data is included in the original graphic. Using R, and the RColorBrewer package to get more colors than are contained in Set3:
# Percentage of altered genes in each gene group, read from the original pie chart
GeneExpressionProfile <- data.frame(
  AlteredGenes=factor(c("Apoptosis-associated", "Cellular Maintenance & Signalling",
                        "Chitin Binding", "Detoxification", "Insect Digestion-related",
                        "Insect Growth", "Insect Immunity", "Insect Metabolism",
                        "Miscellaneous Function", "Nucleic Acid Regulation",
                        "Stress Response", "Virus Replication / Altered Host Physiology",
                        "Unknown")),
  PercentAltered=c(1,10,2,4,25,2,4,10,0,5,1,2,34))

# Sort by descending percentage
GeneExpressionProfile <- GeneExpressionProfile[order(-GeneExpressionProfile$PercentAltered),]

library("ggplot2")
library("RColorBrewer")

# Set3 has only 12 colours, so interpolate the palette to get 13
ggplot(data=GeneExpressionProfile,
       aes(x=reorder(AlteredGenes, PercentAltered), y=PercentAltered, fill=AlteredGenes)) +
  geom_bar(stat="identity", colour="black", width=1) +
  scale_y_continuous(breaks=seq(0, 50, by=5), limits=c(0,50), expand=c(0,0)) +
  scale_fill_manual(values=colorRampPalette(brewer.pal(12,"Set3"))(13)) +
  labs(x="Gene Group", y="Proportion of altered genes \nacross the genes studied") +
  coord_flip() +
  theme(panel.grid.minor.y=element_blank(),
        panel.grid.major.x=element_line(color="gray"),
        panel.background=element_blank(),
        axis.line=element_line(color="gray", size=1),
        axis.text=element_text(size=10),
        axis.title=element_text(size=15),
        plot.margin=margin(5,5,5,5),
        legend.position="none")

This produces the following bar chart:
Bar charts
As you can see, I really like bar charts. However, there are a number of ways to make bar charts less interpretable. Stacking the bars is one of them.
Stacked bar charts
One type of stacked bar chart uses proportions, so the components inside each bar sum to 100%. These can be visually complex, and the messages from the chart are not always clear to a reader.
Additionally, because all the bars are forced to be the same length, differences in the numbers that underlie the proportions are masked. It could then be misleading to compare the relative proportions across the bars.
A factor that accounts for 30% of a bar may not be interesting if the result relates to three out of ten people. Our interpretation of its importance would change if the same percentage was based on 200 people.
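To make the sample-size point concrete, here is a minimal sketch in base R. The counts are hypothetical (3 of 10 versus 60 of 200), chosen only because both give 30%. It uses binom.test() to attach an exact 95% confidence interval to each percentage, which is one of several ways to express how much a percentage can be trusted.

```r
# Hypothetical counts: both groups show the same 30%, on very different bases
results <- data.frame(
  group = c("Small sample", "Large sample"),
  hits  = c(3, 60),
  n     = c(10, 200)
)
results$percent <- 100 * results$hits / results$n

# Exact (Clopper-Pearson) 95% confidence interval for each proportion;
# mapply returns one column per group, so transpose to one row per group
ci <- t(mapply(function(x, n) binom.test(x, n)$conf.int,
               results$hits, results$n))
results$lower <- 100 * ci[, 1]
results$upper <- 100 * ci[, 2]
results$width <- results$upper - results$lower

print(results)
```

The interval for the small sample is far wider than the one for the large sample, even though the two bar segments would be drawn at exactly the same length.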
Another, less complicated example is below. There are two main problems with this graphic. First, the bars include the percents. This is an admission that people can’t interpret the values from the length of the bar sections. If you click on the link (in the caption), you will find that all the percents are listed, for all years, on the same page underneath the chart.
Why is this bad? All the information in the chart is duplicated in the text. Why include the bar chart?
The use of numbers inside the bar sections seems to be relatively common. Another example is below. Here, they have used a light-to-dark color scheme for each section. I think gradient color schemes make charts harder to read. Gradient color schemes are also hard to interpret when the bars aren’t stacked.
The other type of stacked bar chart is one where the bar sections take on their true values. This results in bars of different heights. The advantage is that we can see the actual numbers. However, the chart still contains a lot of information, and only the largest changes in categories are obvious.
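The two kinds of stacked bar chart can be sketched side by side with ggplot2. The data frame below is hypothetical and the object names are placeholders; the only thing that differs between the two plots is the position argument.

```r
library("ggplot2")

# Hypothetical counts: two years, three categories
counts <- data.frame(
  year     = rep(c("2017", "2018"), each = 3),
  category = rep(c("A", "B", "C"), times = 2),
  n        = c(10, 20, 10, 60, 90, 50)
)

# Proportional stacking: every bar is rescaled to the same height
p_fill <- ggplot(counts, aes(x = year, y = n, fill = category)) +
  geom_col(position = "fill") +
  labs(y = "Proportion")

# True-value stacking: bar height is the total count for the year
p_stack <- ggplot(counts, aes(x = year, y = n, fill = category)) +
  geom_col(position = "stack") +
  labs(y = "Count")

# Totals per year, which the proportional version hides
totals <- tapply(counts$n, counts$year, sum)
print(totals)
```

Printing p_fill and p_stack shows the trade-off: the first hides that one year has five times as many observations as the other, while the second shows the totals but makes the smaller bar's sections hard to compare.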
Special mention: 3-D graphs
I have put these into a separate section to show that 3-D is not a good decision for charts.
3-D pie charts
The only thing worse than a 2-D pie chart is a 3-D pie chart.
The relative size of the pieces is even more difficult to interpret. Because a 3-D object is being drawn in 2-D space, the slices are distorted and no longer reflect their true proportions. Let’s use the bottom chart as our example. I’m rounding to the nearest million in each example.
Compare “Social security and welfare” ($122 million) with “Health” ($60 million). Does the Health slice look about half the size of the Social security and welfare slice?
Compare “General government services” ($97 million) with the Social security and welfare slice. General government services is about 4/5 the expenditure of Social security and welfare, but the pie chart makes them look about the same amount.
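The two comparisons above are simple ratios, and they can be checked in a couple of lines of R. The amounts are the unrounded values for the same three categories as used in the bar chart code further down; the variable names are just placeholders.

```r
# Unrounded amounts for the three categories being compared
social_security <- 121.907
health          <- 59.858
general_gov     <- 96.797

# Size of each category relative to "Social security and welfare"
health_ratio      <- round(health / social_security, 2)       # 0.49, about half
general_gov_ratio <- round(general_gov / social_security, 2)  # 0.79, about 4/5

print(c(health = health_ratio, general_government = general_gov_ratio))
```

If the slices were drawn accurately, Health would look about half the size of Social security and welfare, and General government services would look noticeably smaller than it, not the same.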
The ordering of the categories isn’t clear, either. They’re not in size order. They’re not in alphabetical order.
What is the solution? Again, the same as for 2-D pie charts. If there are few categories, a bar chart is a better presentation of the data.
Let’s see how the bottom pie chart looks in bar chart form, using R. I’m using the ggplot2 package to do the plotting, and the stringr package to handle the text wrapping on the axis labels.
I like the colour sequence and combination of Set3 in the ColorBrewer palette. I’ve also removed clutter from the chart by removing the background colour and extraneous grid lines. I have ordered the expenditure categories by descending amount. I have wrapped the y-axis text to provide a better ratio of y-axis width versus internal plot width. The legend has been suppressed. I’ve expanded the right hand outer margin of the graph so the final x-axis value is not cut-off.
# Expenditure by category, read from the original pie chart.
# factor() must wrap the whole c(...) vector; passing the strings as
# separate arguments would treat them as levels, labels and so on.
TaxExpenditure <- data.frame(
  Expenditure.Type=factor(c("Industry & workforce", "Defence",
                            "Social security & welfare",
                            "Community services & culture", "Health",
                            "Infrastructure, transport & energy",
                            "Education", "General government services")),
  Expenditure.Amount=c(14.843, 21.277, 121.907, 8.044, 59.858, 13.221, 29.870, 96.797))

library("ggplot2")
library("stringr")

ggplot(data=TaxExpenditure,
       aes(x=reorder(Expenditure.Type, Expenditure.Amount), y=Expenditure.Amount,
           fill=Expenditure.Type)) +
  geom_bar(stat="identity") +
  scale_y_continuous(breaks=seq(0, 125, by=25), limits=c(0,125), expand=c(0,0)) +
  # Wrap long category labels so the axis text does not crowd the plot
  scale_x_discrete(labels=function(x) str_wrap(x, width=20)) +
  labs(x="Expenditure type", y="Expenditure ($millions)") +
  scale_fill_brewer(palette="Set3") +
  coord_flip() +
  theme(panel.grid.minor.y=element_blank(),
        panel.grid.major.x=element_line(color="gray"),
        panel.background=element_blank(),
        axis.line=element_line(color="gray", size=1),
        axis.text=element_text(size=10),
        axis.title=element_text(size=15),
        plot.margin=margin(5,15,5,5),
        legend.position="none")

The resulting graph is shown below. The relative differences in expenditure are much easier to see compared to the pie chart.
3-D exploded pie charts
Friends don’t let friends create 3-D exploded pie charts.
3-D bar charts
3-D bar charts are notoriously difficult to interpret correctly, as they try to compress three dimensions into 2-D space. The examples below are particularly complicated, due to the positioning of the zero plane.
More suggestions for better graphs
Don’t use patterns
The use of colour/grey-scale in graphs is better than using a pattern. Patterns, such as cross-hatching, make graphs harder to read.
Example 1:
Example 2:
Use a suitable colour scheme
Different color schemes are available for graphs. Not all of them are good.
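One way to narrow the choice, sketched here with the RColorBrewer package, is to filter the palettes by type and by the package's own colourblind-friendly flag. The filter criteria and the count of eight categories are assumptions for illustration, not a rule.

```r
library("RColorBrewer")

# brewer.pal.info describes every ColorBrewer palette:
# maximum number of colours, category (div, qual, seq) and a colourblind flag
info <- brewer.pal.info

# Keep qualitative palettes (suited to unordered categories)
# that are flagged as colourblind-friendly
friendly <- info[info$category == "qual" & info$colorblind, ]
print(friendly)

# Take as many colours as there are categories; eight is a hypothetical count
bar_colours <- brewer.pal(8, rownames(friendly)[1])
print(bar_colours)
```

The resulting vector can be passed to scale_fill_manual(values=bar_colours) in a ggplot2 chart.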
Use suitable axis scales
Your numeric axis should start at zero. If your numbers are very large, express them in a suitable order of magnitude, for example using millions of dollars, or thousands of hours as your base.
If your graph then shows little variation between the category values, consider why a graph is necessary.
Did you want to show a change from year to year? If so, you could graph the percentage change from one year to the next, instead of graphing the raw numbers.
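Both ideas, rescaling large numbers and graphing the year-to-year change, take only a few lines of base R. The figures below are invented for illustration.

```r
# Hypothetical annual totals, in dollars
spend <- data.frame(
  year   = 2015:2019,
  amount = c(98500000, 101200000, 100400000, 104900000, 107300000)
)

# Express the large values in a convenient unit (millions of dollars)
spend$amount_millions <- spend$amount / 1e6

# Percentage change from the previous year; the first year has no predecessor
spend$pct_change <- c(NA, 100 * diff(spend$amount) / head(spend$amount, -1))

print(spend)
```

Plotted as raw dollars from a zero baseline, these five bars would look almost identical. The pct_change column makes the small rises and the one fall visible.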
Did you want to highlight the impact of a particular factor across time? One option is to graph that factor and nothing else.
Category ordering is important
No one rule fits all for deciding the order of the categories. One option, which I have used in my examples, is by height. How will you decide your ordering:
- highest to lowest?
- alphabetical by category?
- some other order?
The order you use depends on the main information that the client needs from the chart.
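In R, the order of the bars follows the order of the factor levels, so each of the three options comes down to how the factor is built. A small sketch with made-up categories and values:

```r
# Hypothetical categories and values
sales <- data.frame(
  region = c("North", "East", "South", "West"),
  amount = c(12, 30, 7, 21)
)

# Alphabetical: the default when a character vector becomes a factor
alphabetical <- factor(sales$region)
levels(alphabetical)   # "East" "North" "South" "West"

# Highest to lowest: order the levels by descending amount
by_value <- reorder(sales$region, -sales$amount)
levels(by_value)       # "East" "West" "North" "South"

# Some other order: state the levels explicitly
custom <- factor(sales$region, levels = c("North", "South", "East", "West"))
levels(custom)         # "North" "South" "East" "West"
```

The examples in this article use reorder() inside aes() for the same purpose, so the data frame itself does not need to be changed.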
Double check the accuracy of your graphic
Consider using error bars
The graph below comes from a study that examined the effect of THC on subject reaction times and accuracy of response, using a computerised stimulus.
They have included error bars on each measure, so we can see at a glance whether any of the results differed between the subject groups (placebo versus THC). Only a greyscale color scheme has been used, and it is very effective.
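A chart in this style can be sketched with ggplot2. The reaction times below are invented and are not the study's data, and the error bars are assumed to be one standard error of the mean either side; the original graph may use a different measure.

```r
library("ggplot2")

# Hypothetical reaction times (ms) for two groups; not data from the study
reaction <- data.frame(
  group = rep(c("Placebo", "THC"), each = 5),
  time  = c(412, 398, 430, 405, 421, 455, 470, 438, 462, 449)
)

# Mean and standard error of the mean per group
# (tapply returns groups in alphabetical order: Placebo, then THC)
summary_stats <- data.frame(
  group = c("Placebo", "THC"),
  mean  = as.vector(tapply(reaction$time, reaction$group, mean)),
  se    = as.vector(tapply(reaction$time, reaction$group,
                           function(x) sd(x) / sqrt(length(x))))
)

# Greyscale bars with error bars of one standard error either side
p <- ggplot(summary_stats, aes(x = group, y = mean)) +
  geom_col(fill = "grey70", colour = "black", width = 0.6) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
  labs(x = "Group", y = "Mean reaction time (ms)") +
  theme_classic()
print(p)
```

Whichever measure is used for the error bars (standard error, standard deviation or a confidence interval), it is worth stating it in the caption so readers know how to interpret any overlap.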
Resources for creating better graphs
The guru for creating better graphs, and graphics, is Edward Tufte. All his books are works of art, but for the presentation of numbers I recommend The Visual Display of Quantitative Information.
A blog I find particularly useful is FlowingData. Even if you don’t become a (paid) member of the site, Nathan is a prolific publisher and you can get ideas from his posts. Some of these posts are graphics he has made, and others are examples of well-designed graphics he has sourced from elsewhere.
Disclaimer: no actual graphs were harmed in the making of this article.
Translated from: https://www.freecodecamp.org/news/how-to-design-terrible-graphs-3b213d909387/