Anyone can look at a table, but it takes more work to successfully analyze and draw conclusions from that table. Halfway through each year, students at Portland High School take a midterm exam in each of their classes, which covers everything learned in the first months of school. For our midterm in Statistics class, we had a test over everything we had learned during the first semester. This test was important to me because my second quarter grade was lower than I wanted it to be, and I knew a good midterm grade could bring my overall semester grade closer to where I wanted it. I studied for the midterm by looking over past quizzes and homework assignments, as well as completing the review packet that was handed out a week before the exam. Throughout the second quarter, I was familiar with all of the content for each quiz, but I consistently made careless mistakes that lowered my grades. On the test, I quickly worked through everything I could do easily, and then waited to receive my binder to finish the topics I was less sure about.
At the end of the test, I reviewed each page to make sure I had done everything as well as I could. I ended up getting a 100 on the midterm, which I was, of course, very happy with. After the exam, we received a list of all of our classmates’ scores to look over, analyze, and compare using a variety of graphs and tables. Those analyses and comparisons are presented in this paper, where I compare classmates’ scores with each other and compare my own score to those of my classmates. The graphs we created helped us organize and explain the data in an understandable way. Each of the sixty scores is listed in the table below.
The data is a population because it includes all of the scores from the test, rather than just a select sample. The data is also quantitative because the numbers have actual numerical value (higher numbers indicate a better score than lower numbers) and are not just labels or descriptors. We began by constructing a frequency distribution to consolidate all of the data in one place so it is easier to analyze and comprehend. We made the distribution with six classes, each with a width of 11, meaning each class covers 11 consecutive whole-number scores. The next column of the distribution lists the midpoints.
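The class width can be found by dividing the range by the number of classes and rounding up to the next whole number. The Python sketch below assumes that common textbook rule (also bumping an exact quotient up by one so the maximum always fits in the last class) and whole-number scores; the function name `class_limits` is hypothetical. With the minimum of 37 and maximum of 100 reported in this paper, it reproduces the six classes used here.

```python
def class_limits(minimum, maximum, num_classes):
    """Return (width, [(lower, upper), ...]) for whole-number data.

    Assumed rule: width = range / classes rounded up to the next whole
    number; an exact quotient is also bumped up by one so the maximum
    always lands inside the last class.
    """
    width = (maximum - minimum) // num_classes + 1
    limits = []
    lower = minimum
    for _ in range(num_classes):
        # Each class covers `width` consecutive whole-number scores.
        limits.append((lower, lower + width - 1))
        lower += width
    return width, limits


width, limits = class_limits(37, 100, 6)
print(width)   # 11
print(limits)  # [(37, 47), (48, 58), (59, 69), (70, 80), (81, 91), (92, 102)]
```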
The midpoints are the center value of each class, and they are the data points used in the frequency histogram. Listed next is the frequency, or how many data points fall in the given class. The frequencies should add up to the total number of data points (60). To find each frequency, we simply counted how many scores fell within a class’s range. The next column is relative frequency, or how common scores in a given class are relative to the whole. To find relative frequency, we divided a class’s frequency by 60, the number of data points we had. Relative frequencies are used in pie charts, like the one below the frequency distribution, and can be converted to percentages for other graphs and data displays. The relative frequencies should add up to one, or very close to one because of rounding, and can be displayed in a relative frequency polygon.
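The midpoint, frequency, and relative frequency columns can be sketched in Python as follows. The scores in the example are hypothetical values chosen only to exercise the code, not the class's actual scores, and `frequency_table` is an illustrative name. Each class is treated as including both of its limits, matching the whole-number classes described above.

```python
def frequency_table(scores, limits):
    """Rows of (lower, upper, midpoint, frequency, relative frequency)."""
    n = len(scores)
    rows = []
    for lower, upper in limits:
        # Midpoint: the center value of the class.
        midpoint = (lower + upper) / 2
        # Frequency: how many scores fall inside this class (inclusive).
        freq = sum(1 for s in scores if lower <= s <= upper)
        # Relative frequency: the class's share of all n scores.
        rows.append((lower, upper, midpoint, freq, freq / n))
    return rows


# Hypothetical scores for illustration only (not the class data).
sample = [41, 52, 60, 66, 71, 74, 79, 83, 89, 96]
limits = [(37, 47), (48, 58), (59, 69), (70, 80), (81, 91), (92, 102)]
rows = frequency_table(sample, limits)
for row in rows:
    print(row)
```

As a check, the frequencies should sum to the number of scores and the relative frequencies to one (up to rounding).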
The final column is the cumulative frequency, which adds together the frequencies of each class as we move down the table, so the last entry equals the total of 60. Cumulative frequency can be displayed in an ogive. Together, the data in this table tells us most of what we need to know to fully analyze our scores. My score was a 100, which falls into the sixth class, 92-102. This class contains 11 scores, including mine, so it has a relative frequency of 11/60 ≈ .1833, or 18.33%. This is shown in the graph below, in which my class is in red. We can see from the graph that scores in my class are not uncommon, though they are not the most common group of scores. In this histogram, the x-axis represents the midpoints of each class, from 42 to 97, and the y-axis represents the frequencies of each class, which range from 3 to 14.
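Cumulative frequency is a running total of the class frequencies, which Python's `itertools.accumulate` computes directly. In this sketch the frequencies are hypothetical, not the paper's counts; the last line only checks the relative-frequency arithmetic for a class holding 11 of the 60 scores.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Running total of class frequencies, first class to last."""
    return list(accumulate(frequencies))


# Hypothetical class frequencies (NOT the paper's actual counts).
freqs = [2, 5, 9, 4]
cum = cumulative_frequencies(freqs)
print(cum)  # [2, 7, 16, 20]; the last entry equals the total count

# Relative frequency of a class holding 11 of the 60 scores.
print(round(11 / 60, 4))  # 0.1833
```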
Next, we found the three measures of central tendency: mean, median, and mode. The mean, which is the arithmetic average, is 74.0333. The median, or the middle value when the data is in numerical order, is slightly higher at 77, and the data is multimodal, with modes of 63, 77, and 85. Each of those scores was earned by three students. My score was above both the mean and the median, and was not one of the three modes. The best measure of central tendency for this data set is the mean because the data is continuous and rather symmetrical. The data’s standard deviation is 16.4276. This tells us roughly how far, on average, the scores are spread out from the mean.
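Python's standard `statistics` module covers each of these measures. Here is a minimal sketch run on hypothetical scores rather than the class data; `central_tendency` is an illustrative name. `multimode` reports every mode, which suits multimodal data, and `pstdev` divides by N, which assumes (as this paper does) that the scores form a population.

```python
import statistics


def central_tendency(scores):
    """Mean, median, all modes, and population standard deviation."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # multimode returns every most-common value, in first-seen order.
        "modes": statistics.multimode(scores),
        # pstdev divides by N (population); statistics.stdev would
        # divide by N - 1 for a sample instead.
        "pstdev": statistics.pstdev(scores),
    }


# Hypothetical scores for illustration only (not the class data).
sample = [55, 68, 68, 74, 80, 80, 91, 96]
result = central_tendency(sample)
print(result)
```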
Next, we constructed a five number summary from the data set. As shown below, a five number summary includes the minimum, the 25th percentile or quartile one (Q1), the median or quartile two (Q2), the 75th percentile or quartile three (Q3), and the maximum. We can also use this summary, along with the IQR, to check for outliers. The IQR is the interquartile range, which is the difference between Q3 and Q1. The IQR is not the same as the range, which is the difference between the maximum and the minimum. Next, we looked for outliers, which are values that do not represent the data well because they are so far above or below the rest of the data. We found that the lower and upper outlier cutoffs are 26.5 and 122.5 respectively (Q1 - 1.5 × IQR = 62.5 - 36 and Q3 + 1.5 × IQR = 86.5 + 36), meaning that no score in our data set is an outlier. Though my score was equal to the maximum, it was not an outlier. Using the five number summary, we constructed a box and whisker plot, which helps us understand the spread of the data in a more visual display. On a number line, this plot shows a line that runs from the minimum to the maximum. It also has a box that goes from the 25th percentile to the 75th, or from quartile one to quartile three. The median is marked by a line across the box.
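Quartiles can be computed several ways, and different calculators or spreadsheets may report slightly different Q1 and Q3 values. The sketch below assumes the common "median of each half" quartile rule and the 1.5 × IQR outlier rule; the function names are hypothetical. Feeding in the paper's Q1 of 62.5 and Q3 of 86.5 reproduces the 26.5 and 122.5 cutoffs.

```python
import statistics


def five_number_summary(scores):
    """Min, Q1, median, Q3, max using the 'median of each half' rule.

    With an odd count, the overall median is left out of both halves.
    Other software may use a different quartile rule.
    """
    data = sorted(scores)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


def spread(summary):
    """Return (range, IQR) from a five-number summary."""
    minimum, q1, _, q3, maximum = summary
    return maximum - minimum, q3 - q1


def outlier_fences(q1, q3):
    """Cutoffs 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical scores, only to exercise the functions.
summary = five_number_summary([48, 55, 61, 70, 74, 79, 88, 95])
print(summary)          # (48, 58.0, 72.0, 83.5, 95)
print(spread(summary))  # (47, 25.5)

# The paper's quartiles, Q1 = 62.5 and Q3 = 86.5:
print(outlier_fences(62.5, 86.5))  # (26.5, 122.5)
```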
My score is the maximum, and is marked at the right end of the box and whisker plot. It is in quartile 4, which has scores ranging from 87 to 100. Because my score is the maximum, it is in the 100th percentile. The plot is nearly symmetrical in shape, but it is slightly skewed to the left. Because the outlier cutoffs are 26.5 and 122.5, there are neither upper nor lower outliers. However, the lowest scores are closer to the lower cutoff than the highest scores are to the upper cutoff. The center, which for a box and whisker plot is the median, is 77, and the mean is almost three points lower. Lastly, the spread is described by two measures: the IQR and the range. The IQR is 24, spanning from 62.5 to 86.5, and the range is 63, spanning from 37 to 100.
In conclusion, compiling the data from our midterm exams into the various graphs and tables shown in this paper helped us understand how our scores compared to those of our classmates, and how to go about fully analyzing a data set.
We began by constructing a frequency distribution, and used that information to make both a frequency histogram and a relative frequency pie chart. We then used the data to make a five number summary, which provided the framework for the final display of data, the box and whisker plot. The data we gathered gives us a firsthand example of how statistics can be applied in the real world. If we had only taken the test without analyzing the data, the scores each student earned would have no meaning beyond serving as a grade in the gradebook. Analyzing our data at this level of detail provides a solid example of how to successfully and succinctly analyze any set of data.