STAT3011 Graphical Data Analysis Assignment 2
The Project:
The project is in two parts:
Part A Presentation Graphics - 20% of your total grade. There is no page limit, but about 5 single-sided pages is suggested.
Task:
Collect five statistical graphics from published sources throughout the semester. Provide written critique on each.
Minimum Standard: Graphics must be sourced externally from published materials. Credit will be awarded for carefully selected, insightful graphics that reflect a strong understanding of effective data communication and the principles behind conveying information meaningfully.
For each graphic, include:
- A copy of the graphic.
- Full citation details (article title, authors, source, page numbers, etc.).
- A concise discussion on the purpose, strengths, weaknesses, and potential improvements of the graphic. Redrawing improved versions is encouraged but optional.
Commentary Style: Brief, relevant, and insightful; avoid unnecessary length.
Part B Analysis Graphics & Data Analysis: 40% of your total grade. 8 single-sided page limit.
Task: Select one data set from two options that are provided. Analyse the data set, and prepare a concise, well-organised and insightful report.
Report Content:
- Begin with a clear problem statement and purpose of your analysis.
- Explain your methodology, detailing the rationale behind each approach chosen to solve the problem and achieve the project’s objectives.
- Incorporate relevant graphics that directly support and illustrate each key insight, providing clear interpretation and analysis within the report.
Focus: The analysis should primarily (though not exclusively) be graphical.
Key Points to Consider:
- Clarity and Insight: Ensure your report and commentary are clear, focused, and demonstrate critical thinking. Overly verbose or unfocused submissions may be penalised.
- Depth of Analysis: Highlight meaningful patterns, trends, and insights rather than just describing the visuals. Your work should reflect careful consideration and thoughtful interpretation.
- Length Limit: Part B must not exceed 8 single-sided pages (not including the declaration page, or R code in the appendix).
- Submissions that exceed this limit will only have the first 8 single-sided pages assessed. Attempts to bypass the page limit with small fonts or unreadable formatting will not be accepted.
- Appendices: Include the R code used to generate graphics in Part B of the project as an appendix. Ensure the code is well-organised, commented, and clearly corresponds to the visuals presented in the report.
Submission Details: Via Turnitin
Additional comments:
There is no "right answer" to any question you formulate but there are
certainly clever problems to solve and clever insights to derive, and mundane ones.
I have not analysed these data sets so can only be of limited assistance to you. Of course, I will do my best to help you.
ALSO, YOU MAY NOT DISCUSS THE PROJECT WITH OTHER MEMBERS OF THE CLASS OR ANYBODY ELSE!! This requirement is serious, and evidence of plagiarism will result in you FAILING the course.
PLEASE, play by the rules.
The written part of the report should be no longer than 4-5 pages in length.
Begin with a clear statement of which data set you are analysing and
what problem you are trying to solve in your analysis.
Describe the various steps in your analysis. Explain why you have done what you have done, communicate your most insightful observations at each step, and summarise the main lessons.
Finally, there should be a brief conclusion in which you
summarise what you have found from all the insights you have gained through your analysis.
You should restrict the number of graphics you present, though of course you may describe graphics that you constructed but did not select for presentation.
If you choose Option 2 (insurance data set) below, the 8 pages include any maps you might display, so 8 pages is a hard limit. Pages beyond the 8th will not be read.
Marks are available for flair, creativity and those difficult-to-define
aspects of the project.
Choose ONE data set below for Part B
The project data is in the following R objects in the class data file:
Option 1: otter;
Option 2: insure.
Option 1: Social Grooming in North American River Otters
As part of a large study on the social behaviour of Lutra canadensis,
data on the grooming behaviour of five groups of captive otters was
obtained. It is generally believed that grooming is the social cement
of animal groups and plays an important role in bonding.
The questions of interest include:
1) Do animals within a group groom equally or are some groomed more
than they groom others?
2) In multi-member groups (A and H) do individuals exhibit preferences
in who they groom?
3) Do females groom males more than males groom females?
4) Do grooming rates change in the breeding season?
The data provided identifies the group, whether the season is breeding
(B) or not (N), the time in minutes of observation, the animals involved
and the frequency of grooming. The groups are
A: F1 (adult female), M2, M3, M4 (adult males)
B: F7 (adult female), M8 (adult male)
C: F9 (adult female), M15 (adult male)
D: F5 (adult female), M6 (adult male); siblings
H: F21 (subadult female), F22 (young adult female),
   M23 (subadult male), M24 (young adult male)
The data is in a list called otter. Each component is a vector of
length 394. $group is the group, $season is the season, $time is the
time observed in minutes (it is the length of time the groups are watched, NOT the length of time they spend grooming), $groomer is the groomer, $groomee is the groomee and $frequency is the frequency of grooming (number of grooms observed).
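As a minimal sketch of how a list with this structure might be prepared for plotting, the code below builds a small stand-in list and converts it to a data frame. The name otter_demo and all of its values are made up for illustration and are not taken from the class data. The per-hour rate and the use of the first letter of the animal label as its sex are assumptions about how the variables could be used, not a prescribed method.

```r
# Stand-in for the class object: a tiny made-up list with the same six
# components as otter (the real vectors have length 394).
otter_demo <- list(
  group     = c("B", "B", "B", "B"),
  season    = c("B", "B", "N", "N"),
  time      = c(120, 120, 90, 90),
  groomer   = c("F7", "M8", "F7", "M8"),
  groomee   = c("M8", "F7", "M8", "F7"),
  frequency = c(6, 3, 3, 0)
)

# Equal-length vectors convert directly to a data frame, one row per record.
otter_df <- as.data.frame(otter_demo, stringsAsFactors = FALSE)

# Observation time differs between records, so compare rates, not raw counts.
otter_df$rate_per_hour <- otter_df$frequency / otter_df$time * 60

# Assumption: the first letter of the animal label gives its sex (F or M).
otter_df$groomer_sex <- substr(otter_df$groomer, 1, 1)

print(otter_df)
```

Dividing by $time matters because it records how long a group was watched, so raw frequencies from sessions of different lengths are not directly comparable.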
Project Option 2: Insurance availability in Chicago
The U.S. Commission on Civil Rights collected data in an attempt to
examine charges that insurance companies were "redlining" certain
neighbourhoods, i.e. cancelling and/or refusing to renew policies.
The data provided include the number of cancellations, nonrenewals, new
policies and renewals of home and fire policies for each neighbourhood
by zip code for the months December 1977 - February 1978. This
information is combined into a single variable denoted Voluntary market
activity which is the number of new policies and renewals minus the
number of cancellations and nonrenewals expressed per 100 housing
units. In addition, information on the number of FAIR plan policies
was obtained. These policies are obtained after applicants have been
rejected for other policies so this information also reflects the
availability of policies. This information is provided as the
involuntary market activity, the number of FAIR plan policies and
renewals per 100 housing units. In addition, the Chicago Police
provided theft data and the Fire Department provided fire data from
1975 for each neighbourhood. These data are the number of incidents
per 1000 housing units in 1975. (The insurance companies claim to use
a three year lag on crime data when they set their premiums.) Finally,
the Census Bureau provided data on the racial composition (in per cent
minority), income and the age of housing units. The income is the
median family income and the age is coded as the percentage of units
built in or before 1939.
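To make the definition of voluntary market activity concrete, here is a small worked example. The counts are hypothetical figures for a single zip code, chosen only to show the arithmetic. They do not come from insure, and the variable names are illustrative.

```r
# Hypothetical counts for a single zip code (not taken from insure).
new_policies  <- 150
renewals      <- 900
cancellations <- 40
nonrenewals   <- 10
housing_units <- 20000

# Voluntary market activity as described above:
# (new policies + renewals) - (cancellations + nonrenewals),
# expressed per 100 housing units.
net_activity <- (new_policies + renewals) - (cancellations + nonrenewals)
voluntary <- net_activity / housing_units * 100
voluntary
```

With these made-up figures the net activity is 1000 policies over 20000 housing units, i.e. 5 per 100 housing units. Lower values indicate less voluntary insurance activity relative to the housing stock.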
The objectives of the study are to explore the extent to which racial
composition and age of housing affect underwriting practices after
controlling for factors like fire and theft.
In other words, do neighbourhood attributes such as racial composition and age of housing explain variation in insurance activity? If so, does the data suggest that insurance companies may be engaging in redlining?
The data is provided in a 47x8 data matrix called insure. A map of the
neighbourhoods with their zip codes is available as a pdf file.