Welcome to STA 221!
Welcome!
Meet Prof. Tackett!
- Education and career journey
  - BS in Math and MS in Statistics from University of Tennessee
  - Statistician at Capital One
  - PhD in Statistics from University of Virginia
  - Assistant Professor of the Practice, Department of Statistical Science at Duke
- Work focuses on statistics education and sense of belonging in introductory math and statistics classes
- Co-leader of the Bass Connections team Mental Health and the Justice System in Durham County
- Mom of 2-year-old twins 🙂
Meet the Teaching Assistants (TAs)!
Kat Husar (PhD): Head TA + Lab 01 leader
Kelly Huang (UG): Classroom TA
Janice Kim (MS): Classroom TA
Cathy Lee (PhD): Lab 02 leader
Alan Wang (UG): Lab 01 helper
Check-in on Ed Discussion!
Click on the link or scan the QR code to answer the Ed Discussion poll
https://edstem.org/us/courses/70811/discussion/5950645

Topics
Introduction to the course
Syllabus activity
Reproducibility
Regression Analysis
What is regression analysis?
Regression analysis is a statistical method used to examine the relationship between a response variable and one or more predictor variables. It is used for predicting future values, understanding relationships between variables, and identifying key predictors. It also helps in modeling trends, assessing the impact of changes, and detecting outliers in data.
Source: ChatGPT (with modification)
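As a minimal sketch of the idea, the snippet below fits a least-squares line with one predictor and uses it to predict a new value. The commute and rent numbers are made up for illustration (they are not course data), the function name `fit_line` is hypothetical, and Python's standard library stands in here for the R tools used in the course.

```python
from statistics import mean

# Hypothetical data: commute time (minutes) and monthly rent (dollars).
commute = [10, 20, 30, 40, 50]
rent = [2000, 1800, 1600, 1400, 1200]

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

b0, b1 = fit_line(commute, rent)
predicted = b0 + b1 * 25  # predicted rent for a 25-minute commute
print(b0, b1, predicted)
```

The slope describes the relationship between the predictor and the response, and the fitted line gives predictions for values that were not observed.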
Regression in practice
Example: Rent vs. commute time
STA 221
What is STA 221?
Prerequisites: Introductory statistics or probability course and linear algebra
Recommended corequisite: Probability course at Duke
Course learning objectives
By the end of the semester, you will be able to…
- analyze data to explore real-world multivariable relationships.
- fit, interpret, and draw conclusions from linear and logistic regression models.
- implement a reproducible analysis workflow using R for analysis, Quarto for writing reports, and GitHub for version control and collaboration.
- explain the mathematical foundations of linear and logistic regression.
- effectively communicate statistical results to a general audience.
- assess the ethical considerations and implications of analysis decisions.
Course topics
Course overview
Course toolkit
- Website: https://sta221-sp25.netlify.app
  - Central hub for the course!
  - Tour of the website
- Canvas: https://canvas.duke.edu/courses/51767
  - Gradebook
  - Office hours
  - Announcements
  - Gradescope
  - Ed Discussion
- GitHub: https://github.com/sta221-sp25
  - Distribute assignments
  - Platform for version control and collaboration
Computing toolkit
R and RStudio
- All analyses use R, a statistical programming language
- Write reproducible reports in Quarto
- Access RStudio through STA 221 Docker Containers
GitHub
- Access assignments
- Facilitates version control and collaboration
- All work in the STA 221 course organization
Classroom community
It is my intent that students from all diverse backgrounds and perspectives be well-served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that the students bring to this class be viewed as a resource, strength and benefit.
If you have a name that differs from those that appear in your official Duke records, please let me know.
Please let me know your preferred pronouns, if you are comfortable sharing.
If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with me. If you prefer to speak with someone outside of the course, your advisers and deans are excellent resources.
I (like many people) am still in the process of learning about diverse perspectives and identities. If something was said or done in class (by anyone) that made you feel uncomfortable, please talk to me about it.
Accessibility
The Student Disability Access Office (SDAO) is available to ensure that students are able to engage with their courses and related assignments.
If you have documented accommodations from SDAO, please send the documentation as soon as possible.
I am committed to making all course activities and materials accessible. If any course component is not accessible to you in any way, please don’t hesitate to let me know.
Syllabus activity
- Read the portion of the syllabus assigned to your group.
- Discuss the key points and any questions you may have with your neighbors.
- We’ll ask for volunteers to share a summary with the class.
Syllabus activity assignments
Group 1: What to expect in lectures and labs
Group 2: Homework and lab assignments
Group 3: Exams and project
Group 4: Participation
Group 5: Academic honesty (except AI policy)
Group 6: Artificial intelligence policy
Group 7: Late work policy and waiver for extenuating circumstances
Group 9: Getting help in the course
Syllabus activity report out
Group 1: What to expect in lectures and labs
Group 2: Homework and lab assignments
Group 3: Exams and project
Group 4: Participation
Group 5: Academic honesty (except AI policy)
Group 6: Artificial intelligence policy
Group 7: Late work policy and waiver for extenuating circumstances
Group 9: Getting help in the course
Grading
Category | Percentage |
---|---|
Homework | 30% |
Final project | 15% |
Lab | 10% |
Exams (2 midterms) | 40% |
Participation (AEs + Teamwork) | 5% |
Total | 100% |
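To make the weighting concrete, here is a rough sketch of how the category percentages combine into an overall score. It assumes each category is summarized as an average on a 0-100 scale and that the final grade is a plain weighted average; any syllabus policies (dropped scores, waivers, letter-grade cutoffs) are not modeled. The student scores and the function name `course_grade` are hypothetical.

```python
# Category weights taken from the grading table above (fractions of the final grade).
WEIGHTS = {
    "Homework": 0.30,
    "Final project": 0.15,
    "Lab": 0.10,
    "Exams": 0.40,
    "Participation": 0.05,
}

def course_grade(scores):
    """Weighted average of category scores, each on a 0-100 scale."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

# Hypothetical category averages for one student.
example_scores = {
    "Homework": 90,
    "Final project": 80,
    "Lab": 100,
    "Exams": 85,
    "Participation": 100,
}
overall = course_grade(example_scores)
print(round(overall, 2))
```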
Five tips for success in STA 221
Complete all the preparation work before class.
Ask questions in class, office hours, and on Ed Discussion.
Do the homework and labs; get started on homework early when possible.
Don’t procrastinate and don’t let a week pass by with lingering questions.
Stay up to date on announcements posted on Ed Discussion and sent via email.
Questions?
Reproducible workflow
Reproducibility checklist
What does it mean for an analysis to be reproducible?
Near-term goals:
✔️ Can the tables and figures be exactly reproduced from the code and data?
✔️ Does the code actually do what you think it does?
✔️ In addition to what was done, is it clear why it was done?
Long-term goals:
✔️ Can the code be used for other data?
✔️ Can you extend the code to do other things?
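One small ingredient of exact reproducibility is fixing the random seed whenever an analysis involves randomness. The sketch below is illustrative only: it uses Python's standard library (in R, the analogous call is `set.seed()`), and the data, the seed value, and the function name `bootstrap_mean` are arbitrary assumptions.

```python
import random

def bootstrap_mean(data, seed):
    """Mean of one bootstrap resample, drawn with a fixed seed."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    resample = rng.choices(data, k=len(data))  # sample with replacement
    return sum(resample) / len(resample)

data = [2.1, 3.4, 1.9, 5.0, 4.2]  # hypothetical measurements
first = bootstrap_mean(data, seed=221)
second = bootstrap_mean(data, seed=221)
print(first == second)  # same seed and same data give the same resample
```

Without a fixed seed, each run could draw a different resample, so a table or figure built from it could not be reproduced exactly.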
Why is reproducibility important?
Results produced are more reliable and trustworthy (Ostblom and Timbers 2022)
Facilitates more effective collaboration (Ostblom and Timbers 2022)
Contributes to science, which builds and organizes knowledge in terms of testable hypotheses (Alexander 2023)
Possible to identify and correct errors or biases in the analysis process (Alexander 2023)
Why is reproducibility important?
The study originally reported that “the intervention, compared with usual care, resulted in a fewer number of mean COPD-related hospitalizations and emergency department visits at 6 months per participant.”
In fact, there were more COPD-related hospitalizations and emergency department visits in the intervention group than in the control group.
The cause: the intervention and control groups were mixed up in the “0/1” coding.
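To see how a coding mix-up like this can reverse a conclusion, consider the small sketch below. The records are invented for illustration and are not the study's data; the function name `intervention_minus_control` is hypothetical.

```python
from statistics import mean

# Hypothetical records: (group_code, number_of_visits).
# Intended coding: 1 = intervention, 0 = control.
records = [(1, 3), (1, 4), (1, 5), (0, 1), (0, 2), (0, 3)]

def intervention_minus_control(records, intervention_code):
    """Difference in mean visits (intervention minus control) under a given coding."""
    treated = [visits for code, visits in records if code == intervention_code]
    control = [visits for code, visits in records if code != intervention_code]
    return mean(treated) - mean(control)

correct = intervention_minus_control(records, intervention_code=1)
flipped = intervention_minus_control(records, intervention_code=0)
print(correct, flipped)
```

The two calls use the same data and differ only in which code is treated as the intervention, yet the estimated effect changes sign. Stating the coding explicitly in the code and in the report makes this kind of error easier to catch.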
Toolkit
Scriptability → R
Literate programming (code, narrative, output in one place) → Quarto
Version control → Git / GitHub
R and RStudio
R is a statistical programming language
RStudio is a convenient interface for R (an integrated development environment, IDE)
RStudio IDE
Quarto
Fully reproducible reports – the analysis is run from the beginning each time you render
Code goes in chunks and narrative goes outside of chunks
The visual editor makes document editing feel similar to using a word processor (Google Docs, Word, Pages, etc.)
Quarto
How will we use Quarto?
Every application exercise and assignment is written in a Quarto document
You’ll have a template Quarto document to start with
The amount of scaffolding in the template will decrease over the semester
Version control with git and GitHub
What is versioning?
Each version is saved with a human-readable message
Why do we need version control?
Provides a clear record of how the analysis methods evolved. This makes analysis auditable and thus more trustworthy and reliable. (Ostblom and Timbers 2022)
git and GitHub
- git is a version control system, similar to the “Track Changes” feature in Microsoft Word.
- GitHub is the home for your git-based projects on the internet (like Dropbox but much better).
- There are a lot of git commands and very few people know them all. 99% of the time you will use git to add, commit, push, and pull.
Before next class
Review syllabus
Office hours start Monday, January 13
- Alan’s office hours start January 27