DS-UA 202, Responsible Data Science, Spring 2024

Course Project: Technical Audit of an Automated Decision System

assigned on February 20, 2025; see description for due dates

Objectives

In this project, you will work in teams of two to conduct a technical audit of an automated decision system (ADS) of your choice.  We suggest that you audit one of the systems developed in response to a Kaggle competition of your choice, but you should feel free to use other systems that are of interest to you. Do not focus on Northpointe’s COMPAS in this assignment, since this tool was already covered extensively during class.  Be sure to prominently cite your sources of code and data!

Both team members should work together on all parts of the project.  You should not discuss your project submission or components of your solution with any students other than your project partner.  If you have questions about this assignment, please send a private question to all instructors over email.

Detailed description and goals

In this project, we encourage you to focus on examples from Kaggle competitions, where the goals, the data, and one or several implementations are available for analysis. Select a Kaggle competition that has already finished, and for which you can find and successfully execute at least one solution. A list of solutions to Kaggle competitions is available here, and you may be able to find solutions in other ways. If you decide to work with a system that’s not from Kaggle, you should make sure that the data and at least one implementation are available to you. Once again: Be sure to prominently cite your sources of code and data!

Your report, and the corresponding Google Colab notebook(s), should be the result of your audit. We do not expect you to develop a UI or any other fancy data presentation methods. That said, it is important that the plots you produce are informative, and that they support your analysis.

Background reading

This reading list should inspire you to think about interesting ways to analyze your ADS.

●   “Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing”, Raji et al., ACM FAccT 2020, https://dl.acm.org/doi/10.1145/3351095.3372873

●   “Towards Algorithm Auditing: A Survey on Managing Legal, Ethical and Technological Risks of AI, ML and Associated Algorithms”, Koshiyama et al., 2021, https://www.ssrn.com/abstract=3778998

●   “Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits”, Bandy, ACM CHI / CSCW 2021, https://dl.acm.org/doi/10.1145/3449148

●   “The algorithm audit: Scoring the algorithms that score us”, Brown et al., Big Data & Society 2021, https://journals.sagepub.com/doi/10.1177/2053951720983865

●   “Resume Format, LinkedIn URLs and Other Unexpected Influences on AI Personality Prediction in Hiring: Results of an Audit”, Rhea et al., AIES 2022, https://dl.acm.org/doi/10.1145/3514094.3534189

●   “Nutritional labels for data and models”, Stoyanovich and Howe, IEEE Data Engineering Bulletin Special Issue on Fairness, Diversity, and Transparency in Data Systems 42(3), 2019, http://sites.computer.org/debull/A19sept/p13.pdf

●   “The imperative of interpretable machines”, Stoyanovich, Van Bavel, West, Nature Machine Intelligence 2, 2020, https://rdcu.be/b57mr

●   “The dataset nutrition label: A Framework to drive higher data quality standards”, Holland et al., arXiv 2018, https://arxiv.org/abs/1805.03677

●   “Datasheets for datasets”, Gebru et al., Communications of the ACM, 2021, https://cacm.acm.org/magazines/2021/12/256932-datasheets-for-datasets/fulltext

●   “Model cards for model reporting”, Mitchell et al., ACM FAT* 2019, https://dl.acm.org/doi/10.1145/3287560.3287596

Deliverables and grading

The project is worth 30% of the course grade. Both partners will receive the same grade for the project. There are four deliverables; see below for descriptions and due dates. You may not use any late days towards the course project deliverables.

1.   Project team formation, due at 11:59pm ET on Monday, March 3. Find a project partner and fill out this form. We will assign a teaching assistant to shepherd your team; they will be your primary contact for any project-related questions.

If you have not identified a team partner by the deadline above, let us know by filling out this form by 11:59pm ET on Monday, March 3, so we can pair you up promptly.

2.   Project proposal, due at 11:59pm ET on Friday, March 28. Submit a 1-page summary of your proposed project, listing the names of both project partners and the ADS you propose to analyze in the project. Be explicit about where you’ll get the data and the code implementing the ADS: cite all sources properly in your project proposal. Leading up to the submission of your project proposal, you should make sure that the data is available, and that you are able to run the code on that data.

As part of your project proposal, include a brief (1-3 sentence) explanation of why you selected this specific ADS, in relation to the topics we study in this course.  We are still  early in the course, but we encourage you to look at the schedule / syllabus for a full list of topics when answering this question.

3.   Draft report, with Colab notebook, due at 11:59pm ET on Friday, April 18. Refer to the reading list above, and to the report structure. Submit a draft of your project report, filling in the “Background” and “Input and Output” sections. Also develop a detailed plan for the other sections, and describe this plan in your draft. Submit a PDF of your draft, and a Colab notebook used for the computation.

4.   Final submission, due at 11:59pm ET on Friday, May 9. Submit your project report and implementation. You will be graded on your execution of the project (with a Colab notebook), and on the quality of the project report. You should submit a Colab notebook implementing your project and an accompanying written report in PDF format (up to 10 pages).

Submission instructions

●   Both students should submit all project deliverables on Brightspace.

●   Submissions from the team partners should be identical with the exception of the brief project contributions document that each partner should submit individually as part of the final submission (due on May 9). In that document, each partner should discuss their own and their partner’s contributions to the project.

●   Your project proposal, draft report, and final report should be submitted as PDF files, created using LaTeX (we suggest using Overleaf).  You should submit a Google Colab notebook, or a collection of notebooks, that support the computation in your report.

Report structure

The outline below may be refined in response to clarification questions. We will post announcements on Brightspace if and when changes are made.

You may use any of the methods we discussed in class, as well as additional methods you find in the literature, for your analysis.

1.   Background: general information about your chosen ADS

a.  What is the purpose of this ADS?  What are its stated goals?

b.   If the ADS has multiple goals, explain any trade-offs that these goals may introduce.

2.   Input and output

a.   Describe the data used by this ADS. How was this data collected or selected?

b.   For each input feature, describe its datatype, give information on missing values and on the value distribution.  Show pairwise correlations between features if appropriate.  Run any other reasonable profiling of the input that you find interesting and appropriate.

c.   What is the output of the system (e.g., is it a class label, a score, a probability, or some other type of output), and how do we interpret it?

3.   Implementation and validation: present your understanding of the code that implements the ADS.  This code was implemented by others (e.g., as part of the Kaggle competition), not by you as part of this assignment.  Your goal here is to demonstrate that you understand the implementation at a high level.

a.   Describe data cleaning and any other pre-processing.

b.   Give high-level information about the implementation of the system.

c.   How was the ADS validated? How do we know that it meets its stated goal(s)?

4.   Outcomes

a.  Analyze the accuracy of the ADS by comparing its performance across different subpopulations, with respect to different accuracy metrics.  Carefully justify your choice of accuracy metrics.

b.  Analyze the fairness of the ADS, with respect to different fairness metrics. Carefully justify your choice of fairness metrics.

c.   Develop additional methods to analyze ADS performance: think about stability, robustness, performance on difficult or otherwise important examples, or any other property that you believe is important to check for this ADS.  Carefully justify your methodology.

5.   Summary: reflect on the following points in your report.

a.   Do you believe that the data was appropriate for this ADS?

b.   Do you believe the implementation is robust, accurate, and fair?  Discuss your choice of accuracy and fairness measures, and explain which stakeholders may find these measures appropriate.

c.   Would you be comfortable deploying this ADS in the public sector, or in industry? Why or why not?

d.  What improvements do you recommend to the data collection, processing, or analysis methodology?
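
To illustrate the kind of input profiling described in item 2b, here is a minimal sketch in plain Python. The records, the feature names (`age`, `income`, `group`), and the helper functions `profile` and `pearson` are all hypothetical and not part of any particular ADS; treat this as one possible starting point rather than a required method.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25, "income": 40.0, "group": "A"},
    {"age": 35, "income": 60.0, "group": "B"},
    {"age": None, "income": 80.0, "group": "A"},
    {"age": 45, "income": None, "group": "B"},
    {"age": 55, "income": 100.0, "group": "A"},
]


def profile(rows, feature):
    """Summarize one feature: inferred type, missing count, value distribution."""
    values = [r[feature] for r in rows]
    present = [v for v in values if v is not None]
    summary = {
        "dtype": type(present[0]).__name__ if present else "unknown",
        "missing": len(values) - len(present),
    }
    if present and all(isinstance(v, (int, float)) for v in present):
        # Numeric feature: report range and mean.
        summary.update(min=min(present), max=max(present), mean=mean(present))
    else:
        # Categorical feature: report value counts.
        summary["counts"] = dict(Counter(present))
    return summary


def pearson(rows, a, b):
    """Pearson correlation over the rows where both features are present."""
    pairs = [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


for feature in ("age", "income", "group"):
    print(feature, profile(rows, feature))
print("corr(age, income):", pearson(rows, "age", "income"))
```

In a Colab notebook, pandas offers the same summaries more concisely (for example `DataFrame.isna`, `DataFrame.describe`, and `DataFrame.corr`); the plain-Python version above only makes each step explicit.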

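For the subpopulation analysis in items 4a and 4b, the sketch below computes accuracy, selection rate, false positive rate, and false negative rate per group, and then two simple between-group comparisons. The records, the group names `A` and `B`, and the helper names are hypothetical; the metrics shown are only a few of many possible choices, and you would still need to justify whichever ones you use for your ADS.

```python
from collections import defaultdict

# Hypothetical records: (group, true label, predicted label); 1 is the positive outcome.
records = [
    ("A", 1, 1), ("A", 0, 1), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 1), ("B", 0, 0),
]


def rate(numerator, denominator):
    """Return a ratio, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def group_metrics(records):
    """Compute accuracy, selection rate, FPR, and FNR for each group."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_pred == 1:
            key = "tp" if y_true == 1 else "fp"
        else:
            key = "fn" if y_true == 1 else "tn"
        counts[group][key] += 1

    metrics = {}
    for group, c in counts.items():
        n = sum(c.values())
        metrics[group] = {
            "accuracy": rate(c["tp"] + c["tn"], n),
            "selection_rate": rate(c["tp"] + c["fp"], n),
            "fpr": rate(c["fp"], c["fp"] + c["tn"]),
            "fnr": rate(c["fn"], c["fn"] + c["tp"]),
        }
    return metrics


metrics = group_metrics(records)

# Difference in selection rates between groups (demographic parity difference).
dp_diff = metrics["A"]["selection_rate"] - metrics["B"]["selection_rate"]
# Ratio of selection rates (often called the disparate impact ratio).
di_ratio = metrics["B"]["selection_rate"] / metrics["A"]["selection_rate"]

for group, m in sorted(metrics.items()):
    print(group, m)
print("demographic parity difference:", dp_diff)
print("disparate impact ratio:", di_ratio)
```

In this toy data both groups have the same accuracy (0.75), yet group A has the higher selection rate and false positive rate while group B has the higher false negative rate. This is why comparing a single accuracy number across subpopulations is usually not enough.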