
Department of Computer Science

Summative Coursework Set Front Page

Module Title:  Distributed and Parallel Computing

Module Code:  CS3DP

Type of Assignment:  Coursework

Individual / Group Assignment:  Individual

Weighting of the Assignment:  50%

Total marks: 100

Page limit/Word count:  4 pages

Expected hours spent for this assignment:  20 hours

Items to be submitted:  4-page report (excluding title page, diagrams, graphs and references) in PDF format

Work to be submitted on-line via Blackboard Learn by:  Monday 12th May, 12 noon

Work will be marked and returned by: Tuesday 4th June.

NOTES

By submitting this work, you are certifying that all the sentences, figures, tables, equations, code snippets, artworks, and illustrations in this report are original and have not been taken from any other person's work, except where the works of others have been explicitly acknowledged, quoted, and referenced. You understand that failing to do so will be considered a case of plagiarism. Plagiarism is a form of academic misconduct and will be penalised accordingly. The University’s Statement of Academic Misconduct is available on the University web pages.

If your work is submitted after the deadline, 10% of the maximum possible mark will be deducted for each working day (or part of a day) it is late. A mark of zero will be awarded if your work is submitted more than 5 working days late. You are strongly recommended to hand work in by the deadline, as a late submission on one piece of work can impact on other work.

If you believe that you have a valid reason for failing to meet a deadline, then you should make an Exceptional Circumstances request and submit it before the deadline, or as soon as is practicable afterwards, explaining why. To make such a request, log on to RISIS and on the Actions tab select Exceptional Circumstance, as explained at

https://www.reading.ac.uk/essentials/The-Important-Stuff/Rules-and-regulations/Exceptional-Circumstances

1. Assessment Classifications

First Class (>= 70%)

Outstanding/excellent work with correct results and a good presentation of the theoretical concepts applied to applications. A good presentation of the code and results, and a critical analysis of the results. Outstanding work will present full solutions with an insightful discussion.

Upper Second (60-69%)

Very good work with partially correct results: most work has been carried out correctly. Some tasks have not been carried out or are not completely correct. The presentation is good, well structured, clear and complete with respect to the work done.

Lower Second (50-59%)

Good work, which is missing some significant part of the assignment, and/or with partially correct results. Some tasks have not been carried out. The presentation is, in general, accurate and complete, but it lacks clarity (presentation quality).

Third (40-49%)

Acceptable solutions to a limited part of the assignment. Some tasks have not been carried out. Some results may not be complete or technically sound. The presentation is inaccurate, incomplete and lacks clarity.

Pass (35-39%)

Partial solutions to a limited part of the assignment. Some tasks have not been carried out. Some results may not be complete or technically sound. The presentation is inaccurate, incomplete and lacks clarity.

Fail (0-34%)

Incomplete solutions to a limited part of the assignment. Most tasks have not been carried out with sufficient accuracy. Results may not be correct or technically sound. The presentation is inaccurate, incomplete and lacks clarity.

2. Assignment Description

Task 1. Understanding the CAP Theorem. [25 marks]

Choose a real-life distributed computing system, such as the distributed Domain Name System (DNS), but not one that we’ve introduced in the lectures. Analyse its distributed architecture and the rationale behind its design, including aspects such as data storage and algorithms, and investigate how the CAP theorem is applied in its design and operation.

Analyse and discuss the following:

a)   The semantic properties: concurrency, availability and durability.

b)   The wish-list for distributed software. Consider whether and how the chosen system has the following properties: high availability, fault tolerance, scalability, extensibility, usability, debuggability and efficiency.

c)   How is the CAP theorem applied in the chosen distributed system, and why?

Please note: your discussion should incorporate specific function/service examples provided by the system to show your understanding of the CAP theorem. A general introduction to the chosen distributed computing system and the semantic properties is not sufficient.

Task 2. Understanding the Hadoop MapReduce Framework [35 marks]

The dataset below contains web server log entries from a large e-commerce platform. Each log entry contains:

•   IP Address (e.g., 192.168.1.1)

•   Timestamp (e.g., [10/Jan/2024:10:05:23 +0000])

•   Requested URL (e.g., "GET /product/123 HTTP/1.1")

•   HTTP Status Code (e.g., 200)

•   User Agent (e.g., Mozilla/5.0)

192.168.1.1 - - [10/Jan/2024:10:05:23 +0000] "GET/home HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.2 - - [10/Jan/2024:10:06:12 +0000] "GET/product/123 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.3 - - [10/Jan/2024:10:07:45 +0000] "GET/cart HTTP/1.1" 302 "Mozilla/5.0"

192.168.1.1 - - [10/Jan/2024:10:08:50 +0000] "GET/home HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.4 - - [10/Jan/2024:10:10:30 +0000] "GET/checkout HTTP/1.1" 500 "Mozilla/5.0"

192.168.1.2 - - [10/Jan/2024:10:15:42 +0000] "GET/product/456 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.5 - - [10/Jan/2024:10:20:18 +0000] "GET/home HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.6 - - [10/Jan/2024:10:22:30 +0000] "GET/product/789 HTTP/1.1" 404 "Mozilla/5.0"

192.168.1.7 - - [10/Jan/2024:10:25:47 +0000] "GET/cart HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.3 - - [10/Jan/2024:10:30:59 +0000] "GET/home HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.8 - - [10/Jan/2024:10:35:10 +0000] "GET/checkout HTTP/1.1" 500 "Mozilla/5.0"

192.168.1.2 - - [10/Jan/2024:10:40:25 +0000] "GET/product/123 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.2 - - [10/Jan/2024:10:42:47 +0000] "GET/cart HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.4 - - [10/Jan/2024:10:43:45 +0000] "GET /product/357 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.9 - - [10/Jan/2024:10:45:33 +0000] "GET/product/123 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.2 - - [10/Jan/2024:10:46:30 +0000] "GET/checkout HTTP/1.1" 500 "Mozilla/5.0"

192.168.1.5 - - [10/Jan/2024:10:46:40 +0000] "GET /product/235  HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.5 - - [10/Jan/2024:10:50:45 +0000] "GET /product/357 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.9 - - [10/Jan/2024:10:51:40 +0000] "GET /product/246  HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.5 - - [10/Jan/2024:10:52:47 +0000] "GET/cart HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.5 - - [10/Jan/2024:10:52:52 +0000] "GET /product/579 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.9 - - [10/Jan/2024:10:53:50 +0000] "GET /product/135 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.5 - - [10/Jan/2024:10:55:50 +0000] "GET/checkout HTTP/1.1" 500 "Mozilla/5.0"

192.168.1.10 - - [10/Jan/2024:11:00:12 +0000] "GET/contact HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.6 - - [10/Jan/2024:11:05:29 +0000] "GET/product/456 HTTP/1.1" 404 "Mozilla/5.0"

192.168.1.2 - - [10/Jan/2024:11:15:00 +0000] "GET/product/123 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.5 - - [10/Jan/2024:11:35:10 +0000] "GET/checkout HTTP/1.1" 500 "Mozilla/5.0"

192.168.1.8 - - [10/Jan/2024:11:40:42 +0000] "GET/product/789 HTTP/1.1" 404 "Mozilla/5.0"

192.168.1.9 - - [10/Jan/2024:11:50:30 +0000] "GET/home HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.10 - - [10/Jan/2024:12:00:05 +0000] "GET/checkout HTTP/1.1" 500 "Mozilla/5.0"

192.168.1.11 - - [10/Jan/2024:12:05:15 +0000] "GET/product/123 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.12 - - [10/Jan/2024:12:10:25 +0000] "GET/home HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.13 - - [10/Jan/2024:12:15:40 +0000] "GET/cart HTTP/1.1" 302 "Mozilla/5.0"

192.168.1.14 - - [10/Jan/2024:12:20:00 +0000] "GET/checkout HTTP/1.1" 500 "Mozilla/5.0"

192.168.1.15 - - [10/Jan/2024:12:25:50 +0000] "GET/product/456 HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.20 - - [10/Jan/2024:12:50:10 +0000] "GET/cart HTTP/1.1" 200 "Mozilla/5.0"

192.168.1.21 - - [10/Jan/2024:12:55:55 +0000] "GET/checkout HTTP/1.1" 500 "Mozilla/5.0"
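As an illustration of how the five fields described above might be extracted from one log line, here is a minimal sketch in plain Python. The names `LOG_PATTERN`, `LogEntry` and `parse_line` are hypothetical and are not part of the assignment. The pattern is an assumption based only on the sample lines above, and it accepts both `GET /home` and `GET/home`, since both spellings appear in the listing.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern derived from the sample lines only.
# \s* after the method tolerates a missing space ("GET/home").
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+)\s*(?P<url>/\S*)\s+HTTP/[\d.]+" '
    r'(?P<status>\d{3}) '
    r'"(?P<agent>[^"]*)"\s*$'
)


class LogEntry(NamedTuple):
    ip: str
    hour: str    # date plus hour, e.g. "10/Jan/2024:10"
    url: str
    status: int
    agent: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return the parsed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # The timestamp looks like "10/Jan/2024:10:05:23 +0000";
    # the first two colon-separated parts are the date and the hour.
    date, hh = match.group("timestamp").split(":")[:2]
    return LogEntry(
        ip=match.group("ip"),
        hour=f"{date}:{hh}",
        url=match.group("url"),
        status=int(match.group("status")),
        agent=match.group("agent"),
    )
```

Returning `None` for lines that do not match lets a caller skip malformed input rather than fail, which is one reasonable choice for log processing but not the only one.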

Your task is to use MapReduce to analyse the logs and extract two pieces of information: the Top 10 most frequently accessed URLs within the peak hour (the hour with the most requests), and the number of unique visitors to each of the Top 10 URLs within that peak hour. You need to break down this complicated task into multiple MapReduce jobs and justify how the jobs lead to the desired result.

In your report, you are required to:

a)   Describe the decomposition of the task into a series of MapReduce jobs and justify the reasoning. For each MapReduce job, identify the input, output, Mapper function, and Reducer function.

b)   Write Python programs for the map and reduce functions and run the MapReduce jobs in the Hadoop local setup. The code should be included in an appendix to the report and is excluded from the 4-page length limit.

c)   In the main report, provide screenshots of the generated result of each MapReduce job. Please note that the results should be the execution results of Hadoop, not the local testing results of your Python programs.
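To make the input, Mapper, Reducer and output of a single job concrete, the following sketch simulates one possible first stage (counting requests per hour) in plain Python. It is only a local stand-in for the map, shuffle/sort and reduce phases, under the assumption that the hour is the key. It is not a Hadoop program and not the required decomposition, and the names `map_hour`, `reduce_sum` and `run_job` are hypothetical.

```python
import re
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

# Assumed helper: captures "dd/Mon/yyyy:hh" from the bracketed timestamp.
HOUR_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}):')


def map_hour(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (hour, 1) for every log line that has a timestamp."""
    for line in lines:
        match = HOUR_RE.search(line)
        if match:
            yield match.group(1), 1


def reduce_sum(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reducer: sum the counts per key; the input must be sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_job(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Local stand-in for map -> shuffle/sort -> reduce."""
    shuffled = sorted(map_hour(lines))  # the framework sorts by key between phases
    return list(reduce_sum(shuffled))


# Three lines taken from the dataset above.
sample_lines = [
    '192.168.1.1 - - [10/Jan/2024:10:05:23 +0000] "GET/home HTTP/1.1" 200 "Mozilla/5.0"',
    '192.168.1.10 - - [10/Jan/2024:11:00:12 +0000] "GET/contact HTTP/1.1" 200 "Mozilla/5.0"',
    '192.168.1.2 - - [10/Jan/2024:10:06:12 +0000] "GET/product/123 HTTP/1.1" 200 "Mozilla/5.0"',
]
hour_counts = run_job(sample_lines)
peak_hour = max(hour_counts, key=lambda kv: kv[1])[0]
print(hour_counts, peak_hour)
```

The same key/value pattern could plausibly be reused for later stages with a different key, for example the URL, once the peak hour is known. Results from a local simulation like this do not replace the Hadoop execution results requested in c).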

Task 3. Understanding Parallel Programming with Spark [35 marks]

Pi is an essential number in mathematics. It is both transcendental, which means it is not the solution of any algebraic equation (a polynomial equation with integer coefficients), and irrational, which means it cannot be expressed as a ratio of integers. Determining its value is not so simple. Here is a method which uses random numbers.

Figure 1: Unit circle with radius r = 1 and a square with side of length one.

Figure 1 shows a unit circle with radius r = 1 and a square with sides of unit length.

The area of the square is $A_s = r \times r = 1$    (1)

The area of the circle is $A_c = \pi r^2 = \pi$    (2)

The quadrant of the circle contained in the black square has a quarter of the circle's area, π/4. Darts are thrown at random into the black square and counted to see how many fall within the quadrant. The number of darts inside the quadrant is proportional to its area, so the fraction of darts that land inside approximates π/4, and π can be estimated as four times the ratio of darts inside the quadrant to the total number of darts thrown.
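The estimate can be written out explicitly. In the sketch below, $N$ (total darts thrown) and $N_{\text{in}}$ (darts landing inside the quadrant) are symbols introduced here for illustration, and the darts are assumed to be independent and uniformly distributed over the square. Under those assumptions each dart is a Bernoulli trial, which also gives the expected spread of the estimate:

```latex
\[
P(\text{dart inside quadrant}) = \frac{A_c/4}{A_s} = \frac{\pi}{4},
\qquad
\hat{\pi} = 4\,\frac{N_{\text{in}}}{N},
\]
\[
\operatorname{sd}(\hat{\pi})
  = 4\sqrt{\frac{(\pi/4)\,(1-\pi/4)}{N}}
  \approx \frac{1.64}{\sqrt{N}} .
\]
```

Under these assumptions the typical error shrinks like $1/\sqrt{N}$, so about 100 times more darts are needed for each additional correct decimal digit.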

Here is a function in Python that uses random numbers for the x and y coordinates to compute whether a dart lands inside or outside the unit circle.

import random

# function to compute whether a "dart" is inside the unit circle
def sample(p):
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

Based on the key computation given above, please produce a Python program to calculate Pi using PySpark. The program should accept the number of samples (darts) and the number of partitions as two input arguments, and output the value of Pi for those input values.

from pyspark import SparkContext
import sys

# Define the sample() function here

# You should start from creating a Spark Context
sc = SparkContext(appName="miPi")

# Take the number of samples
NUM_SAMPLES = int(sys.argv[1])

# Next you should parallelise the samples in Spark.
# Remember to take the number of partitions from an input argument
# Your code of using spark should go here.
# The number of darts inside the quadrant should be stored in an integer variable count

# Calculating Pi, python3 syntax
print("Pi:" + str(NUM_SAMPLES) + ":" + str(4.0*count/NUM_SAMPLES))

You should also investigate how the accuracy of the calculation scales with the number of darts and the number of parallel elements.
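Before running anything on Spark, the effect of the number of darts can be explored with a small plain-Python sketch. This is not the PySpark program the task asks for: the functions `count_inside` and `estimate_pi` are hypothetical, the "partitions" here are just sequential chunks of the work, and giving each chunk its own seeded generator is an assumption made so that the run is repeatable.

```python
import math
import random


def count_inside(num_darts: int, seed: int) -> int:
    """Throw num_darts darts with a private RNG and count hits in the quadrant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_darts):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1:
            hits += 1
    return hits


def estimate_pi(num_samples: int, num_partitions: int, seed: int = 0) -> float:
    """Split the darts across chunks, count per chunk, then combine the counts."""
    base, extra = divmod(num_samples, num_partitions)
    sizes = [base + (1 if i < extra else 0) for i in range(num_partitions)]
    # Each hypothetical partition gets its own seed so the streams differ.
    count = sum(count_inside(n, seed + i) for i, n in enumerate(sizes))
    return 4.0 * count / num_samples


for n in (1_000, 10_000, 100_000):
    err = abs(estimate_pi(n, num_partitions=4) - math.pi)
    print(f"darts={n:>7}  abs error={err:.5f}")
```

In this sketch the total count does not depend on how the darts are grouped, only on how many are thrown and on the random streams used, which is one baseline to compare against when interpreting the partition plot.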

In the report, you are required to include the following components:

a)   A screenshot of your Python code (as an appendix, excluded from the length limit).

b)   In the main report, two plots showing how the accuracy varies with, respectively:

     a.   the number of darts at a fixed number of partitions;

     b.   the number of partitions at a fixed number of samples.

c)   A brief discussion and justification of the observations. Explore the reasons behind any unexpected observations.



