BAN 440 Lab 1 - Data Preparation in RapidMiner (30 points)

Adapted from “Data Mining for the Masses” Chapter 3

Please follow the instructions carefully to finish lab assignment 1. In this assignment, you will be asked to take 8 screenshots and paste them into the “BAN440 Lab1 Submission YourLastName.docx” file. Once you are done with all 8 screenshots, please submit the Word file “BAN440 Lab1 Submission YourLastName.docx” (with your own last name in the file name) to Canvas via the submission link.

Note: “YourLastName” in this document refers to your own last name. Don’t literally type in “YourLastName.”

A. CREATE YOUR OWN REPOSITORY

1) Launch the RapidMiner application. This can be done by double-clicking the desktop icon named “RapidMiner Studio” (as shown below), or by finding it in your application menu.

 

Within RapidMiner there are two main areas that hold useful tools: Repositories and Operators. The Repositories area is the place where you will connect to each data set. The Operators area is where all data mining tools are located. These are used to build models and otherwise manipulate data sets.

2) Follow the screenshot below to create your own new repository for the BAN 440 class.

 

3) Click Next

  

Please change the Alias name to BAN440_YourLastName, and find a local folder where you want to put all the files related to this class. Then click Finish. (Important Note: you MUST name it as BAN440_YourLastName to get credit for this step.)

Hint: There is no specific requirement on where to save your repository for this lab. You may put it in your own FCB shared drive, which you can access from any FCB computer. The repository you create here is computer-specific, which means you may have to recreate it if you switch to a different computer.

  

4) You will see a newly created repository named as BAN440_YourLastName. 

  

B. DOWNLOAD AND IMPORT DATA

1) Please download the data file “Chapter03DataSet.csv” from Canvas and save it to your local drive. 

2) You can use Excel to view the downloaded file. This data set is very small, comprised of only 15 attributes and 11 observations. Our next step is to connect to this data set. When you browse this data set, you will notice there are some missing data as indicated by the green arrows (see below).

Missing data are data that do not exist in a data set. As you can see in the screenshot, missing data is not the same as zero or some other value. It is blank, and the value is unknown. Missing data are also sometimes known in the database world as null. Depending on your objective in data mining, you may choose to leave missing data as they are, or you may wish to replace missing data with some other value. We will deal with the missing data in later steps of this assignment.
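If you would like to see the difference between a blank and a zero outside RapidMiner, the following is a minimal Python sketch using only the standard library. The three sample rows and the Hours_Online column are invented for illustration and are not taken from Chapter03DataSet.csv; the helper names are hypothetical as well.

```python
import csv
import io

# Hypothetical three-row excerpt; the values are invented for illustration
# and do not come from Chapter03DataSet.csv.
SAMPLE_CSV = """Gender,Hours_Online,Online_Gaming
M,0,N
F,,Y
F,3,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, turning blank cells into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: (value if value != "" else None) for name, value in row.items()}
        for row in reader
    ]


def count_missing(rows):
    """Count the missing (None) cells in each column."""
    counts = {name: 0 for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            if value is None:
                counts[name] += 1
    return counts


rows = load_rows(SAMPLE_CSV)
missing_counts = count_missing(rows)
print(missing_counts)
```

Note that the zero in the first row is kept as a real value, while the two blank cells are counted as missing.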

 

At this point, we could do a number of complicated and technical things, such as connecting to a remote enterprise database. This, however, would likely be overwhelming for now. For the purposes of this lab assignment, we will only need to connect to comma-separated values (CSV) files. Please be aware that in the real world, most data mining projects incorporate extremely large data sets, encompassing dozens of attributes and thousands or even millions of observations. We will use smaller data sets in this assignment, but the foundational concepts illustrated here are the same as for larger ones.

3) Click on the “Import Data” icon, as indicated in the red rectangle box on the picture below. Then click on “My Computer.” Note that by importing, you are bringing your data into a RapidMiner file, rather than working with data that are already stored elsewhere. If your data set is extremely large, it may take some time to import, and you should be mindful of disk space that is available to you.

  

4) Locate the file (Chapter03DataSet.csv), and then click on Next.

 

5) The column separation delimiter is Comma. Keep the default settings as shown in the screenshot below and click on Next.

 

6) RapidMiner will take its best guess at a data type for each attribute. The data type is the kind of data an attribute holds, such as polynominal, integer, or text.

 

7) Data types can be changed by following the screenshot below. Please change Gender from “polynominal” to “binominal.” RapidMiner also indicates a Role for each attribute to play. By default, all columns are imported simply with the role of ‘attribute’; however, we can change this by clicking on “Change Role” if we know a particular attribute is going to play a specific role in a data mining model that we will create. Since roles can be set within RapidMiner’s main process window when building data mining models, we will just accept the default ‘attribute’ whenever we import data sets for our class. Also, you may note that “Exclude Column” allows you to skip importing attributes you don’t want. Again, attributes can be excluded from models later if needed, so for this class, we will always include all attributes when importing data. Click on Next.
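To make the polynominal versus binominal distinction concrete, here is a small Python sketch. It is only an illustration and is not how RapidMiner guesses types: it assumes a nominal column can be treated as binominal when it has no more than two distinct non-missing values. The sample rows, the Preferred_Browser column, and the function names are hypothetical.

```python
# Hypothetical rows; the values are invented for illustration.
rows = [
    {"Gender": "M", "Preferred_Browser": "Firefox"},
    {"Gender": "F", "Preferred_Browser": "Chrome"},
    {"Gender": "F", "Preferred_Browser": "Safari"},
    {"Gender": None, "Preferred_Browser": "Firefox"},
]


def distinct_values(rows, column):
    """Return the set of non-missing values seen in a column."""
    return {row[column] for row in rows if row[column] is not None}


def suggest_nominal_type(rows, column):
    """Suggest 'binominal' for at most two categories, else 'polynominal'.

    This rule is an assumption made for this sketch, not RapidMiner's logic.
    """
    if len(distinct_values(rows, column)) <= 2:
        return "binominal"
    return "polynominal"


gender_type = suggest_nominal_type(rows, "Gender")
browser_type = suggest_nominal_type(rows, "Preferred_Browser")
print(gender_type, browser_type)
```

In this toy data, Gender has two categories and Preferred_Browser has three, which mirrors why Gender is a reasonable candidate for the binominal type in the import wizard.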

 

8) The final step for importing is to choose a repository to store the data set in, and to give the data set a name within RapidMiner. As shown in the following screenshot, please store the data set in the repository you just created, which is BAN440_YourLastName, and name it as Chapter03DataSet_YourLastName. Then click Finish. (Important Note: You MUST name it as Chapter03DataSet_YourLastName to get credit for this step).

 

9) Once you click on Finish, this data set will become available to you for any type of data mining process you would like to build upon it. The following screen shows you the Results Perspective.

 

C. RETRIEVE DATA OPERATOR

1) To continue, please click on the “Design” tab at the top to switch back to the Design Perspective.

2) The following screenshot shows the Design view. We can see that the data set “Chapter03DataSet_YourLastName” is now available for use in RapidMiner.

 

3) To begin using it in a RapidMiner data mining process, simply drag the data set and drop it onto the Main Process window. 

4) Each rectangle in a process in RapidMiner is called an operator. The Retrieve operator simply gets a data set and makes it available for use. The small half-circles on the sides of the operator, and of the Main Process window, are called ports. In the following screenshot, an output (out) port from our data set’s Retrieve operator is connected to a result set (res) port via a spline. To draw the spline, click and hold on the out port, then drag your mouse to the res port (on the far right side of the Process window) and release.

The splines, combined with the operators connected by them, constitute a data mining stream. To run a data mining stream and see the results, click on the blue, triangular Play button on the toolbar at the top of the RapidMiner window.

 

5) This will change your view from Design Perspective, which is the above screenshot where you can change your data mining stream, to Results Perspective, which shows your stream’s results, as pictured in the following screenshot.

 

6) When you hit the Play button, you may be prompted to save your process, and you are encouraged to do so. If you are not prompted, please follow the screenshot below to “Save Process.”

 

7) Please save the process into the repository you just created, which is BAN440_YourLastName. Name your process as BAN440_Lab1_YourLastName. Then click OK. (Important Note: You MUST name it as BAN440_Lab1_YourLastName to get credit for this step).

 

8) You will then see the following screenshot. In the Results Perspective, you can find the repository we created, which is “BAN440_YourLastName,” on the right side of the screen. You should also be able to see the dataset “Chapter03DataSet_YourLastName” and the process “BAN440_Lab1_YourLastName,” both under the “BAN440_YourLastName” repository.

 

9) Please switch back to the Design Perspective by clicking on “Design” as shown below. You will find the repository “BAN440_YourLastName,” the dataset “Chapter03DataSet_YourLastName,” and the process “BAN440_Lab1_YourLastName” on the left side of the screen.

 

10) You can toggle between the Design and Results Perspectives by clicking on “Design” or “Results.”

D. REPLACE MISSING VALUES

1) To find a tool (or an operator) in the Operators area, you can navigate through the folder tree in the lower left-hand corner of the screen. RapidMiner offers many tools/operators, and finding the one you want can sometimes be tricky. There is a handy search box, indicated by the red rectangle in the screenshot below, that allows you to type in keywords to find tools/operators that might do what you need.

Type in the word ‘missing’ into this search box, and you will see that RapidMiner automatically searches for tools/operators containing this word in their names. We want to replace missing values, and we can see that there is an operator called Replace Missing Values.

 

2) Now, let’s add this operator to our stream. Please click and hold on the operator name (from the left-hand side Operators pane), and drag it up to your spline. When you point your mouse cursor on the spline, the spline will turn slightly bold, indicating that when you let go of your mouse button, the operator will be connected into the stream.

If you let go and the Replace Missing Values operator fails to connect into your stream, you can reconfigure your splines manually. Simply click on the out port in your Retrieve operator, and then click on the exa port on the Replace Missing Values operator. Exa stands for example set, and ‘examples’ is the word RapidMiner uses for observations in a data set. Be sure the exa port from the Replace Missing Values operator is connected to your result set (res) port so that when you run your process, you will have output. Your model should now look similar to the screenshot below.  

Please take a screenshot now and replace my screenshot #1 with yours in the submission file (named “BAN440 Lab1 Submission YourLastName.docx”). Please make sure your screenshot shows your own last name in the related items we have added so far (see the red box in the above screenshot).

3) When an operator is selected in RapidMiner, it has an orange rectangle around it. This will also enable you to modify that operator’s parameters, or properties. The Parameters pane is located on the right side of the RapidMiner window (see below).   

4) For this assignment, we have decided to change all missing values in the Online_Gaming attribute to ‘N’, since this is the most common response in that attribute. To do this, please make sure the Replace Missing Values operator is selected (with the orange border), and then change the ‘attribute filter type’ to ‘single.’ A dropdown box will then appear under it (for ‘attribute’), allowing you to choose the Online_Gaming attribute as the target for modification. Next, expand the ‘default’ dropdown box, and select ‘value’, which will cause a ‘replenishment value’ box to appear. Type the replacement value ‘N’ in this box. Note that you may need to expand your RapidMiner window, or use the vertical scroll bar on the left of the Parameters pane in order to see all options, as the options change based on what you have selected. When you are done, your parameters should look like the screenshot below.
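The logic behind this step (find the most common response, then use it to fill the blanks) can be sketched in a few lines of Python. This is only an illustration of the idea and not what the Replace Missing Values operator runs internally; the sample rows and function names are hypothetical.

```python
from collections import Counter

# Hypothetical Online_Gaming responses; None marks a missing value.
rows = [
    {"Online_Gaming": "N"},
    {"Online_Gaming": "Y"},
    {"Online_Gaming": None},
    {"Online_Gaming": "N"},
    {"Online_Gaming": None},
]


def most_common_value(rows, column):
    """Return the mode of a column, ignoring missing values."""
    values = [row[column] for row in rows if row[column] is not None]
    return Counter(values).most_common(1)[0][0]


def replace_missing(rows, column, replacement):
    """Return copies of the rows with missing cells in `column` filled in."""
    return [
        {**row, column: replacement} if row[column] is None else dict(row)
        for row in rows
    ]


mode = most_common_value(rows, "Online_Gaming")
filled = replace_missing(rows, "Online_Gaming", mode)
print(mode, [row["Online_Gaming"] for row in filled])
```

The original rows are left unchanged and a filled copy is returned, which is similar in spirit to how an operator passes a new example set downstream.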

5) Please note that there are many other options available to you in the Parameters pane. We will not explore all of them here, but feel free to experiment with them. For example, instead of changing a single attribute at a time, you could change a subset of attributes in your data set. You will learn much about the flexibility and power of RapidMiner by trying out different tools and features. When you have your parameters set, click the Play button. This will run your process and switch you to the Results Perspective once again. Your results should look like the screenshot below.

 

Please take a screenshot now and replace screenshot #2 with yours in the submission file (named “BAN440 Lab1 Submission YourLastName.docx”). Please make sure to show the Online_Gaming attribute in your screenshot (see the red box) by scrolling all the way to the left.

As you can see, now the Online_Gaming attribute has been moved to the very left side of the attributes, and there are no missing values. All missing values for Online_Gaming have been replaced by “N.” Now, let’s look at the Online_Shopping attribute. A question mark (?) denotes a missing value in an observation. For this variable, suppose we do not wish to replace the null values with the mode, but rather, we wish to remove those observations from our data set prior to mining it. This can be accomplished through data reduction.
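Removing observations with a missing value, rather than filling them in, can be sketched in Python as follows. This is a minimal illustration of the idea only; the sample rows and the function name are hypothetical, and None stands in for the question mark that RapidMiner displays.

```python
# Hypothetical rows; None plays the role of RapidMiner's "?" marker.
rows = [
    {"Gender": "M", "Online_Shopping": "Y"},
    {"Gender": "F", "Online_Shopping": None},
    {"Gender": "F", "Online_Shopping": "N"},
]


def drop_missing(rows, column):
    """Keep only the rows where `column` has a value."""
    return [row for row in rows if row[column] is not None]


kept = drop_missing(rows, "Online_Shopping")
print(len(rows), "rows before,", len(kept), "rows after")
```

Unlike replacement, this approach shrinks the data set, so it is worth checking how many observations remain before mining.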

 
