
Ships in satellite images

Problem description

To a large degree, financial data has traditionally been numeric in format.

But in recent years, non-numeric formats like image, text and audio have been introduced.

Private companies have satellites orbiting the Earth, taking photos and offering them to customers. A financial analyst might be able to extract information from these photos that could aid in predicting the future price of a stock. For example:

Approximate number of customers visiting each store: count number of cars in parking lot

Approximate activity in a factory by counting number of supplier trucks arriving and number of delivery trucks leaving

Approximate demand for a commodity at each location: count cargo ships traveling between ports

In this assignment, we will attempt to recognize ships in satellite photos. This would be a first step toward counting.

As in any other domain, specific knowledge of the problem area will make you a better analyst. For this assignment, we will ignore domain-specific information and just try to use a labeled training set (photo plus a binary indicator for whether a ship is present/absent in the photo), assuming that the labels are perfect.

Goal:

In this notebook, you will need to create a model in sklearn to classify satellite photos.

The features are images: each image is a 3-dimensional collection of pixels

2 spatial dimensions

1 color dimension with 3 channels for different parts of the color spectrum: Red, Green, Blue

The labels are either 1 (ship is present) or 0 (ship is not present)
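As a rough illustration of this layout, the sketch below builds a small synthetic batch with the same axis order as the assignment's data (samples, two spatial dimensions, color channels). The pixel values and the names `fake_images` and `fake_labels` are made up for illustration; the real data comes from `helper.getData()`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: 5 images, 80 x 80 pixels, 3 color channels
fake_images = rng.integers(0, 256, size=(5, 80, 80, 3), dtype=np.uint8)
# One binary label per image: 1 = ship present, 0 = ship not present
fake_labels = np.array([0, 1, 0, 0, 1])

n_samples, dim_1, dim_2, n_channels = fake_images.shape

# A single image is a 3-dimensional array: 2 spatial dimensions plus color
one_image = fake_images[0]
# One color channel of that image (here Red) is a 2-dimensional grid
red_channel = one_image[:, :, 0]
```

Indexing the last axis with 0, 1 or 2 selects the Red, Green or Blue channel, assuming the channels are stored in RGB order.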

Learning objectives

Learn how to implement a model to solve a Classification task

Import modules

In [ ]:

## Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import os
import math

%matplotlib inline

In [ ]:

## Load the helper module
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Reload all modules imported with %aimport
%reload_ext autoreload
%autoreload 1

# Import nn_helper module
import helper
%aimport helper

helper = helper.Helper()

API for students

We have defined some utility routines in a file helper.py . There is a class named Helper in it.

This will simplify problem solving

More importantly: it adds structure to your submission so that it may be easily graded

helper = helper.Helper()

getData: Get a collection of labeled images, used as follows

data, labels = helper.getData()

showData: Visualize labelled images, used as follows

helper.showData(data, labels)

model_interpretation: Visualize the model parameters

helper.model_interpretation(Classifier)

Get the data

The first step in our Recipe is Get the Data.

We have provided a utility method getData to simplify this for you

In [ ]:

# Get the data
data, labels = helper.getData()
n_samples, width, height, channel = data.shape

print("Data shape: ", data.shape)
print("Labels shape: ", labels.shape)
print("Label values: ", np.unique(labels))

Your expected output should be the following:

Data shape: (4000, 80, 80, 3)

Labels shape: (4000,)

Label values: [0 1]

We will shuffle the examples before doing anything else.

This is usually a good idea

Many datasets are naturally arranged in a non-random order, e.g., examples with the same label grouped together

You want to make sure that, when you split the examples into training and test examples, each split has a similar distribution of examples

In [ ]:

# Shuffle the data first
data, labels = sklearn.utils.shuffle(data, labels, random_state=42)
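To see why the features and labels are passed to `shuffle` together, here is a minimal sketch on a toy dataset that starts out sorted by label. The arrays `toy_data` and `toy_labels` are invented for illustration: each feature row encodes its own label, so we can check that the pairs stay aligned after shuffling.

```python
import numpy as np
from sklearn.utils import shuffle

# Toy dataset sorted by label, as many raw datasets are
toy_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Each feature row encodes its label: 0 for label 0, 10 for label 1
toy_data = toy_labels.reshape(-1, 1) * 10

# Passing both arrays applies the same permutation to each of them
shuffled_data, shuffled_labels = shuffle(toy_data, toy_labels, random_state=42)

# Every feature row still matches its label after shuffling
pairs_still_aligned = np.array_equal(shuffled_data[:, 0], shuffled_labels * 10)
```

Shuffling the two arrays in separate calls with different random states would break the pairing between images and labels.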

Have a look at the data

We will not go through all steps in the Recipe, nor in depth.

But here's a peek

In [ ]:

# Visualize the data samples
helper.showData(data[:25], labels[:25])

Eliminate the color dimension

As a simplification, we will convert the image from color (RGB, with 3 "color" dimensions referred to as Red, Green and Blue) to gray scale.

In [ ]:

print("Original shape of data: ", data.shape)

w = (.299, .587, .114)
data_bw = np.sum(data * w, axis=3)

print("New shape of data: ", data_bw.shape)

In [ ]:

# Visualize the data samples
helper.showData(data_bw[:25], labels[:25], cmap="gray")
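The conversion is a weighted sum over the color axis. The sketch below applies the same weights to a tiny hand-made image so the arithmetic can be checked by hand; the array `tiny` and its pixel values are hypothetical. Because the three weights sum to 1, a pixel whose Red, Green and Blue values are equal keeps its intensity.

```python
import numpy as np

# Weights for Red, Green, Blue, as used in the conversion cell
w = np.array([0.299, 0.587, 0.114])

# Hypothetical 1 x 2 image: one pure-red pixel and one mid-gray pixel
tiny = np.array([[[255.0, 0.0, 0.0], [100.0, 100.0, 100.0]]])
# Add a leading sample axis so the shape is (1, 1, 2, 3), like the real data
batch = tiny[np.newaxis, ...]

# Broadcasting multiplies each channel by its weight;
# summing over axis 3 removes the color axis
gray = np.sum(batch * w, axis=3)
```

Here the red pixel maps to 255 * 0.299 and the gray pixel stays at (approximately) 100.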

Have a look at the data: Examine the image/label pairs

Rather than viewing the examples in random order, let's group them by label.

Perhaps we will learn something about the characteristics of images that contain ships.

We have loaded and shuffled our dataset, now we will take a look at image/label pairs.

Feel free to explore the data using your own ideas and techniques.

In [ ]:

# Inspect some data (images)
num_each_label = 10

for lab in np.unique(labels):
    # Fetch images with different labels
    X_lab, y_lab = data_bw[ labels == lab ], labels[ labels == lab]
    # Display images
    fig = helper.showData( X_lab[:num_each_label], [ str(label) for label in y_lab[:num_each_label] ], cmap="gray")
    _= fig.suptitle("Label: "+ str(lab), fontsize=14)
    _= fig.show()
    print("\n\n")

It appears that a photo is labeled as having a ship present only if the ship is in the center of the photo.

Perhaps this prevents us from double-counting.

In any event: we have learned something about the examples that may help us in building models

Perhaps there is some feature engineering that we can perform to better enable classification.

Create a test set

To train and evaluate a model, we need to split the original dataset into a training subset (in-sample) and a test subset (out of sample).

Question:

Split the data

Set X_train, X_test, y_train and y_test to match the description in the comment

90% will be used for training the model

10% will be used as test (out of sample) examples

Hint:

Use train_test_split() from sklearn to perform this split

Set the random_state parameter of train_test_split() to be 42

We will help you by

Assigning the feature vectors to X and the labels to y

Flattening the two dimensional spatial dimensions of the features to a single dimension

In [ ]:

from sklearn.model_selection import train_test_split

y = labels
X = data_bw

X_train = None
X_test = None
y_train = None
y_test = None

### Flatten X
X = X.reshape(X.shape[0], -1)

# Split data into train and test
# Create variables X_train, X_test, y_train, y_test
# X_train: training examples
# y_train: labels of the training examples
# X_test: test examples
# y_test: labels of test examples

# YOUR CODE HERE
raise NotImplementedError()

print("X_train shape: ", X_train.shape)
print("X_test shape: ", X_test.shape)
print("y_train shape: ", y_train.shape)
print("y_test shape: ", y_test.shape)

Your expected output should be the following:

X_train shape: (3600, 6400)

X_test shape: (400, 6400)

y_train shape: (3600,)

y_test shape: (400,)
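To make the flattening and the 90/10 split concrete, here is a small sketch on made-up data. It is not the assignment's dataset: `toy_images` and `toy_labels` are hypothetical, and the sizes are chosen so the resulting shapes are easy to verify by hand.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the gray-scale images: 20 images of 4 x 4 pixels
toy_images = np.arange(20 * 4 * 4, dtype=float).reshape(20, 4, 4)
toy_labels = np.array([0, 1] * 10)

# Flatten the two spatial dimensions into a single feature dimension
toy_X = toy_images.reshape(toy_images.shape[0], -1)

# Hold out 10% of the examples; random_state makes the split reproducible
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_labels, test_size=0.10, random_state=42
)
```

With 20 examples of 16 pixels each, this gives 18 training rows and 2 test rows, each with 16 features. The same reasoning explains the expected shapes for the real data: 80 * 80 = 6400 features, and 10% of 4000 examples is 400.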

In [ ]:

Prepare the data and Classifier

Questions:

You will transform the data and create a Classifier.

The requirements are as follows:

Transform the features (i.e., the pixel grids) into standardized values (mean 0, unit standard deviation)

Set a variable scaler to be your scaler

Create an sklearn Classifier

Set variable clf to be your Classifier object

We recommend trying Logistic Regression first

sklearn 's implementation of Logistic Regression has many parameter choices

We recommend starting with the single parameter solver="liblinear"

You may want to use the sklearn manual to learn about the other parameters

Hints:

Look up StandardScaler in sklearn ; this is a transformation to create standardized values

You will use transformed examples both for training and test examples

So be sure that you can perform the transformation on both sets of examples

Using Pipeline in sklearn , whose last element is a model, is a very convenient way to

Implement transformations and perform model fitting/prediction

In a way that ensures that all examples, both training and test, are treated consistently

Enables Cross Validation without cheating
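The Pipeline idea from the hints can be sketched as follows, on synthetic data rather than the ship images. The arrays `toy_X` and `toy_y`, the step names `"scaler"` and `"clf"`, and the 150/50 split are all assumptions made for illustration; this is one possible arrangement, not the required solution.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 200 examples, 16 features; the label depends on the first feature
toy_X = rng.normal(loc=50.0, scale=20.0, size=(200, 16))
toy_y = (toy_X[:, 0] > 50.0).astype(int)

# The last step is the model; earlier steps are transformations
toy_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear")),
])

# fit() learns the scaling statistics and the model from the same training examples
toy_pipeline.fit(toy_X[:150], toy_y[:150])
# predict() reuses the stored scaling statistics on unseen examples
toy_predictions = toy_pipeline.predict(toy_X[150:])
toy_accuracy = (toy_predictions == toy_y[150:]).mean()
```

Because the scaler lives inside the pipeline, the held-out examples never influence the mean and standard deviation used for standardization.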

In [ ]:

import time
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

## Data Scaler
# Create a StandardScaler object
# scaler: sklearn standard scaler
scaler = None

# YOUR CODE HERE
raise NotImplementedError()

## Classification Model
# Create a classifier
# clf: sklearn classifier
# name: string, name of your classifier
# model_pipeline: sklearn Pipeline, if you use pipeline, please use this variable
clf = None
name = None

# YOUR CODE HERE
raise NotImplementedError()

In [ ]:

Train model

Question:

Use your Classifier or model pipeline to fit the training examples and compute the in-sample accuracy

Set a variable score_in_sample to store the in-sample accuracy

Hint:

The sklearn function accuracy_score may be helpful
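As a quick reminder of what `accuracy_score` computes, here is a tiny sketch with hand-written labels (both lists are invented for illustration). Accuracy is the fraction of predictions that match the true labels.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions for 5 examples
true_labels = [1, 0, 1, 1, 0]
predicted_labels = [1, 0, 0, 1, 0]

# 4 of the 5 predictions match the true labels
toy_accuracy = accuracy_score(true_labels, predicted_labels)
```

For the in-sample score, the predictions would be made on the same examples that the model was trained on.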

In [ ]:

from sklearn.metrics import accuracy_score

# Set variable
# score_in_sample: a scalar number, score for your in-sample examples
score_in_sample = None

# YOUR CODE HERE
raise NotImplementedError()

print("Model: {m:s} in sample score={s:3.2f}\n".format(m=name, s=score_in_sample))

In [ ]:

Train the model using Cross Validation

Since we only have one test set, we want to use 5-fold cross validation to check model performance.

Question:

Use 5-fold Cross Validation

Set cross_val_scores to the array of scores, one for each of the k folds

Set k as the number of folds

Report the average score

Hint:

cross_val_score in sklearn will be useful
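A minimal sketch of k-fold cross validation with `cross_val_score`, again on synthetic data. The names `toy_X`, `toy_y` and `toy_pipeline` are hypothetical, and wrapping the scaler and classifier in a Pipeline is one design choice among several.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical data: 100 examples, 8 features
toy_X = rng.normal(size=(100, 8))
toy_y = (toy_X[:, 0] + toy_X[:, 1] > 0).astype(int)

toy_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear")),
])

# For each of the 5 folds, the pipeline is refit on the other 4 folds
# and scored (accuracy, by default for a classifier) on the held-out fold
toy_scores = cross_val_score(toy_pipeline, toy_X, toy_y, cv=5)
toy_mean_score = toy_scores.mean()
```

Passing the whole pipeline, rather than pre-scaled features, means the scaler is refit inside each fold, so the held-out fold does not leak into the standardization.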

In [ ]:

# Set variable
# cross_val_scores: an array of scores (length 5), one for each fold that is out-of-sample during cross-validation
# k: number of folds
cross_val_scores = None
k = 5

t0 = time.time()

# YOUR CODE HERE
raise NotImplementedError()

print("Model: {m:s} avg cross validation score={s:3.2f}\n".format(m=name, s=cross_val_scores.mean()) )

In [ ]:

How many parameters are in the model ?

Question:

Calculate the number of parameters in your model. Report only the number of non-intercept parameters.

Set num_parameters to store the number of parameters

Hint:

The model object may have a method to help you ! Remember that Jupyter can help you find the methods that an object implements.
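For a fitted sklearn `LogisticRegression`, the learned weights are exposed through the attributes `coef_` and `intercept_`. The sketch below fits a model on made-up data with 12 features (the arrays are hypothetical) just to show the shapes involved.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical data: 60 examples, 12 features
toy_X = rng.normal(size=(60, 12))
toy_y = (toy_X[:, 0] > 0).astype(int)

toy_clf = LogisticRegression(solver="liblinear").fit(toy_X, toy_y)

# For binary classification, coef_ has shape (1, n_features)
# and intercept_ has shape (1,)
toy_coef_shape = toy_clf.coef_.shape
# Number of non-intercept weights: one per input feature
toy_num_weights = toy_clf.coef_.size
```

If the classifier is the last step of a Pipeline, it has to be retrieved from the pipeline first (for example through `named_steps`) before reading these attributes.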

In [ ]:

# Set num_parameters equal to the number of non-intercept parameters in the model
num_parameters = None

# YOUR CODE HERE
raise NotImplementedError()

print("\nShape of intercept: {i}; shape of coefficients: {c}".format(i=clf.intercept_.shape, c=num_parameters) )

In [ ]:

Evaluate the model

Question:

We have trained our model. We now need to evaluate the model using the test dataset created in an earlier cell.

Please store the model accuracy on the test set in a variable named score_out_of_sample .

Hint:

If you have transformed examples for training, you must perform the same transformation for test examples !

Remember: you fit the transformations only on the training examples, not on the test examples !
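The difference between fitting and applying a transformation can be seen on a one-feature toy example (the numbers below are invented so the arithmetic is easy to follow). The scaler's mean and standard deviation come from the training examples only and are then reused, unchanged, on the test examples.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical one-feature data: training mean is 10
toy_train = np.array([[0.0], [10.0], [20.0]])
toy_test = np.array([[10.0], [30.0]])

toy_scaler = StandardScaler()
# fit_transform: compute mean and standard deviation from the training examples
toy_train_scaled = toy_scaler.fit_transform(toy_train)
# transform: reuse those training statistics; the scaler is not refit
toy_test_scaled = toy_scaler.transform(toy_test)
```

The test value 10 maps to 0 because it equals the training mean, even though it is not the mean of the test examples. Calling `fit_transform` on the test examples instead would standardize them with their own statistics and make the two sets inconsistent.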

In [ ]:

# Set variable to store the model accuracy on the test set
score_out_of_sample = None

# YOUR CODE HERE
raise NotImplementedError()

print("Model: {m:s} out-of-sample score={s:3.2f}\n".format(m=name, s=score_out_of_sample))

In [ ]:

Visualize the parameters

Remember: there is a one-to-one association between parameters and input features (pixels).

So we can arrange the parameters into the same two dimensional grid structure as images.

This might tell us what "pattern" of features the model is trying to match.

In [ ]:

helper.model_interpretation(clf)
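We do not show the internals of `helper.model_interpretation` here, but the underlying idea of mapping parameters back onto the pixel grid can be sketched with plain NumPy. The coefficient vector `toy_coef` below is hypothetical: it has the layout of a fitted model's `coef_` for 80 x 80 images, with a single non-zero weight placed at the center pixel.

```python
import numpy as np

side = 80

# Hypothetical coefficient vector laid out like coef_: shape (1, side * side)
toy_coef = np.zeros((1, side * side))

# Flattening with reshape is row-major, so pixel (row, col)
# corresponds to index row * side + col
center_row, center_col = 40, 40
toy_coef[0, center_row * side + center_col] = 1.0

# Undo the flattening: one weight per pixel, arranged as an image
coef_grid = toy_coef.reshape(side, side)
```

The resulting grid could then be displayed like any gray-scale image, for example with matplotlib's `imshow`, to see which pixel locations carry large positive or negative weights.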

Further Exploration

Now you can build your own model using what you have learned from the course. Some ideas to try:

Was it a good idea to drop the "color" dimension by converting the 3 color channels to a single one ?

Can you interpret the coefficients of the model ? Is there a discernible "pattern" being matched ?

Feature engineering !

Come up with some ideas for features that may be predictive, e.g., patterns of pixels

Test them

Use Error Analysis to guide your feature engineering

Add a regularization penalty to your loss function

How does this affect

The in-sample fit ?

The visualization of the parameters

Hint: The sklearn LogisticRegression model

has several choices for the penalty parameter

has a variable value for the regularization strength parameter C

Observe the effect of each change on the Loss and Accuracy.
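As a starting point for this experiment, the sketch below compares a weak and a strong L2 penalty, plus an L1 penalty, on synthetic data. The data, the variable names and the specific values of `C` are assumptions chosen for illustration; in sklearn, a smaller `C` means stronger regularization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Hypothetical data: 200 examples, 30 features, only the first two are informative
toy_X = rng.normal(size=(200, 30))
toy_y = (toy_X[:, 0] - toy_X[:, 1] > 0).astype(int)

# Large C: weak penalty. Small C: strong penalty.
weak_penalty = LogisticRegression(solver="liblinear", penalty="l2", C=100.0).fit(toy_X, toy_y)
strong_penalty = LogisticRegression(solver="liblinear", penalty="l2", C=0.01).fit(toy_X, toy_y)
# An L1 penalty tends to push uninformative weights to exactly zero
sparse_model = LogisticRegression(solver="liblinear", penalty="l1", C=0.05).fit(toy_X, toy_y)

# Total weight magnitude shrinks as the penalty gets stronger
weak_norm = np.abs(weak_penalty.coef_).sum()
strong_norm = np.abs(strong_penalty.coef_).sum()
# Count how many weights the L1 model set to exactly zero
n_zero_l1 = int((sparse_model.coef_ == 0).sum())
```

On the ship images, the analogous comparison would be between the in-sample accuracy and the parameter visualization for each setting.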

In [ ]:



