AM10 homework problems

## Problem 1: Swimming in an anisotropic environment

### Preamble

Microorganisms play a vital role in many biological and medical processes. Quantifying their diffusion characteristics is key to understanding their mobility and retention in soil, the rate at which they form aggregates that eventually become biofilms, and even the migration of infectious bacteria in the human body, among other applications.

In real conditions, the local environment for living microorganisms can be strongly anisotropic, exhibiting different diffusive properties along different directions. The anisotropy may come, for example, from the presence of more nutrients in one direction than in another, something that the microorganisms are able to detect (evolution at work!).




### Statement of the problem

In this problem, you will solve the 2D diffusion equation for microorganisms that originate at $x=0, y=0$. Their concentration spreads as a Gaussian distribution, with a different value of the diffusion coefficient $D$ in each dimension (namely, $D_1$ in the $x$-dimension and $D_2$ in the $y$-dimension).

The *initial* distribution is a normalized Gaussian with $\sigma_x = 0.1,  \sigma_y = 0.1$. 

The solution of the diffusion equation in 2D is a Gaussian in the $x$ and $y$ directions, with standard deviations $\sigma_x$ and $\sigma_y$, respectively, that depend on the time, $t$, and on the diffusion coefficients $D_1$ and $D_2$. Consider the values $D_1 = 4$ and $D_2 = 1$ for the diffusion coefficients, but choose your own values for the rest of the parameters.
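For reference, the standard form of this solution can be written out explicitly. The expressions below assume an unbounded domain with no drift, and they use $\sigma_x(0)$ and $\sigma_y(0)$ to denote the initial widths given above; this notation is introduced here for illustration and is not part of the original statement.

$$
\frac{\partial c}{\partial t} = D_1 \frac{\partial^2 c}{\partial x^2} + D_2 \frac{\partial^2 c}{\partial y^2}
$$

$$
c(x, y, t) = \frac{1}{2\pi\,\sigma_x(t)\,\sigma_y(t)} \exp\!\left(-\frac{x^2}{2\sigma_x^2(t)} - \frac{y^2}{2\sigma_y^2(t)}\right)
$$

$$
\sigma_x^2(t) = \sigma_x^2(0) + 2 D_1 t, \qquad \sigma_y^2(t) = \sigma_y^2(0) + 2 D_2 t
$$

Because $D_1 > D_2$, the distribution widens faster along $x$ than along $y$, so circular contours at $t = 0$ become ellipses elongated in the $x$-direction.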

(a) Plot the concentration every 50 time-steps (consider the total number of time-steps to be nsteps=201), using contour plots. 

(b) Plot the concentration at the point $x = 5\sigma_x$, $y = 5\sigma_y$, measured from the origin, as a function of time.
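One possible way to organize the computation for parts (a) and (b) is sketched below. It is not a definitive solution: the function name `concentration`, the time step `dt`, and the grid extent are hypothetical choices, and the sketch assumes the closed-form Gaussian solution on an unbounded domain, with the probe point in (b) taken at five times the *initial* widths.

```python
import numpy as np

def concentration(x, y, t, D1=4.0, D2=1.0, sx0=0.1, sy0=0.1):
    """Normalized 2D Gaussian whose variances grow linearly in time."""
    sx2 = sx0**2 + 2.0 * D1 * t   # variance along x at time t
    sy2 = sy0**2 + 2.0 * D2 * t   # variance along y at time t
    norm = 1.0 / (2.0 * np.pi * np.sqrt(sx2 * sy2))
    return norm * np.exp(-x**2 / (2.0 * sx2) - y**2 / (2.0 * sy2))

# Assumed parameters (the problem leaves these to the reader).
dt = 0.001                        # hypothetical time step
nsteps = 201
times = dt * np.arange(nsteps)

# Spatial grid for the contour plots; the extent is an arbitrary choice.
xs = np.linspace(-3.0, 3.0, 301)
ys = np.linspace(-3.0, 3.0, 301)
X, Y = np.meshgrid(xs, ys)

# (a) concentration fields every 50 steps: n = 0, 50, 100, 150, 200
snapshots = {n: concentration(X, Y, times[n]) for n in range(0, nsteps, 50)}

# (b) concentration at a fixed probe point as a function of time
x_probe, y_probe = 5 * 0.1, 5 * 0.1
probe = concentration(x_probe, y_probe, times)
```

Each entry of `snapshots` can then be passed to a contour routine such as `plt.contour(X, Y, snapshots[n])`, and `probe` can be plotted against `times`.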

## Problem 6: Predict likely targets for a bank marketing campaign

### Introduction

In this problem, you will predict which bank customers would be likely to open a Personal Loan account with your bank. You have a dataset `Bank_Personal_Loan_Modelling.csv` which contains 11 columns of data with a sample of current bank customers, plus a 12th column that indicates whether the customer chose to open a Personal Loan account during a previous marketing effort.

Your goal is to train a neural network to predict which customers might be likely to respond to a targeted Personal Loan marketing campaign in the future, based on the characteristics of customers who responded favorably in the previous campaign.

The data you have available on your customers is as follows:

- Age : Customer's age
- Experience : Years of professional experience
- Income : Annual income of the customer (thousands)
- Family : Family size of the customer
- CCAvg : Avg. spending on credit cards per month (thousands)
- Education : Education Level (1: Undergrad or less; 2: Graduate; 3: Advanced/Professional)
- Mortgage : Value of house mortgage if any (thousands)
- Securities Account : Does the customer have a securities account with the bank?
- CD Account : Does the customer have a certificate of deposit (CD) account with the bank?
- Online : Does the customer use internet banking facilities?
- Credit card : Does the customer use a credit card issued by the bank?
- Personal Loan? : Did this customer accept the personal loan offered in the last campaign?

The data is presented in CSV format. Each row represents one customer. The binary classes are entered as 0 (negative) and 1 (positive).
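A minimal sketch of reading a file in this layout with `np.loadtxt` follows. The two data rows and the exact header names are made-up stand-ins, not values from the real dataset, and the sketch assumes a single header row and that the target is the last column; check the actual file before relying on either assumption.

```python
import io
import numpy as np

# Hypothetical stand-in for the real file: one header row, then one row
# per customer. The header names and values here are illustrative only.
sample_csv = io.StringIO(
    "Age,Experience,Income,Family,CCAvg,Education,Mortgage,"
    "Securities Account,CD Account,Online,CreditCard,Personal Loan\n"
    "25,1,49,4,1.6,1,0,1,0,0,0,0\n"
    "34,9,180,1,8.9,3,0,0,0,0,0,1\n"
)

# skiprows=1 skips the (assumed) header line; delimiter="," parses CSV.
data = np.loadtxt(sample_csv, delimiter=",", skiprows=1)

X = data[:, :-1]   # the 11 feature columns
y = data[:, -1]    # the target column (assumed to be last)
```

With the real file, the `io.StringIO` object would be replaced by the filename string.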

Reference: The data have been adapted from a Kaggle page by Kranti Walke, https://www.kaggle.com/datasets/krantiswalke/bank-personal-loan-modelling/data.

### Your tasks (following Week 12 lecture/lab):

1. The datafile `Bank_Personal_Loan_Modelling.csv` provides the entire dataset. Open and inspect the datafile to check whether `skiprows` is needed when loading the data using the `np.loadtxt()` function. Inspect the input (feature) columns (for `X`) and the target class column (for `y`) to properly load `X` and `y`.

2. Normalize *each column* of the entire dataset by dividing by the maximum value of the column (hint: use `np.max` with an appropriate choice of `axis`). For this dataset, you do *not* need to do the "one-hot" encoding, since the output `y` is already binary 0/1.

3. Split the data into training and test sets. Use an 80%/20% split. Be sure to randomly sort the data before splitting it.

4. Make sure that all the test and train arrays are *2D arrays*. In particular, the `y` arrays should have shape `(N_train, 1)` or `(N_test, 1)`. Use `np.reshape` if needed.

5. Set up a neural net with 1 hidden layer to predict the output `y` from the input features `X`. Note there is only a single node as the final output. Use `sigmoid` for the final output so it will be normalized between 0 and 1.

6. Train the net using minibatch gradient descent. Use binary cross-entropy to calculate the `loss` (use Lab 12 `bce_loss`). Keep training until the loss seems to reach a minimum. Adjust the learning rate as needed.

7. Report the accuracy on the train-set and on the test-set. Use `np.round` to convert the final (sigmoid) output of the net to a binary choice of labels 0/1.

8. Now train a net with 3 hidden layers on the same classification task. Can you improve the accuracy? Do you encounter overfitting?
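The data-handling steps (tasks 2 to 4) and the accuracy calculation in task 7 could be sketched as follows. This is only an illustration under stated assumptions: the helper names `prepare` and `accuracy` are hypothetical, the target is assumed to be the last column, every column maximum is assumed to be nonzero, and a synthetic array stands in for the real data. The network and the Lab 12 `bce_loss` are deliberately not reproduced here.

```python
import numpy as np

def prepare(data, train_frac=0.8, seed=0):
    """Normalize columns by their maximum, shuffle rows, split train/test."""
    rng = np.random.default_rng(seed)
    # axis=0 takes the maximum down each column (assumed nonzero).
    scaled = data / np.max(data, axis=0)
    # Shuffle whole rows so features and labels stay aligned.
    shuffled = scaled[rng.permutation(len(scaled))]
    n_train = int(train_frac * len(shuffled))
    X = shuffled[:, :-1]
    y = shuffled[:, -1].reshape(-1, 1)   # 2D column, shape (N, 1)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

def accuracy(y_prob, y_true):
    """Fraction of rounded sigmoid outputs that match the 0/1 labels."""
    return np.mean(np.round(y_prob) == y_true)

# Synthetic stand-in for the loaded file: 100 rows, 11 features, 1 label.
rng = np.random.default_rng(1)
features = rng.uniform(1.0, 10.0, size=(100, 11))
labels = (np.arange(100) % 2).reshape(-1, 1)
fake_data = np.hstack([features, labels])

X_train, y_train, X_test, y_test = prepare(fake_data)
```

Note that `np.round` rounds exact halves to the nearest even value, so an output of exactly 0.5 maps to label 0.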