# Advice on study plan for Applied Math/Machine learning courses, for PhD degree not in either subject

My question requires a little background which I will give here:

I'm heading into the 2nd year of my Epidemiology PhD program. I work on vaccine trials/infection control using simple non-linear math models and love it. The majority of in-department modeling courses have been survey courses focused on translating epidemiological concepts into models, only briefly touching on the math used for implementation.

I've signed up for some introductory courses in the econometrics department (with my department's permission) which cover Bayesian statistics, PDFs, importance sampling, etc. I chose courses in the econometrics department as I thought they would be more applied and have more students at a similar math level to myself.

1) My main anxiety is that I've always been a solid B grade math student. I learn best through project-based learning and hands-on implementation. Formula-based lecturing and proofs have been very difficult for me, and I find that I don't retain much from them.

2) I've been able to learn more difficult mathematical concepts using computational software (I am a proficient R and MATLAB user) and self-teaching tutorials. Concepts that have previously been explained to me in math theory 4-5 times 'click' when I code them out and can experiment with sampling and see actual results (for example, MCMC with Metropolis-Hastings).

**3) What would your recommendations be for devising a study plan for these classes, and my future classes, that maximizes learning the relevant mathematical concepts given this 'learning style'? Were there any websites, types of courses, or textbooks/exercise books that taught difficult concepts with code but didn't overly rely on pre-built functions?**

People in other departments either seem to grasp concepts effortlessly or stop trying to learn them. In addition, no one else in my cohort or the years above works on these types of questions or models, so I have no one to ask for advice.

I am under no delusions about my mathematical abilities here and have no ambition to pursue math/statistical/machine learning research. My goal is to reach a level where I am able to fully comprehend the tools used in my particular field and be able to implement and critique them at a 100% competent level. Thanks!

## 1 Answer

## Concepts

First, you should know that machine learning is nothing special. For the most part, the term is only one of a series of buzzwords exchanged in order to maintain linguistic novelty in popular reporting, grant applications, and so on. Machine learning refers mainly to a class of adaptive algorithms that are amenable to analysis with learning theory, or at least worth analyzing that way. (And since you're interested in the applications, there's no need to look into that rarefied area.)

Now, that being the case, what does the field stem from? Three more basic mathematical areas:

- Probability (used to define basic intuitions, thinking, and formal notions)
- Linear algebra (used to describe processed information and algorithmic steps, such as feature selection and regularization)
- Calculus (used for optimization formulations, proofs, and continuous distributions)

If you have a good grip on these three things, you can handle more or less any machine learning paper or topic. Some challenges you might face, judging from your stated style:

- Probability: You don't have to be able to recite everything about sigma-algebras or Borel fields, but you need to have the axioms and their implications down. You also need to understand why something will (or will not) converge. If you're doing Bayesian statistics, you need to grasp parameter estimation and be able to understand what the different formulas are expressing. This is just a lot of math, as much as it is eventually codeable.
- Linear algebra: Notation can be quite complicated. For example, MRI or EEG data typically come in tensors, which you can think of as N-dimensional arrays. Machine learning papers often throw around such linear algebra forms as easily as if they were scalar forms, because they assume the audience can handle it. You should try to make sure you don't become disoriented, and can visualize what is happening spatially.
- Calculus: The main thing that you need to be able to grasp is optimization. Ultimately it boils down to showing that some point is a maximum or minimum of an objective (a cost function in the case of most machine learning algorithms), often expressed in terms of a Lagrangian when there are constraints. SVM is a classic case of this, expressed as a quadratic optimization problem over vectors.
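Since you mention Metropolis-Hastings clicking once you coded it, here is a minimal sketch of how the probability point above (parameter estimation and convergence) can be checked by experiment rather than by proof. It is written in Python using only the standard library, with no pre-built samplers. The data (7 infections among 20 subjects), the flat prior, the step size, and all function names are assumptions made up for illustration. With a flat prior the exact posterior is Beta(8, 14), so the sampled mean can be compared against the known answer.

```python
import math
import random


def log_posterior(p, infected, n):
    """Unnormalized log posterior of a binomial proportion under a flat prior."""
    if p <= 0.0 or p >= 1.0:
        return float("-inf")  # zero posterior density outside (0, 1)
    return infected * math.log(p) + (n - infected) * math.log(1.0 - p)


def metropolis_hastings(infected, n, n_iter=20000, step=0.2, seed=1):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, infected, n)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, infected, n)
        # The proposal is symmetric, so the acceptance probability is just
        # the posterior ratio, capped at 1.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Hypothetical data: 7 infections observed among 20 subjects.
infected, n = 7, 20
samples = metropolis_hastings(infected, n)
kept = samples[2000:]  # discard an initial burn-in period
posterior_mean = sum(kept) / len(kept)

# With a flat prior the exact posterior is Beta(infected + 1, n - infected + 1).
analytic_mean = (infected + 1) / (n + 2)
print(posterior_mean, analytic_mean)
```

Changing `step` or the burn-in length and watching how far the sampled mean drifts from the analytic one is a cheap way to build intuition for when a chain has or has not converged.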

## Resources

- Trevor Hastie has a couple of good books that may suit your approach, one with applications in R, and one called Elements of Statistical Learning that my advisor recommended to me. He makes them available for free at https://web.stanford.edu/~hastie/pub.htm.
- I personally find Andrew Ng's teaching approach overly intricate and unmotivating, but Abu-Mostafa is a gem to watch: http://www.youtube.com/playlist?list=PLD63A284B7615313A
- For probability, I highly recommend this Oxford book. It's well-formatted and you can flip to sections quickly to refresh yourself.

## Strategies

Be willing to compromise. I myself learned R and machine learning basics at the same time, and I was more or less free to work out both the code and implementation in practice however I liked. However, for actual use, I typically (with few exceptions) trust the published authors of R packages to have covered more bases than I.
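One way to act on this, sketched under assumptions: code a method by hand once to understand it, then check the result against a reference you trust before relying on either. The example below fits a straight line by plain gradient descent on a mean squared error cost and compares it with the textbook closed-form least-squares solution. In practice the reference would more likely be an established package. The data, learning rate, and function names are made up for illustration.

```python
def fit_line_gradient_descent(xs, ys, lr=0.05, n_iter=2000):
    """Minimize mean squared error for y = a + b*x by plain gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        residuals = [a + b * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to a and b.
        grad_a = 2.0 * sum(residuals) / n
        grad_b = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def fit_line_closed_form(xs, ys):
    """Ordinary least squares for one predictor, using the textbook formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data, roughly following y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.0]

a_gd, b_gd = fit_line_gradient_descent(xs, ys)
a_cf, b_cf = fit_line_closed_form(xs, ys)
print(a_gd, b_gd)
print(a_cf, b_cf)
```

If the two answers disagree, the bug is almost always in the hand-written version, which is a good argument for trusting the maintained package in real analyses.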

Figure out what you want to know. For example, I'm not interested in learning the innards of the latest way to efficiently update a neural network too big to fit in RAM; I just want to be able to phrase my questions in a rigorous way that allows me to judge whether or not such an approach is warranted or helpful.