ml

**Books**

- Elements of Statistical Learning (Math Heavy)
- Introduction to Statistical Learning (More Hands-On)

**Courses**

**More Random Stuff (From HN)**

Sure, a couple things. (I'm assuming you're comfortable with multivariable calculus.)

Andrew Ng's Coursera course is good. PRML (Pattern Recognition and Machine Learning) by Bishop is good, and has a useful introduction to probability theory.

You also want a good grounding in linear algebra. Strang is basically the authority on linear algebra: http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-...

You want a strong grounding in probability theory and statistics. (This is the basic language and intuition of the entire field.) I don't have as many preferences here (although it's the most important); someone in this thread pointed to a course on statistical learning @ Stanford that's good.

A good understanding of optimization is helpful. Here's a link that leads to a useful MOOC for that: http://stanford.edu/~boyd/cvxbook/

There's a lot of other stuff (Markov decision processes, Gaussian processes, Monte Carlo methods come to mind) that is useful that I'm not pointing to, but if you've hit the other stuff here then you'll probably be able to find those things on your own.

If you're into it, https://www.coursera.org/course/pgm is good but not vital.

You may want to know about reinforcement learning. This answer does better than I can: https://www.quora.com/What-are-the-best-books-about-reinforc...

Deep learning seems popular these days :) (http://www.deeplearningbook.org/)

Otherwise, it depends on the domain:

- For NLP, there's a great Stanford course on deep learning + NLP (http://cs224d.stanford.edu/syllabus.html), but there's a ton of domain knowledge for most NLP work (and a lot of it really centers around data preparation).
- For speech, theoretical computer science matters (weighted finite state transducers, formal languages, etc.).
- For vision, again, Stanford: http://cs231n.stanford.edu/syllabus.html
- For other applications, well, ask someone else? :)

Also:

- arxiv.org/list/cs.CL/recent
- arxiv.org/list/cs.NE/recent
- arxiv.org/list/cs.LG/recent
- arxiv.org/list/cs.AI/recent

EDIT: unfortunately, there's also a lot of practitioner's dark art; I picked a lot up as a research assistant, and then my first year in industry felt like being strapped to a rocket.

- https://parquet.apache.org/documentation/latest/ (Column-Based Storage)

- Spark MLLib – How parallel is it?
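
As a rough illustration of what "column-based storage" means for the Parquet link above, here is a toy sketch in plain Python. The records, field names, and the `to_columns` helper are all made up for illustration; this is not Parquet's actual on-disk format or API, just the row-versus-column idea.

```python
# Hypothetical toy records; NOT Parquet's real file format or API.
rows = [
    {"user": "a", "age": 31, "score": 0.5},
    {"user": "b", "age": 25, "score": 0.9},
    {"user": "c", "age": 40, "score": 0.1},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: computing one aggregate walks over every full record.
mean_age_rows = sum(r["age"] for r in rows) / len(rows)

# Column layout: the same aggregate reads a single list and can
# skip the "user" and "score" values entirely.
mean_age_cols = sum(columns["age"]) / len(columns["age"])
```

Both layouts give the same answer; the column layout only has to touch the one field the query needs, which is the usual argument for columnar formats in analytics workloads.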

ml.txt · Last modified: 2016/07/01 12:09 by Volker