The Kaggle Book by Konrad Banachewicz and Luca Massaron is a book about competing on Kaggle. The introductory chapters tell you all about Kaggle and the competitions it sponsors. The bulk of the book provides details on how to compete better against a variety of adversaries. The book ends with some insights into how competing in Kaggle can help with other areas of your life. If the book ended here, it would still be well worth reading cover-to-cover as I did, but it doesn’t end here.
My main reasons for reading the book were to find out more about how data is created and vetted on Kaggle, and to obtain some more insights on how to write better data science applications. In the end, the bulk of this book is a rather intense treatment of data science with a strong Kaggle twist. If you’re looking for datasets and to understand techniques for using them effectively, then this is the book you want to get because the authors are both experts in the field. The biases in the book are toward data management, verification, validation, and checks for model goodness. It’s the model goodness part that is hard to find in any other book (at least, the ones I’ve read so far).
The book does contain interviews from other people who have participated in Kaggle competitions. I did read a number of these interviews and found that they didn’t help me personally because of my goals in reading the book. However, I have no doubt that they’d help someone who was actually intending to enter a Kaggle competition, which sounds like a great deal of work. Before I read the book, I had no idea of just how much goes into these competitions and what the competitors have to do to have a chance of winning. What I found most important is that the authors stress the need to get something more out of a competition than simply winning—that winning is just a potential outcome of a much longer process of learning, skill building, and team building.
You really need this book if you are into data science at all because it helps you gain new insights into working through data science problems and ensuring that you’re getting a good result. I know that my own person skills will be improved as I apply the techniques described in the book, which really do apply to every kind of data science development and not just to Kaggle competitions.