The amount of data stored in the world's databases doubles every 20 months, and clinicians, familiar with traditional statistical methods, are at a loss to analyze it. Traditional methods indeed have difficulty identifying outliers in large datasets, and finding patterns in big data and in data with multiple exposure/outcome variables. In addition, analysis rules for surveys and questionnaires, which are currently common methods of data collection, are essentially missing. Fortunately, the new discipline of machine learning is able to address all of these limitations.
So far, medical professionals have been rather reluctant to use machine learning. In the field of diagnosis making, too, few doctors may want a computer checking them, or be interested in collaborating with a computer or with computer engineers. Adequate health and health care will, however, soon be impossible without proper data supervision from modern machine learning methodologies like cluster models, neural networks, and other data mining methodologies.
Each chapter starts with purposes and scientific questions. Then, step-by-step analyses, using data examples, are given. Finally, a paragraph with conclusions, and references to the corresponding sections of three introductory textbooks previously written by the same authors, are given.
Obviously, it is time that medical and health professionals mastered their reluctance to use machine learning, and the current 100-page cookbook should be helpful to that aim. It covers in condensed form the subjects reviewed in the 750-page three-volume textbook by the same authors, entitled “Machine Learning in Medicine I-III” (Springer, Heidelberg, Germany, 2013), and was written as a hand-holding presentation and must-read publication. It was written not only for investigators and students in the field, but also for jaded clinicians new to the methods and lacking time to read the entire textbooks.
General purposes and scientific questions of the methods are only briefly mentioned, but full attention is given to the technical details. The two authors, a statistician and current president of the International Association of Biostatistics, and a clinician and past-president of the American College of Angiology, provide plenty of step-by-step analyses from their own research. Data files for self-assessment are available at extras.springer.com.
From their experience, the authors demonstrate that machine learning sometimes performs better than traditional statistics. Machine learning may offer few options for adjusting for confounding and interaction, but you can add propensity scores and interaction variables to almost any machine learning method.
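The idea of adding propensity scores and interaction variables as extra predictors can be sketched as follows. This is a minimal illustration, not the authors' procedure: the patient records, the variable names (`treated`, `smoker`, `age_over_65`), and the helper functions are all hypothetical, and the propensity score is estimated in the simplest possible way, as the fraction of treated patients within each stratum of a single binary confounder.

```python
from collections import defaultdict

# Hypothetical patient records: (treated, smoker, age_over_65, outcome).
# All values are invented for illustration only.
patients = [
    (1, 1, 1, 1),
    (1, 1, 0, 1),
    (1, 0, 1, 0),
    (0, 1, 1, 1),
    (0, 0, 0, 0),
    (0, 0, 1, 0),
    (1, 1, 1, 0),
    (0, 0, 0, 0),
]


def propensity_by_stratum(rows):
    """Estimate P(treated | smoker) as the treated fraction in each stratum."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for t, smoker, _age, _outcome in rows:
        total[smoker] += 1
        treated[smoker] += t
    return {stratum: treated[stratum] / total[stratum] for stratum in total}


def add_features(rows):
    """Return one feature dict per patient, with two extra predictors:
    a propensity score and a treatment-by-age interaction variable."""
    ps = propensity_by_stratum(rows)
    features = []
    for t, smoker, age, outcome in rows:
        features.append({
            "treated": t,
            "smoker": smoker,
            "age_over_65": age,
            "propensity": ps[smoker],   # extra predictor 1
            "treated_x_age": t * age,   # extra predictor 2 (interaction)
            "outcome": outcome,
        })
    return features


features = add_features(patients)
for row in features:
    print(row)
```

The resulting rows could then be passed, with the two extra columns, to whichever learning method is being used. With several confounders, the propensity score would normally come from a logistic regression rather than from stratum fractions.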
The first edition, published in 2010, was the first complete overview of SPSS methodologies for medical and health statistics. Well over 100,000 copies of various chapters were sold within the first year of publication. There were four reasons for a rewrite.
First, many important comments from readers called for a rewrite. Second, SPSS has produced many updates and upgrades, with relevant novel and improved methodologies. Third, the authors felt that the chapter texts needed some improvements for better readability: chapters have now been classified according to outcome data, which helps in choosing an analysis rapidly, and a schematic overview of data and explanatory graphs have been added. Fourth, current data are increasingly complex, and many important methods for analysis were missing in the first edition.
For the latter purpose, some more advanced methods seemed unavoidable, like hierarchical loglinear methods, gamma and Tweedie regressions, and random intercept analyses. In order for the contents of the book to remain covered by the title, the authors renamed the book: SPSS for Starters and 2nd Levelers.
Special care was, nonetheless, taken to keep things as simple as possible, and simple menu commands are given. The arithmetic is still at no more than high-school level. Step-by-step analyses of different statistical methodologies are given with the help of 60 SPSS data files available through the internet. Because this busy group of readers lacks time, the authors have made every effort to produce a text as succinct as possible.
Modern Bayesian statistics is based on biological likelihoods, and may fit clinical data better than traditional tests based on normal distributions do. This is the first edition to systematically apply modern Bayesian statistics in traditional clinical data analysis. This edition also demonstrates that Markov Chain Monte Carlo procedures laid out as Bayesian tests provide more robust correlation coefficients than traditional tests do. It also shows that traditional path statistics are both textually and conceptually like Bayes theorems, and that structural equation models computed from them are the basis of multistep regressions, as used with causal Bayesian networks.
The authors, as professors in statistics and machine learning at European universities, are worried that their students find regression analyses harder than any other methodology in statistics. This is serious, because almost all of the novel methodologies in current data mining and data analysis include elements of regression analysis. It is the main incentive for writing this 28-chapter edition, consisting of
- 28 major fields of regression analysis,
- their condensed maths,
- their applications in medical and health research as published so far,
- step by step analyses for self-assessment,
- conclusion and reference sections.
Traditional regression analysis is adequate for epidemiology, but lacks the precision required for clinical investigations. However, in the past two decades modern regression methods have proven to be much more precise. And so it is time that a book described regression analyses for clinicians. The current edition is the first to do so. It is written for a non-mathematical readership. Self-assessment data files are provided through Springer's "Extras Online".
The principles of statistical analysis are easily forgotten in today’s world of time-saving algorithms. This step-by-step primer takes researchers back to basics, enabling them to examine their own data through a series of sums on a simple pocket calculator.
Each test method is reported together with (1) a data example from practice, (2) all steps to be taken using a scientific pocket calculator, and (3) the main results and their interpretation. Although several of the described methods can also be carried out with the help of statistical software, the latter procedure will be considerably slower.
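As an illustration of what "all steps to be taken using a scientific pocket calculator" can look like, the sketch below retraces an unpaired t-statistic one sum at a time. The choice of test, the two data samples, and the function names are assumptions made here for illustration and are not taken from the book. The degrees of freedom are taken simply as n1 + n2 - 2.

```python
import math

# Hypothetical measurements in two treatment groups (invented values).
group_a = [6, 7, 8, 7, 7]
group_b = [4, 5, 6, 5, 5]


def mean(xs):
    """Step 1: add up the values and divide by their number."""
    return sum(xs) / len(xs)


def sample_sd(xs):
    """Step 2: sum the squared deviations from the mean,
    divide by n - 1, and take the square root."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def sem(xs):
    """Step 3: standard error of the mean = SD / sqrt(n)."""
    return sample_sd(xs) / math.sqrt(len(xs))


def unpaired_t(a, b):
    """Step 4: t = difference of means / standard error of the difference,
    where the latter is sqrt(SEM_a^2 + SEM_b^2)."""
    se_diff = math.sqrt(sem(a) ** 2 + sem(b) ** 2)
    t = (mean(a) - mean(b)) / se_diff
    df = len(a) + len(b) - 2
    return t, df


t_value, df = unpaired_t(group_a, group_b)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

The t-value and degrees of freedom would then be looked up in a t-table to obtain the p-value, which is the interpretation step.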
Both parts 1 and 2 of this title consist of a minimum of text, which will enhance the process of mastering the methods. Yet the authors recommend that, for a better understanding of the test procedures, the books be used together with the same authors' textbook "Statistics Applied to Clinical Studies", 5th edition, published in 2012 by Springer, Dordrecht, Netherlands. More complex data files, like data files with multiple treatment modalities or multiple predictor variables, cannot be analyzed with a pocket calculator. We recommend that the small books "SPSS for starters", Part 1 and 2 (Springer, Dordrecht, 2010 and 2012), from the same authors be used as a complementary help for the readers' benefit.
1. This book is the third volume of a three-volume series of cookbooks entitled "Machine Learning in Medicine - Cookbooks One, Two, and Three". No other self-assessment works for the medical and health care community covering the field of machine learning have been published to date.
2. Each chapter of the book can be studied without the need to consult other chapters, and can, for the readership's convenience, be downloaded from the internet. Self-assessment examples are available at extras.springer.com.
3. An adequate command of machine learning methodologies is a requirement for physicians and other health workers, particularly now, because the amount of medical computer data files currently doubles every 20 months, and because it will soon be impossible for them to make proper data-based health decisions without the help of machine learning.
4. Given the importance of knowledge of machine learning in the medical and health care community, and the current lack of knowledge of it, the readership will consist of any physician and health worker.
5. The book was written in simple language in order to enhance readability not only for the advanced but also for novices.
6. The book is multipurpose: it is an introduction for the ignorant, a primer for the inexperienced, and a self-assessment handbook for the advanced.
7. The book was written particularly for jaded physicians and any other health care professionals lacking time to read the entire series of three textbooks.
8. Like the other two cookbooks, it contains technical descriptions and self-assessment examples of 20 important computer methodologies for medical data analysis, and it largely skips the theoretical and mathematical background.
9. Information on the theoretical and mathematical background of the methods described is given in a "notes" section at the end of each chapter.
10. Unlike traditional statistical methods, the machine learning methodologies are able to analyze big data including thousands of cases and hundreds of variables.
11. The medical and health care community is little aware of the multidimensional nature of current medical data files, and experimental clinical studies are not helpful to that aim either, because these studies, usually, assume that subgroup characteristics are unimportant, as long as the study is randomized. This is, of course, untrue, because any subgroup characteristic may be vital to an individual at risk.
12. To date, except for a three-volume introductory series on the subject entitled "Machine Learning in Medicine Part One, Two, and Three, 2013, Springer Heidelberg Germany" from the same authors, and the current cookbook series, no books on machine learning in medicine have been published.
13. Another unique feature of the cookbooks is that they were jointly written by two authors from different disciplines, one being a clinician/clinical pharmacologist, the other a mathematician/biostatistician.
14. The authors have also jointly been teaching at universities and institutions throughout Europe and the USA for the past 20 years.
15. The authors have managed to cover the field of medical data analysis in a nonmathematical way for the benefit of medical and health workers.
16. The authors have already successfully published many statistics textbooks and self-assessment books, e.g., the 67-chapter textbook entitled "Statistics Applied to Clinical Studies 5th Edition, 2012, Springer Heidelberg Germany" with downloads of 62,826 copies.
17. The current cookbook makes use, in addition to SPSS statistical software, of various free calculators from the internet, as well as the Konstanz Information Miner (Knime), a widely approved free machine learning package, and the free Weka Data Mining package from New Zealand.
18. The above software packages with hundreds of nodes, the basic processing units including virtually all of the statistical and data mining methods, can be used not only for data analyses, but also for appropriate data storage.
19. The current cookbook shows, particularly for those with little affinity for value tables, that data mining in the form of a visualization process is quite feasible, and often more revealing than traditional statistics.
20. The Knime and Weka data miners use widely available Excel data files.
21. In current clinical research prospective cohort studies are increasingly replacing the costly controlled clinical trials, and modern machine learning methodologies like probit and tobit regressions as well as neural networks, Bayesian networks, and support vector machines prove to better fit their analysis than traditional statistical methods do.
22. The current cookbook not only includes concise descriptions of standard machine learning methods, but also of more recent methods like the linear machine learning models using ordinal and loglinear regression.
23. Machine learning increasingly tends to use evolutionary operation methodologies. This subject has also been covered.
24. All of the methods described have been applied in the authors' own research prior to this publication.
Obviously, it is time that medical and health professionals mastered their reluctance to use machine learning methods, and this was the main incentive for the authors to complete a series of three textbooks entitled “Machine Learning in Medicine Part One, Two and Three, Springer Heidelberg Germany, 2012-2013”, describing in a nonmathematical way over sixty machine learning methodologies, as available in SPSS statistical software and other major software programs. Although the books were well received, it came to our attention that physicians and students often lacked time to read them in their entirety, and requested a small book that omits background information and theoretical discussions and highlights technical details.
For this reason we produced a 100-page cookbook, entitled "Machine Learning in Medicine - Cookbook One", with data examples available at extras.springer.com for self-assessment and with reference to the above textbooks for background information. Already at the completion of this cookbook we came to realize that many essential methods were not covered. The current volume, entitled "Machine Learning in Medicine - Cookbook Two", is complementary to the first. It is also intended to provide a more balanced view of the field, and thus to serve as a must-read not only for physicians and students, but also for anyone involved in the process and progress of health and health care.
Like Machine Learning in Medicine - Cookbook One, the current work will describe stepwise analyses of over twenty machine learning methods that are, likewise, based on the three major machine learning methodologies: Cluster methodologies (Chaps. 1-3)
At extras.springer.com the data files of the examples are given, as well as XML (eXtensible Markup Language), SPS (Syntax) and ZIP (compressed) files for outcome predictions in future patients. In addition to condensed versions of the methods, fully described in the above three textbooks, an introduction is given to SPSS Modeler (SPSS' data mining workbench) in Chaps. 15, 18, 19, while improved statistical methods like various automated analyses and Monte Carlo simulation models are in Chaps. 1, 5, 7 and 8.
We should emphasize that all of the methods described have been successfully applied in practice by the authors, both of them professors in applied statistics and machine learning at the European Community College of Pharmaceutical Medicine in Lyon, France. We recommend the current work not only as a training companion to investigators and students, because of the many step-by-step analyses given, but also as a brief introductory text to jaded clinicians new to the methods. For the latter purpose, background and theoretical information have been replaced with the appropriate references to the above textbooks, while single sections addressing "general purposes", "main scientific questions" and "conclusions" are given in their place.
Finally, we will demonstrate that modern machine learning sometimes performs better than traditional statistics. Machine learning may offer few options for adjusting for confounding and interaction, but you can add propensity scores and interaction variables to almost any machine learning method.
Are there alternative works in the field? Yes, there are, particularly in the field of psychology. Psychologists invented meta-analysis in 1970, and have continuously updated its methodologies. Although very interesting, their work, just like the whole discipline of psychology, is rather explorative in nature, and so is their approach to meta-analysis. Then, there is the field of epidemiology. Many epidemiologists are from the school of angry young men, who publish shocking news all the time, and JAMA and other publishers are happy to publish it. The reality is, of course, that things are usually not as bad as they seem. Finally, some textbooks, written by professional statisticians, tend to use software programs with miserable menu programs that require lots of syntax to be learnt. This is prohibitive for clinical and other health professionals.
The current edition is the first textbook in the field of meta-analysis entirely written by two clinical scientists, and it consists of many data examples and step by step analyses, mostly from the authors' own clinical research.
In the past few years, the HOW-SO of current statistical tests has been made much simpler than it was in the past, thanks to the abundance of statistical software programs of excellent quality. However, the WHY-SO may have been somewhat under-emphasized. For example, why do statistical tests constantly use unfamiliar terms, like probability distributions, hypothesis testing, randomness, normality, and scientific rigor, and why are Gaussian curves so hard that non-mathematicians get lost all the time? The book will cover the WHY-SOs.