Data Science Master Chef

Saghir Bashir (www.ilustat.com)

A Master Chef

A master chef can transform any set of ingredients into an eye-pleasing and appetising festival for your taste buds. Before embarking on their creation, they will begin by understanding the quality and freshness of all the ingredients, which will determine the quantities in which they are combined. Understanding the source and origins will be equally important to them (e.g. type of olives, location grown and extraction method for olive oil). Their full grasp of the processes (e.g. mixing, baking, steaming) will lead to a beautifully presented and mouthwatering delight for you to savour. Subtle or even major variations in the processes will result in different masterpieces for you to feast on. Even so, it is possible that some (or all) of them will not be to your liking.

If the quality of the ingredients is lower than desired, a master chef with knowledge, experience and understanding will still serve up an appetising delight (perhaps with less profound flavours). If some of the ingredients are missing, the situation could still be salvaged by using suitable substitutes (or even doing without). However, if a key ingredient is missing (e.g. eggs for an omelette) or if the overall quality is terrible, then even a master chef cannot rescue the situation.

The brands of the equipment/tools (e.g. saucepans, ovens) do not determine the success of the end product. Provided they are of good quality, they should be enough. The real art is in knowing how, when and why to use them.

A Data Science Master Chef

A data science master chef can transform a set of data into a festival of numbers for your eyes to savour and your mind to indulge. Before embarking on any analysis, they will begin by understanding the quality, appropriateness and validity of the data. Understanding the source and origins (e.g. dates, data collection methods, bias reduction measures) will be equally important to them. Their full grasp of the processes (e.g. summarising, modelling or visualising) will, through their mastery, reveal relevant and insightful patterns in the data. Subtle or even major variations in the processes will result in alternative insights for you to ponder and act on. Even so, it is possible that some (or all) of them will not agree with your point of view.

If the quality of the data is lower than desired, the data science master chef can still construct practical and pragmatic insights (most likely with more uncertainty). If some of the data are missing, the situation could still be rescued with suitable substitutes (e.g. surrogate variables, missing data methods) or even by doing without. If key data are (mostly) missing (e.g. the event date for time-to-event data), then it is a lost cause. Further, if the overall quality is terrible, then even a data science master chef cannot rescue the situation.

The brands of the equipment/tools (e.g. data processing and analysis software, IDE) do not determine the success of the end product. Provided they are of good quality, they should be enough. The real art is in knowing how, when and why to use them.

A Fast Food Data Scientist

The appeal of fast food is convenience and “taste”, often created by using lower-quality ingredients, some of which (primarily fat, salt and sugar) are used in unhealthy doses. The processing is often rudimentary, in the form of frying or microwaving. The (very) occasional consumption of fast food may be an acceptable experience, but as a day-to-day diet it will lead to undesirable longer-term health effects. The same ingredients, with the proper understanding and processing, could lead to a gratifying experience. If the ingredients are of lower quality, then this will be reflected in the flavour.

A fast food data scientist will take some data and pipe it through a “sexy” model or algorithm. Occasionally the results could be alluring and useful by chance. However, such “frying” or “microwaving” of the data will only lead to undesirable consequences in the longer term. The same data, with the proper understanding and processing, could lead to useful and usable results. If the data are of lower quality, then this should be reflected in the statistical variation (uncertainty).

Summary

A data scientist, like a master chef, should fully understand the ingredients (data) of their craft before any processing, which includes going back to the source and origins of the data. This foundation will help in wisely choosing the best methods and processes to reveal relevant and insightful patterns in the data. Although not everyone will agree with or like the patterns that are presented, they will appreciate the trustworthiness of the results and reasoning. They will also know that this is far better for them than fast food data science.