Data is omnipresent in organizations undergoing digital transformation. It is no longer just a matter of reporting or BI: the advent of Big Data and Data Science requires a true capacity to create value through data.
The emergence of the Chief Data Officer (CDO) role reflects organizations' awareness of this shift. Samir Amellal is a CDO active on transformational data subjects, spanning government, startups, and the private sector.
We have dedicated this interview to the subject of Data Quality, a paradigm that is becoming a prerequisite. Data Quality aligns with Quality Engineering practices, forcing Data systems to continuously create value.
We were able to address the following topics:
- Why Data Quality is part of the company’s strategic priorities
- What definition and what value creation challenges for Data Quality
- How Data Quality applies the Quality Engineering paradigm
- How Data Quality combines with DataOps, Data Mesh and AI
- What external factors accelerate the need to address Data Quality
- What products, solutions and opportunities emerge in the ecosystem
Join the QE Unit to access more exclusive content from the community.
About Samir Amellal
Samir holds a master’s degree in information and data technologies from the University of Lille and a master’s degree in e-business from SKEMA Business School. He started his career at Accenture Interactive as a Data Scientist for La Redoute. He then worked as an R&D engineer at Buongiorno before joining Publicis (Ex. Publicis ETO). For 7 years, he held various positions there. First as Data Sciences Project Director, in charge of Danone, LVMH and Total accounts. Then Deputy CEO in charge of Data Intelligence before being appointed Chief Data Officer of Publicis France in 2015.
Samir Amellal joined the Fullsix France agency (Havas group) in March 2017 as Managing Director. With the mission of piloting data, innovation and foresight. In 2018, he created the Havas Helia agency, specializing in CRM and Data management, within the group.
He then joined La Redoute in March 2019 as Chief Data Officer, in charge of data governance. With the ambition to build a strategy driven by data, technology and Artificial Intelligence to offer the best products and services to the customer and enrich the customer experience.
Antoine: Can you start by introducing yourself?
I am 43 years old and have worked in data my whole career. I have a double degree, in computer science and from a business school. I also did a lot of econometrics, so I almost have a triple degree.
I started in the telecoms sector as a Data research engineer with a Japanese operator, NTT Docomo, an equivalent of Orange in Japan. I then went to Publicis, a big French player in communications, where I held several positions. First, in an IT provider that had been acquired, Publicis ETO, now called Epsilon France. I took over the role of CEO in charge of Data in this subsidiary, which currently has around 1,000 employees in France and more worldwide. I then joined the Vivendi group, where I spent more than 2 years.
Since then, I have been CDO at La Redoute for almost 3 years, across different contexts and challenges. The ecosystem has clearly evolved. We also have issues to address together, Antoine, on the Data side in particular; even if your architectural issues are broader than those of Data, we have common concerns.
Antoine: I know that you are also present more widely in the ecosystem, especially in the public, universities, government. I am convinced that it gives you a holistic perspective of Data. Can you tell us more?
I teach in several engineering and business schools. I am participating in a mission for the French government on the adoption of AI and Data in large French groups undergoing transformation. In addition, I participate in the development of start-ups, having also patented several devices.
Antoine: Before entering the Data Quality theme, what are your corporate priorities as a CDO?
The major priority, shared by many others, is to become a data-driven organization. This means massive adoption of data by all employees and departments. Microcomputing was adopted across all companies: today in finance, marketing, and logistics, everyone uses a word processor, email, and Excel files. Data must follow the same path.
The actors must know how to use Data in the broad sense, via algorithms, reports, etc. It is no longer optional for organizations. Data is therefore a major transformation imperative to be organized. To achieve it correctly, governance is structuring. As with IT, we cannot afford to multiply solutions. I think many CDOs are at this stage of disseminating and extending Data within their companies.
Antoine: We have chosen the theme of Data Quality. What is your definition and what challenges does it address?
Data Quality represents real challenges in several respects. The first is technical: qualitative data is not only error-free data. It also involves shared interfaces between IT and Data, addressing real issues around the quality of flows. For example, specific flows must be stable and reliable for critical business needs, even more so in Data Science.
“Data Quality is a prerequisite for organizations transforming to be Data-Driven, where data is at the heart of every process, decision-making and improvements.” — Samir Amellal
We can hardly tolerate an AI that becomes inoperative, or that causes bad decisions due to poor data quality or unavailability. This raises issues of flow supervision. It is not as intuitive as DataOps, but for me it is an essential element. The challenge is to have confidence in this data; it must have a certain reliability and consistency, and a capacity to be disseminated throughout the organization.
Data quality is one of the minimum prerequisites for sharing and using data. We must set minimum requirement levels for the company. Confidence is necessary both horizontally and vertically across the organization.
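The minimum requirement levels Samir describes can be expressed as automated checks on incoming records. The sketch below is a minimal illustration in plain Python; the rules (non-empty email, positive amount, known status) and field names are hypothetical examples, not La Redoute's actual requirements.

```python
# Hypothetical quality rules applied to order records; the fields and
# thresholds are illustrative, not an actual corporate standard.

def check_record(record: dict) -> list[str]:
    """Return the list of quality rules this record violates."""
    violations = []
    if not record.get("email"):
        violations.append("missing email")
    if record.get("amount", 0) <= 0:
        violations.append("non-positive amount")
    if record.get("status") not in {"pending", "shipped", "delivered"}:
        violations.append("unknown status")
    return violations

def quality_ratio(records: list[dict]) -> float:
    """Share of records passing every rule: a simple trust metric."""
    if not records:
        return 1.0
    ok = sum(1 for r in records if not check_record(r))
    return ok / len(records)

orders = [
    {"email": "a@example.com", "amount": 49.9, "status": "shipped"},
    {"email": "", "amount": 12.0, "status": "pending"},
    {"email": "b@example.com", "amount": -5.0, "status": "lost"},
]
print(quality_ratio(orders))  # 1 of 3 records passes every rule
```

A ratio like this, tracked per flow over time, is one way to make the "minimum requirement level" measurable and alert when a critical flow degrades.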
Antoine: Data Quality has therefore become a fundamental prerequisite for the creation of value through data. Are there other criteria to take into account?
Beyond the aspects of monitoring and reliability, I would mention synchronicity. Take the example of La Redoute: if we send a mobile push informing the customer that his parcel is arriving at his doorstep while the call center tells him the parcel has not left the warehouse, it is more than problematic. We find answers in event-driven and real-time architectures. I strongly consider this aspect of synchronicity, fundamental to the creation of value through data.
Therefore, maintaining confidence in the consistency, availability, and reliability of data within and between systems is essential to Data Quality.
Antoine: Moreover, the Forrester report shows that Data Quality is a criterion that has taken first place, ahead of having the right expertise and the support of the management committee. Do you confirm this trend in the ecosystem?
This is a good point: awareness follows the maturity of organizations and CDOs. In the past, we had to demonstrate the possibility of value creation through data. We heard a lot about Big Data and GAFA practices, and we wondered how we could reach the same level. So we did a lot of POCs with a low rate of deployment at scale.
“Data Quality is a prerequisite for the creation of value through Data.” — Samir Amellal
For some time now, most mature CDOs have advanced to this industrialization stage. When I arrived at La Redoute, that was one of my first concerns. I started with foundations like DataOps, flow monitoring, a healthier architecture, documentation, etc. These are critical elements to ensure the stability, scalability and maintenance of the devices.
Leading historical companies outside of the pure-players, such as La Redoute and others, are moving to a stage where governance, dissemination and data practices are taking hold. One of the main prerequisites is Data Quality. We cannot afford to hand the keys of the truck to the different teams without ensuring alignment and shared uses. Once we start to cross this Data-Driven milestone, Data Quality is critical, whether for reporting, algorithms or Data Science.
Antoine: Data Quality is at the heart of digital evolution trends. Reporting and BI have evolved into Big Data. Data Lakes in the Cloud can bring more customer or operational insights. DataOps supports the industrialization of Data Science. Do ethics and AI topics also accelerate this need for data quality?
Exactly. Ethics is a broad subject; moreover, the AI Act is being prepared at the European level. Within this topic of ethics, there are several sub-topics. First, ethics is subjective; we are talking about artificial intelligence, not artificial consciousness. We are therefore dealing with devices and processes that reproduce complex tasks formerly reserved for humans. Technological advances allow us to implement some of them. An AI's ethics are consequently a reflection of those of the company developing and using it.
In addition, training these AIs on quality data is fundamental. Building an AI on incorrect or biased data increases the likelihood of poor decision-making. We then have a problem of confidence, or even of unawareness of the non-quality of the model built.
Antoine: Are ethics becoming an element incorporated in training, university, or even in more targeted programs?
Not yet. In engineering schools, I work mainly on neural networks and activation functions. In business schools, we talk about ethics indirectly; it raises many questions in the implementation of artificial intelligence. AI inspires many fantasies, especially among people far from its actual deployment. There is a debate around AI between major figures of Digital. The theme is therefore beginning to arrive. It is not easy to handle; it touches on philosophy, subjectivity, and morals.
Antoine: Data architectures evolve, from Data warehouse, Data Lake to Data Mesh in order to accelerate the flow of data and decision-making at scale. What impacts do you identify for Data Quality?
Data Mesh is indeed a good response to a transformation into Data-Driven organizations. It is a way of getting business teams to take more ownership of the subject. We talk about it more and more. I see Data Quality among the pillars enabling access to this type of solution and use, alongside DataOps, architecture, data administration, and data provisioning processes. When a report is designed in one department, for example, we have to ensure its reliability, validity and relevance. The risk is making bad decisions.
Data Quality is therefore fundamental. We must ensure that the data has not been altered in transit between departments, and that we share the same definitions to maximize its usefulness. This requires homogeneity of practices at the organization level, with shared Data standards and requirements. All these elements are prerequisites for Data Mesh, a real accelerator for deploying the uses of data throughout the organization.
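The shared definitions Samir mentions are often formalized as data contracts that every publishing team must satisfy. The sketch below shows one minimal way to express such a contract in Python; the field names, types, and the two department examples are hypothetical illustrations, not an actual Data Mesh standard.

```python
# A hypothetical shared contract between departments publishing to the
# same data mesh: every data product must expose these fields and types.
SHARED_CONTRACT = {"customer_id": str, "order_total": float, "currency": str}

def conforms(record: dict, contract: dict) -> bool:
    """True if the record exposes every contracted field with the right type."""
    return all(
        field in record and isinstance(record[field], expected)
        for field, expected in contract.items()
    )

# Two departments must agree on the same definition of an order.
marketing_row = {"customer_id": "C42", "order_total": 99.5, "currency": "EUR"}
finance_row = {"customer_id": "C42", "order_total": "99.5"}  # wrong type, missing field

print(conforms(marketing_row, SHARED_CONTRACT))  # True
print(conforms(finance_row, SHARED_CONTRACT))    # False
```

Rejecting the non-conforming record at publication time is what prevents the "altered in transit between departments" problem from reaching downstream consumers.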
The objective in Data-Driven remains to maximize the creation of value by naturally using data in the management of processes. You have to know how to work with data and understand it in order to use it. As decision-making becomes increasingly supported by data, the need for traceability and explainability increases accordingly. The responsibility of the actors is engaged when deciding whether or not to follow a recommendation. The adoption of AI requires change management in companies.
Antoine: Let’s share on the implementation of Data Quality. We see solutions emerging like DataPrep in GCP, with equivalents in the competition to support data quality. Couldn’t a product approach for the business speed up the process?
Yes, this is a very good point. I am convinced that there are several subjects to set up and in different places to ensure Data Quality. DataOps partially addresses data monitoring, and Data Quality provides part of the correction.
Moreover, we are working with you on devices and products to ensure the quality of this data circulating in the company, with an owner responsible for Data Quality with its users.
Antoine: We are therefore far from one-size-fits-all solutions. We must address the subject in its entirety in the various dimensions of processes, organization and skills. Proper change management is necessary.
It is indeed a transformation project. Moreover, operational staff rarely see the value of Data Quality; they do not realize the negative impact the company can suffer without addressing the issue. It is dangerous: we can have a report or metrics available without them being reliable. Since people are not blocked in their activities, the subject is wrongly considered secondary. Teams can be in for a rude awakening when applying Data Quality to historical reports.
Antoine: We have seen the emergence of Open Data. We are also seeing Cloud interoperability standards and models emerge. Do you identify opportunities for improvements in the ecosystem to accelerate data sharing?
Open Data is a relevant topic that I have been interested in and will return to more seriously. In my experience, the datasets took a lot of effort to clean up and transform. The data.gouv portal, for example, provides interesting public data that is not necessarily qualified or standardized.
“The quality of data within the ecosystem is a real issue. There are many opportunities for improvement to promote innovation and interoperability.” — Samir Amellal
From a business point of view, I think there are real sectoral issues. For example, there are no standardized models or standards for the Retail sector. It is therefore difficult to share products, nomenclatures and taxonomies. An effort of integration and adaptation is necessary, even within the same group. This is a significant waste of effort for a large number of organizations. These topics initially require vertical, sector-specific improvements.
Antoine: Yes, moreover, the composition and interoperability of systems is a significant issue for the continuous delivery of value; flexibility is key. A path has been made on technical protocols with standards, but functional standards are still underdeveloped.
Completely: standardization and interoperability reinforce each other and can multiply the results tenfold. Beyond companies and their individual Data Quality initiatives, I think we have real issues to address in this regard. Organizations like the W3C are doing this work in the web ecosystem.
Antoine: To end on a personal note, do you have any content that has and continues to inspire you? It could be people, quotes, books, or whatever.
In the field of IT and Data, I have a friend, Luc Julia, who inspires me a lot. He started doing IT in the USA in different environments. I highly recommend his book, Artificial Intelligence Does Not Exist. I can also recommend a book by Gilles Berton, “CDO”, in which I was interviewed. We see that we all face the same challenges, in different contexts.
Apart from these fields, I was very interested in econometrics and the philosophy of science, epistemology. This covers the definition and framing of a problem. Knowing how to correctly formulate a problem is more than useful in Data Science; our models derive strongly from these practices. David Hume, an empiricist epistemologist, and Karl Popper are very inspiring to me. Their practices are fundamental to modeling and understanding problems, and to solving them with truly relevant solutions.
Antoine: Thank you Samir for this sharing on Data Quality. A good continuation in all your initiatives and activities. You can follow Samir Amellal here.
Forrester (2019), Why Marketers Cannot Ignore Data Quality. Report.
Luc Julia (2019), Artificial intelligence does not exist. First editions.
Stanford, David Hume’s Biography https://plato.stanford.edu/entries/hume/
Stanford, Karl Popper’s Biography https://plato.stanford.edu/entries/popper/