Current AI technology essentially obtains data by measuring the real world, extracts algorithmic models from those data, and uses the models to make predictions. Data and algorithms are therefore the basis of AI computing and decision-making. However, the utilization rate of big data in healthcare is low. Although hospitals hold enormous volumes of data, most are unstructured and so cannot deliver the value of “big data”. Many hospitals have not yet established a unified data management system, which hinders unified analysis of data and limits the application of AI technology in the medical field. Many countries have incorporated quality management of training data and data trainers into their regulatory frameworks to ensure data quality. For example, China’s Deep Learning Assisted Decision-Making Medical Device Software Approval Points [81] requires quality control of training data and calls for diverse data sources, with data collected from multiple medical institutions at different geographic and hierarchical levels whenever possible. The Approval Points further subdivide the data sets into a training set (for algorithm training), a validation set (for hyperparameter tuning), and a testing set (for performance evaluation), among others, and specify different acquisition requirements for each. They also set requirements for the access qualification, selection, training, and assessment of data trainers.
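
The three-way subdivision described above can be illustrated with a minimal sketch. It is not taken from the Approval Points: the 70/15/15 proportions, the record fields (`id`, `institution`), and the function names are assumptions chosen only for illustration. The helper `institutions_per_split` shows one simple way to check that each subset still draws on several institutions.

```python
import random
from collections import Counter


def split_dataset(records, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle records and partition them into training, validation, and testing sets.

    The fractions are illustrative assumptions, not regulatory requirements.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be non-negative and sum to less than 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    return {
        "testing": shuffled[:n_test],                    # held out for performance evaluation
        "validation": shuffled[n_test:n_test + n_val],   # used for hyperparameter tuning
        "training": shuffled[n_test + n_val:],           # used for algorithm training
    }


def institutions_per_split(splits):
    """Count how many records each institution contributes to each subset."""
    return {
        name: Counter(rec["institution"] for rec in rows)
        for name, rows in splits.items()
    }


# Hypothetical records from four institutions.
records = [{"id": i, "institution": f"hospital_{i % 4}"} for i in range(100)]
splits = split_dataset(records)
coverage = institutions_per_split(splits)
```

A random split of this kind does not by itself guarantee diversity of sources; in practice the coverage counts would be inspected, and the split stratified if one institution dominated a subset.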
Second, on the sharing of health care data. The main obstacle to data sharing is the question of data ownership. Academic circles hold several views on data ownership: ownership by individuals, ownership by organizations such as enterprises, ownership by the state, and ownership by all human beings. The debate concerns not only who owns data but also whether there should be a notion of ownership at all. Macnish and Gauttier [82] argue that it is not appropriate to describe our relationship with data in terms of ownership: there are only weak philosophical grounds for giving citizens control of ‘their’ data, and control should instead be based on custody of data and the potential for harm.
Healthcare data are sensitive personal information [23, 83] and are closely tied to personal privacy. Respect for personal privacy is a crucial ethical principle in health care because privacy is linked to personal identity and autonomy [84]. For these reasons, proper procedures are essential to ensure that genuine informed consent is obtained from patients for the use of their personal health data. For example, patients must give explicit consent for their health data to be used for any specific purpose [85]. In 2018, the EU’s General Data Protection Regulation (GDPR) [86], a comprehensive law on personal data protection, became applicable. Unlike previous industry regulations, it is a truly enforceable law with specific and strict requirements. For example, operators are required to honor a user’s request for personal data to be “forgotten”, i.e., “I don’t want you to remember my past data and I want you not to use my data for modeling purposes from now on”. The consequences of violating the GDPR are also severe: fines can reach 4% of the fined organization’s global annual revenue. In practice, however, if software development organizations had to obtain patient consent for each use of aggregated data, the cost of data use would inevitably rise. Manson and O’Neill [87] argue that more specific consent is not always ethically better and is difficult to achieve in practice. On their account, consent requires distinctive communicative transactions through which other obligations, prohibitions, and rights can be waived or set aside in a controlled and specific manner. Some scholars have proposed more lenient forms of informed consent, such as broad consent and blanket consent, to facilitate practical implementation [88, 89]. However, the moral justification for these forms of consent remains controversial.
Regarding the sharing of health care data, some believe that patients have an obligation to contribute to improving the quality of the health care system [90]. On this view, patients’ clinical data have potential medical value and should be widely shared to promote the health and well-being of all humans. From the perspective of human benefit, it is also unethical not to use existing clinical data to develop tools that benefit all humanity [91]. In our view, health data should be used rationally in the public interest while protecting patient privacy and data security. De-identification and anonymization can be used to protect patient privacy in data collection and storage. De-identification processes the data so that the subject cannot be identified without additional information. For example, each identity is replaced by an unrelated one-to-one code name, the AI software developers have access only to the code names, and the database owner holds the key that links code names to identities. The conditions under which the codes may be decoded must also be stipulated. Anonymization means that the personal identifiers in the data are completely removed, so that no connection remains between the data provider and the data. Anonymous data cannot be used to identify a person and are therefore not subject to the GDPR; a company that collects anonymous data does not need to obtain the users’ consent. Technologists also use differential privacy to create a barrier between hackers and data, so that individual records are hard to restore after a breach [92].
We believe that it is ethical to dispense with renewed informed consent for data use, provided that data security is ensured, patient privacy is not compromised, and a sound ethical review system is in place. If possible, the government should establish a website or query platform that lets patients track how their medical data are used. A balance needs to be found between two extremes: prohibiting data flow for the sake of personal interests, and pursuing data sharing by putting public interests above personal interests. While ensuring medical data security, data sharing and research should be reasonably promoted to enhance human welfare, which is also the ethical and legal goal. On the premise of personal information protection, accessible data flow and stronger international cooperation should be promoted through the United Nations, the G20, and other global platforms to achieve sustainable development of AI.
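
The code-name scheme used in this section to illustrate de-identification can be sketched as follows. This is a minimal illustration under stated assumptions, not a compliant implementation: the field name `patient_id`, the functions `pseudonymize` and `reidentify`, and the sample records are all hypothetical. The sketch also removes only the direct identifier, whereas real records may contain other fields (dates, postcodes, rare diagnoses) that could still single out a person.

```python
import secrets


def pseudonymize(records, id_field="patient_id"):
    """Replace each identity with an unrelated random code name.

    Returns (coded_records, key_table). Developers would receive only
    coded_records; key_table stays with the database owner.
    """
    key_table = {}  # code name -> identity (the re-identification key)
    code_of = {}    # identity -> code name, so the mapping stays one-to-one
    coded_records = []
    for rec in records:
        identity = rec[id_field]
        if identity not in code_of:
            code = secrets.token_hex(8)  # random, so it reveals nothing about the identity
            while code in key_table:     # regenerate in the unlikely event of a collision
                code = secrets.token_hex(8)
            code_of[identity] = code
            key_table[code] = identity
        coded = {k: v for k, v in rec.items() if k != id_field}
        coded["code"] = code_of[identity]
        coded_records.append(coded)
    return coded_records, key_table


def reidentify(code, key_table):
    """Look up the identity behind a code; only the key holder can do this."""
    return key_table[code]


# Hypothetical records: the same patient appears twice.
records = [
    {"patient_id": "P001", "diagnosis": "hypertension"},
    {"patient_id": "P002", "diagnosis": "diabetes"},
    {"patient_id": "P001", "diagnosis": "asthma"},
]
coded_records, key_table = pseudonymize(records)
```

Destroying `key_table` would remove the link between codes and identities, which moves the data toward the anonymized case, although remaining fields may still allow indirect identification.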