How can 40 trillion GB of medical data help new drug research and development?


In May 2019, the Center for Drug Evaluation (CDE) of the National Medical Products Administration issued "Basic Considerations for Real-World Evidence to Support Drug Research and Development" (draft for comments). This means that the question of how to incorporate real-world evidence (RWE) into China's drug research and development and regulatory decision-making has become a formal problem that regulators are considering and beginning to address.

The industry responded with heated debate: "What clinical problems can RWE solve? What data should a real-world study (RWS) integrate? Is the collection and processing of real-world data (RWD) traceable? What methodology should be used to analyze the data?"

To answer these soul-searching questions, which run all the way from regulation to implementation, and to make real-world evidence truly usable, the first problem to solve is that of the medical database.

In response, on July 29, at the 2019 annual academic meeting of China Biostatistics, Zhang Tianze, founder and CEO of LinkDoc (Zero Krypton Technology), a medical big data and artificial intelligence company, shared the thinking and experience of a front-line explorer and practitioner in a talk titled "The Current Situation and Challenges of China's Medical Databases."

Industry demand is the real driving force

"In recent years, several forces have been driving the development of medical databases."

The first is "technology drive". IDC Digital predicts that by 2020 the volume of medical data will reach 40 trillion GB, 30 times the 2010 level. Over those ten years, medical informatization has developed rapidly and deposited a large stock of "raw material", "just like unrefined crude oil". The next stage, medical digitization, builds on this mass of informatized data and is the key, and the only route, to delivering service value.

The second is "policy promotion". In 2016, the General Office of the State Council issued the Guiding Opinions on Promoting and Regulating the Application and Development of Big Data in Health Care; in 2017 came the Opinions on Promoting the Development of Internet Health Care; and in 2018 the National Health Commission issued the Measures for the Management of National Health and Medical Big Data Standards, Safety and Services (for Trial Implementation). By a rough count, in about three years the state issued a total of 30 policy documents that bring health care big data into the national big data strategy in order to accelerate the sharing and use of medical data resources in China.

The last, and most important, is "demand traction".

Drug R&D and evaluation, pharmaceutical marketing and distribution, commercial health insurance, assisted diagnosis and treatment, genetic data analysis, continuing medical education, clinical research services, drug supervision, public health management: in every area of health care, data can deliver enormous value. "Medical data platforms and resources sit at the hub of the new medical industry and are of great significance in the era of individualized, precise diagnosis and treatment."

Under this strong demand traction, a number of function-specific and population-specific databases have emerged in China, such as medical literature databases, bioinformatics databases, clinical medical databases, and insurance payment databases.

"But the traction of industry demand is the real driving force."

Strong traction from the demand for major-disease drug research

Today, this traction is growing ever stronger, especially in pharmaceutical research and development.

Data show that phase I through phase III trials of an oncology drug take 9.6 years, and that it takes 10.5 years to get from the first patent application to market. In 2018, the average cost of each new oncology drug reached 2.6 billion US dollars, and the R&D success rate fell to 8.0 percent.

"With the advent of the era of precision therapy, clinical trials and patient enrollment have both become harder, and the time and financial costs of drug development have risen significantly. The patient populations that drugs apply to are highly segmented and scattered, which makes clinical diagnosis, treatment, and patient management harder as well. In addition, competition in indication development and commercial competition are becoming two parallel main battlefields that directly affect medium- to long-term commercial potential."

It is fair to say that the rapid growth in demand for major-disease drug research has created a huge demand for disease-specific databases.

In the United States, ASCO has advocated and promoted the development of several oncology RWD platforms in North America, such as CancerLinQ and Flatiron. The international pharmaceutical giant Roche acquired Flatiron Health and Foundation Medicine in deals valued at about $2.1 billion and $2.4 billion, respectively, to advance the use of real-world data in pharmaceutical research and development. The importance attached to such data is evident.

China has also been actively exploring research-grade medical databases. Examples include the China Cohort Sharing Platform (China Cohort Consortium), whose openness is low, and the National Cancer Registry (National Central Cancer Registry), whose data are published as annual reports and which was not established primarily for clinical or pharmacoeconomic research.

On the whole, China's databases generally lack patient-centered, panoramic, long-term data. A disease-specific cohort that can be used for drug research and development demands highly complete original medical records, covering both in-hospital and out-of-hospital data. In-hospital data comprise hospital information system data and data accumulated within individual departments; out-of-hospital data comprise out-of-hospital prescription data and follow-up data. Taking the original medical records of NSCLC patients as an example, they need to cover the whole course of care: diagnosis and admission, surgery, postoperative adjuvant therapy, recurrence and metastasis, genetic testing, first- through nth-line treatment, survival data, and so on.

"Treating a patient is like a child eating bread: the crumbs fall all over the floor, and they have to be tracked and collected all along the way. Only then can high-quality research data be produced."

The gap between expectation and reality

So when the "crumbs" really are picked up piece by piece and joined into a complete whole, the value is enormous.

At the 2018 World Conference on Lung Cancer, a multi-center, non-interventional, retrospective cohort study from Brest University Hospital in France evaluated the real-world intracranial effectiveness of nivolumab in advanced NSCLC with brain metastases. The results suggest that immunotherapy is very promising for NSCLC patients with brain metastases.

On April 4, 2019, Pfizer's new breast cancer drug Ibrance was approved for male breast cancer based on real-world data, shaking the entire pharmaceutical industry.

Seeing the potential of real-world data in pharmaceutical research and development, in May 2019 the CDE released "Basic Considerations for Real-World Evidence to Support Drug Research and Development" (draft for comments). The draft identified scenarios in which RWE can be applied, such as drugs for rare diseases, revision of indications or of the scope of combination use, post-marketing re-evaluation of drugs, clinical research and development of traditional Chinese medicine hospital preparations, guidance for clinical study design, and precise targeting of patient populations. It triggered a heated discussion in the industry.

Many people are counting on real-world data to address the industry's needs and pain points.

"However... there is always some gap between expectation and reality."

Ideally, applying real-world data would go straight from data extraction and model building to deep learning and artificial intelligence. In reality, applying real-world data means crossing a gully of intermediate steps: requirements discussion, data extraction, data cleaning, missing-value processing, feature engineering, model evaluation, and more.
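
To make the intermediate steps concrete, here is a minimal sketch in Python of three of them: data cleaning, missing-value processing, and feature engineering. The records, field names, and imputation choices (median age, an explicit "unknown" category) are illustrative assumptions, not anything described in the talk.

```python
from statistics import median

# Hypothetical raw records, as they might look right after extraction.
RAW_RECORDS = [
    {"patient_id": "P001", "age": "63", "stage": "IIIA ", "smoker": "yes"},
    {"patient_id": "P002", "age": "", "stage": "iv", "smoker": "no"},
    {"patient_id": "P003", "age": "58", "stage": "IV", "smoker": None},
    {"patient_id": "P001", "age": "63", "stage": "IIIA ", "smoker": "yes"},  # duplicate
]

def clean(records):
    """Data cleaning: drop repeated patients, normalize text, parse numbers."""
    seen, cleaned = set(), []
    for rec in records:
        if rec["patient_id"] in seen:
            continue
        seen.add(rec["patient_id"])
        age = rec["age"].strip() if rec["age"] else ""
        cleaned.append({
            "patient_id": rec["patient_id"],
            "age": int(age) if age.isdigit() else None,
            "stage": rec["stage"].strip().upper() if rec["stage"] else None,
            "smoker": rec["smoker"],
        })
    return cleaned

def impute(records):
    """Missing-value processing: median age, explicit 'unknown' category."""
    fill_age = median(r["age"] for r in records if r["age"] is not None)
    return [
        {**r,
         "age": r["age"] if r["age"] is not None else fill_age,
         "smoker": r["smoker"] if r["smoker"] is not None else "unknown"}
        for r in records
    ]

def featurize(records):
    """Feature engineering: turn fields into numeric, model-ready features."""
    return [
        {"patient_id": r["patient_id"],
         "age": r["age"],
         "is_stage_iv": int(r["stage"] == "IV"),
         "is_smoker": int(r["smoker"] == "yes")}
        for r in records
    ]

features = featurize(impute(clean(RAW_RECORDS)))
for row in features:
    print(row)
```

Even in this toy form, every step involves a judgment call (how to treat duplicates, what to impute) that would need to be agreed in the requirements discussion and documented for traceability.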

Over the gully

How can we get across the gully of real-world data applications?

"To create a true, credible, and usable real-world clinical database, five major problems must be solved: the sheer volume of medical records, unstructured data, difficult follow-up, the lack of industry standards, and security."

The first step is to establish a disease model: "Set a universal disease model for each disease. A base model has different domains, each domain has different variables, and each variable has its own constraints."
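
The model, domain, variable, and constraint hierarchy can be sketched as a small data structure. The Python below is a minimal illustration under stated assumptions: the class names, the NSCLC domains and variables, and the simplified stage list are hypothetical and are not LinkDoc's actual model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Variable:
    name: str
    constraint: Callable[[Any], bool]  # returns True when a value is acceptable

@dataclass
class Domain:
    name: str
    variables: list[Variable] = field(default_factory=list)

@dataclass
class DiseaseModel:
    disease: str
    domains: list[Domain] = field(default_factory=list)

    def validate(self, record: dict) -> list[str]:
        """Return one message per missing or constraint-violating variable."""
        errors = []
        for domain in self.domains:
            values = record.get(domain.name, {})
            for var in domain.variables:
                if var.name not in values:
                    errors.append(f"{domain.name}.{var.name}: missing")
                elif not var.constraint(values[var.name]):
                    errors.append(
                        f"{domain.name}.{var.name}: invalid value {values[var.name]!r}")
        return errors

# Hypothetical, heavily simplified NSCLC model.
NSCLC_MODEL = DiseaseModel(
    disease="NSCLC",
    domains=[
        Domain("diagnosis", [
            Variable("stage", lambda v: v in {"I", "II", "IIIA", "IIIB", "IV"}),
            Variable("histology", lambda v: isinstance(v, str) and v != ""),
        ]),
        Domain("treatment", [
            Variable("line_of_therapy", lambda v: isinstance(v, int) and v >= 1),
        ]),
        Domain("follow_up", [
            Variable("survival_months",
                     lambda v: isinstance(v, (int, float)) and v >= 0),
        ]),
    ],
)

record = {
    "diagnosis": {"stage": "IIIA", "histology": "adenocarcinoma"},
    "treatment": {"line_of_therapy": 0},   # violates the ">= 1" constraint
    "follow_up": {},                       # survival data not yet collected
}
errors = NSCLC_MODEL.validate(record)
print(errors)
```

Declaring constraints alongside each variable means that incomplete or implausible records are flagged at the point of entry rather than discovered during analysis.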

Next, a combination of human work and artificial intelligence can be used to deeply structure the massive data. This can make data processing several times, dozens of times, or even a hundred times faster, greatly reduce its cost, and still ensure quality. In the extremely important process of integrating follow-up data, for example, LinkDoc assigns follow-up tasks by algorithm, connects follow-up data with clinical data in real time, de-identifies patients' personal information so that privacy is fully protected, records 100% of the process, and achieves a follow-up success rate of more than 80%.

"Structuring and following up the data is only the starting point." The next step is to build a data processing workflow based on CDISC, the international clinical research data standard, and ultimately to turn raw real-world medical data into a standard database for scientific research.

At the same time, use of the database must strictly follow a standardized path of "physical isolation of data, access control, tiered management of application data, and informed patient authorization."
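
As a rough illustration of how follow-up data can be linked to clinical data while personal information is de-identified, the Python sketch below replaces direct identifiers with a keyed hash and joins the two sources on that pseudonym. The field names, the key handling, and the choice of HMAC-SHA256 are assumptions made for illustration; the article does not describe LinkDoc's actual method, and real de-identification has to meet the applicable regulations.

```python
import hashlib
import hmac

# Assumption: in practice the key would live in a separate, access-controlled
# system, never alongside the research database.
SECRET_KEY = b"example-key-kept-outside-the-research-database"

DIRECT_IDENTIFIERS = ("name", "phone", "national_id")  # hypothetical field names

def pseudonymize(national_id):
    """Derive a stable pseudonym that cannot be reversed without the key."""
    return hmac.new(SECRET_KEY, national_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()

def deidentify(record):
    """Drop direct identifiers and attach the pseudonym instead."""
    safe = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    safe["pseudo_id"] = pseudonymize(record["national_id"])
    return safe

def link(clinical_rows, follow_up_rows):
    """Attach each patient's follow-up rows to the clinical row."""
    by_patient = {}
    for row in follow_up_rows:
        by_patient.setdefault(row["pseudo_id"], []).append(row)
    return [{**c, "follow_ups": by_patient.get(c["pseudo_id"], [])}
            for c in clinical_rows]

clinical = [deidentify(r) for r in [
    {"national_id": "ID-0001", "name": "Patient A", "phone": "000-0001",
     "diagnosis": "NSCLC"},
    {"national_id": "ID-0002", "name": "Patient B", "phone": "000-0002",
     "diagnosis": "NSCLC"},
]]
follow_up = [deidentify(r) for r in [
    {"national_id": "ID-0001", "name": "Patient A", "phone": "000-0001",
     "status": "alive", "month": 6},
]]

linked = link(clinical, follow_up)
reached = sum(1 for p in linked if p["follow_ups"])
print(f"follow-up success rate: {reached / len(linked):.0%}")
```

Because the same identifier always maps to the same pseudonym, in-hospital and out-of-hospital records can be joined without names or phone numbers ever entering the research database.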

Based on such a real-world database, the value of the data can be fully realized.

High-quality databases can be applied to large-scale clinical studies to support the development of new cancer drugs in China; innovative models can help innovative drugs carry out key drug monitoring; and multi-center data platforms can help experts publish high-quality papers in top academic journals.

Finally, Zhang Tianze believes that a real-world database is built on clinical medical records but goes far beyond them. "A true, credible, and usable clinical database requires clear goals, overall design, and solid quality control. All three are indispensable." Perhaps only in this way can we answer those soul-searching questions, put real-world evidence to work for clinical diagnosis and treatment, drug research and development, and the pharmaceutical industry, and bring real-world research into actual practice.