Ⅰ. Introduction
Big data rehabilitation research in the United States (U.S.) has grown tremendously over the past few decades due to systematic support from Congress and federal funding agencies. For instance, the U.S. Congress passed the American Recovery and Reinvestment Act of 2009 (ARRA), which committed 19 billion dollars to the implementation of Electronic Health Records (EHRs) across providers (American Recovery and Reinvestment Act, 2009). To improve healthcare, ARRA allowed the Centers for Medicare & Medicaid Services (CMS) to provide reimbursement incentives for providers who use certified EHR technology (Centers for Medicare and Medicaid Services, 2011). In line with this national priority, providers who sought incentives from the CMS built electronic medical systems and collected EHRs for Medicare and Medicaid patients. By 2015, 96% of private acute hospitals in the U.S. had adopted a certified inpatient EHR system (Henry, Pylypchuk, Searcy, & Patel, 2016). Despite the fragmented healthcare system in the U.S., these digitized patient healthcare records have improved communication and coordination of care and accelerated big data research in rehabilitation (Weil, 2014).
In addition, the All of Us Research Program, funded by the National Institutes of Health (NIH), aims to collect comprehensive medical records from at least 1 million individuals to advance disease diagnosis and prevention and thereby improve health. As of July 2019, biospecimens from over 175,000 participants and EHR data from 112,000 participants had been collected (Denny et al., 2019). These data allow researchers to develop treatment interventions tailored to individuals, and data science is a key component of this initiative. According to the NIH strategic plan, data science holds significant potential for accelerating the pace of biomedical research (National Institutes of Health, 2018).
The Korean healthcare system has been operating with a governmental universal payment model since 1989. In 2008, 96.3% of the Korean population across all age groups had uniform health insurance, and the proportion of beneficiaries increased to 97.2% in 2017 (Korean Statistical Information Service, 2017). The majority of Korean healthcare providers use the same EHR system (e.g., National Health Insurance Service). In addition, the Centers for Disease Control and Prevention Korea (KCDC) supports the Korean Genome and Epidemiology Study (KoGES) and other national surveys, which are available free of charge or for a small administration fee (Kim, Han, & Ko, 2017). Despite this unified healthcare data source, few studies using big data analytics have been conducted in Korea. We speculate that the limited number of big data studies in rehabilitation could be due to insufficient workforce training opportunities for rehabilitation researchers.
Data science specifically within the realm of occupational therapy can enhance patient-centered care and intervention planning to improve individuals’ functional abilities, social participation, and ultimately quality of life. Occupational therapy students, clinicians, and researchers have the opportunity to impact the scientific community not only in their country but globally through the utilization of data science and promotion of open sharing and accessibility. Thus, the aims of this paper are to 1) explain the role of occupational therapy within data science, 2) introduce current trends of data science in rehabilitation areas in the U.S., and 3) explore potential research opportunities utilizing Korean databases.
Ⅱ. Big Data Comparisons between Korea and the United States
A common misconception in big data research is that Korea offers fewer opportunities for big data analytics in rehabilitation than the U.S. The majority of rehabilitation-related big data studies use U.S. Medicare data. However, the Medicare data only include Medicare-eligible older adults aged 65 years and older and people with disabilities, who accounted for only 18.2% of the U.S. population in 2018 (Kaiser Family Foundation, 2018). In addition, researchers have to pay large administration fees to access these claims data. In contrast, Korean National Health Insurance Service beneficiaries account for 96.3% of the Korean population across all age groups (Korean Statistical Information Service, 2017). Certain national-level data, such as proportions of disease types, frequency of medical treatments, and total payment, can be freely requested or accessed through a government public data portal (http://data.go.kr).
Table 1 presents currently available rehabilitation-related big data in Korea and the U.S. The Korean government has launched various population-based studies by benchmarking existing U.S. national studies. For instance, the Korea National Health and Nutrition Examination Survey (KNHANES) is similar to the U.S. National Health and Nutrition Examination Survey (NHANES). Findings from population-based studies have impacted U.S. health policies. For example, the U.S. government has utilized study findings from the NHANES in annual government reports (Centers for Disease Control and Prevention, 2009). Similarly, the KNHANES has generated a large body of research. As of 2014, there were 654 peer-reviewed manuscripts from the KNHANES; however, only two studies were conducted to answer rehabilitation-related questions (Kweon et al., 2014). We expect that if Korean occupational therapy scientists are informed about currently available big data in Korea and research trends in U.S. studies, they will be able to conduct a similar level of big data research using well-organized rehabilitation-related big data. Currently, Korean rehabilitation-related big data are available at the Korean Health and Welfare Data portal website (https://data.kihasa.re.kr/index.jsp), and various U.S. big data are accessible at the Archive of Data on Disability to Enable Policy and Research website (https://www.icpsr.umich.edu/icpsrweb/content/addep/index.html).
Ⅲ. Research Areas
Functional Outcomes. The Improving Medicare Post-Acute Care Transformation Act of 2014 provided support to further expand big data availability in the U.S. (Improving Medicare Post-Acute Transformation Act, 2014). This act requires post-acute inpatient rehabilitation facilities, skilled nursing facilities, long-term care hospitals, and home health agencies to report standardized patient assessment data elements and quality measures. Traditionally, patients’ functional outcomes, such as Functional Independence Measure™ scores, were critical components in estimating rehabilitation payments by the CMS (Centers for Medicare and Medicaid Services, 2013). Recently, the U.S. healthcare system has transitioned to quality of care measures, placing emphasis on areas such as all-cause hospital readmission and community discharge. Big data research in rehabilitation focuses on identifying and enhancing the functional outcome components that improve quality of care indicators such as hospital readmissions (Graham et al., 2017). Currently, the Korean payment model is a fee-for-service model, known as an “uncontrollable” payment model, which was the previous payment model in the U.S. Under the fee-for-service payment model in Korea, healthcare providers receive larger reimbursements when they provide more rehabilitation services (World Health Organization, 2010). Since the National Health Insurance Service data include the amount of payment and resource use, such as therapy procedures, potential big data research can include studies that examine the relationship between resource use and the amount of payment (Cha, Song, Kim, Kim, & Kim, 2017; Cho et al., 2018; Cho & Yang, 2016). The findings from this type of research can inform Korean policymakers on critical issues in current and future payment models. Furthermore, these findings could lead to a new payment model focused on improving or at least maintaining the quality of care while decreasing the costs of care.
Geographical Variation. Medical ethics can be explained by four basic principles: 1) respect for autonomy, 2) beneficence, 3) non-maleficence, and 4) justice (Gillon, 2003). Among the four principles, justice stands out as a key concept in the area of health disparities research. Justice is described as fairness or equality. In the context of healthcare, this principle means that healthcare should be available regardless of age, sex, race/ethnicity, religion, or socioeconomic status (Gillon, 2003). Individuals should receive healthcare equally and have equal access to healthcare resources. However, previous big data studies have reported significant variation in rehabilitation outcomes across facility characteristics and geographical regions in the U.S. (Middleton, Graham, Prvu Bettger, Haas, & Ottenbacher, 2018; Reistetter et al., 2014; Reistetter et al., 2015). Variation studies with national data can promote ethics-related discussions and identify areas for improving care. Maximizing the satisfaction and equality of care and service use while reducing healthcare spending are examples of possible improvements (Reistetter et al., 2014). Ideally, medical facilities should be geographically distributed throughout the country to provide equal access. As in other countries, healthcare providers in Korea are mostly concentrated in highly populated metropolitan areas, such as Seoul or Busan. For this reason, variation studies in rehabilitation outcomes for Korea need to evaluate how patients’ outcomes vary by location (metropolitan, suburban, and rural areas) as well as by facility characteristics such as size, specialty type, and hospital classification. Policymakers and healthcare providers can use this information to improve the quality of care and consistency of rehabilitation services.
Linkage of a Population-Based Cohort to Hospital Data. Big data research in rehabilitation areas in the U.S. typically uses claims data (e.g., Medicare, Medicaid, and the Uniform Data System), hospital data (e.g., EHRs), or population-based national surveys. Each type of big data covers a different scope of an individual’s information. For instance, claims and hospital data collect comprehensive patient-level medical information such as hospitalizations, health care costs, chronic conditions, and medications. However, medical records often do not collect social-contextual information because the primary purpose of the data is billing. National survey data collect various lifestyle-related data, such as health behaviors, dietary information, family structures, income, and genetics, which are critical components when interpreting the aging or disabling processes. However, these surveys do not collect precise individual-level medical conditions. For these reasons, researchers have linked claims or hospital data to national survey data to better understand how social-contextual factors are correlated with medical conditions (Ottenbacher, 2016). For instance, Medicare data can be linked with national surveys, such as the Hispanic Established Populations for the Epidemiologic Study of the Elderly (EPESE), Health and Retirement Study (HRS), and National Health and Aging Trends Study (NHATS). These data linkages use unique identifiers or probabilistic data matching methods to connect datasets.
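The two linkage strategies mentioned above can be illustrated with a minimal sketch. The records, field names, weights, and threshold below are all hypothetical and chosen only for illustration; they do not come from Medicare or any survey. The similarity score is a simplified stand-in for probabilistic matching, since formal approaches estimate agreement weights from the data rather than fixing them by hand.

```python
from difflib import SequenceMatcher

# Hypothetical toy records; field names are illustrative only.
claims = [
    {"pid": "A001", "name": "Maria Gonzalez", "birth_year": 1941, "sex": "F",
     "n_hospitalizations": 2},
    {"pid": None, "name": "Jose Ramirez", "birth_year": 1938, "sex": "M",
     "n_hospitalizations": 1},
]
survey = [
    {"pid": "A001", "name": "Maria Gonzalez", "birth_year": 1941, "sex": "F",
     "lives_alone": True},
    {"pid": None, "name": "José Ramires", "birth_year": 1938, "sex": "M",
     "lives_alone": False},
    {"pid": None, "name": "Ana Lopez", "birth_year": 1950, "sex": "F",
     "lives_alone": True},
]


def match_score(a, b):
    """Weighted agreement score in [0, 1] over name, birth year, and sex.

    The weights (0.5, 0.3, 0.2) are assumptions made for this sketch.
    """
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    year_agree = 1.0 if a["birth_year"] == b["birth_year"] else 0.0
    sex_agree = 1.0 if a["sex"] == b["sex"] else 0.0
    return 0.5 * name_sim + 0.3 * year_agree + 0.2 * sex_agree


def link(claims, survey, threshold=0.85):
    """Link each claims record to at most one survey record.

    Step 1: deterministic linkage on a shared unique identifier.
    Step 2: score-based linkage on quasi-identifiers when no identifier exists.
    """
    by_pid = {s["pid"]: s for s in survey if s["pid"] is not None}
    linked = []
    for c in claims:
        if c["pid"] is not None and c["pid"] in by_pid:
            linked.append((c, by_pid[c["pid"]], "deterministic", 1.0))
            continue
        best = max(survey, key=lambda s: match_score(c, s))
        score = match_score(c, best)
        if score >= threshold:  # accept only sufficiently similar candidates
            linked.append((c, best, "probabilistic", score))
    return linked


result = link(claims, survey)
```

On these toy records, the first pair is linked through the shared identifier and the second through the similarity score, despite the spelling differences in the name. In practice the threshold trades false matches against missed matches and would need to be validated against a manually reviewed sample.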
The Korean National Health Insurance Service data contain comprehensive patient-level claims data which can be linked to various national surveys because these data sources have exclusively been managed by the Korean government and KCDC. These comprehensive data sources, including medical, social, behavioral, and genetic information, can be used to accelerate precision medicine in rehabilitation.
Cross-national Comparison Studies. While various national surveys have been conducted to estimate the level of disability or health status in each country, these estimated disability levels often cannot be compared across countries because of differing definitions and survey items. Due to the lack of harmonized international survey data, the Organization for Economic Cooperation and Development (OECD) has relied on a single activities of daily living (ADL) question for estimating disability across countries (OECD, 2015). However, this single question has been criticized as a crude disability estimate due to increased measurement error (Hong, Simpson, Simpson, Brotherton, & Velozo, 2018). Recently, about 30 national aging surveys have been harmonized through efforts by the World Health Organization Study on global AGEing and adult health (SAGE) and the U.S. government (Minicuci, Naidoo, Chatterji, & Kowal, 2016; Shih, Jinkook, & Lopamudra, 2012). The harmonized national surveys contain similar sets of function-related items such as ADLs, instrumental ADLs, cognition, and social-contextual items. The harmonized aging surveys have allowed researchers to conduct cross-national comparison studies (Cieza et al., 2015; Hong, Reistetter, Díaz-Venegas, Michaels-Obregon, & Wong, 2018). For instance, Cieza et al. (2015) compared the general health status between the English and American adult populations using the harmonized function-related survey items. Similarly, Hong et al. (2018) compared the functional status between the U.S. and Mexico using the harmonized survey data and reported that American adults were less functional than Mexican adults. In Korea, the Korean Longitudinal Study of Aging (KLoSA) has been harmonized, which allows Korean researchers to conduct cross-national comparison studies (Boo & Chang, 2006).
Several Asian aging studies have also been harmonized, including the China Health and Retirement Longitudinal Survey (CHARLS), the Japanese Study of Aging and Retirement (JSTAR), and the Longitudinal Aging Study in India (LASI). These harmonized surveys will allow Korean researchers to conduct cross-national studies that explore rehabilitation and patterns of disability progression, psychosocial factors and health status, family structures, and quality of life, as well as other determinants of health.
Scale Development. Measurement science has traditionally been a critical part of rehabilitation because valid outcome measurement is needed to capture and understand treatment effects. Thus, many large data studies have examined the psychometric properties of rehabilitation-related instruments. Most of these studies collect cross-sectional assessment data and determine convergent or divergent validity across measures. Few studies have been longitudinal, such as those seeking to establish test-retest reliability, because data collection demands considerable time. This issue can be resolved by utilizing existing data. Typically, archived randomized controlled trials (RCTs) include various rehabilitation-related assessment data with multiple time-points (e.g., pre- and post-test, and follow-ups). These data allow researchers to test various psychometric properties of rehabilitation-related outcome measures, such as basic reliability (e.g., internal consistency, item-total correlation, test-retest reliability) and validity (e.g., factor structures, convergent or divergent validity), responsiveness, and minimal detectable change. When sample sizes in the archived data are small, modern measurement models (e.g., Rasch analysis) can test item-level psychometric properties such as fit statistics, item difficulty hierarchy, rating scale analysis, differential item functioning, sample-item match, and keyform (Linacre, 1994; Wright & Stone, 1979). Additionally, a set of questions in national surveys can be examined to develop a patient-reported outcome measure. Since the study sample in national surveys is based on census sampling methods, such a patient-reported outcome measure can serve as a valid instrument for population estimations. In short, secondary data analysis using archived data can reduce the additional research effort needed to validate rehabilitation-related outcome measures. However, this advantage depends entirely on the availability of existing assessment data for the targeted study populations.
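Several of the reliability indices named above have simple closed forms. The following is a minimal sketch, assuming a small hypothetical item-response matrix (six respondents, four items) with retest totals; none of the numbers come from a real instrument. It computes Cronbach's alpha for internal consistency, a Pearson correlation as a simple stand-in for test-retest reliability (an intraclass correlation is often preferred in practice), and the minimal detectable change at the 95% level, MDC95 = 1.96 × √2 × SEM, with SEM = SD × √(1 − r).

```python
from math import sqrt
from statistics import stdev, variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores is a list of respondents, each a list of k items."""
    k = len(item_scores[0])
    item_vars = [variance([resp[i] for resp in item_scores]) for i in range(k)]
    total_var = variance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mdc95(baseline_totals, reliability):
    """Minimal detectable change (95%) from baseline SD and a reliability coefficient."""
    sem = stdev(baseline_totals) * sqrt(1 - reliability)  # standard error of measurement
    return 1.96 * sqrt(2) * sem


# Hypothetical pre-test item responses (rows: respondents, columns: items).
pre_items = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]
pre_totals = [sum(row) for row in pre_items]
retest_totals = [15, 9, 17, 7, 12, 19]  # hypothetical retest total scores

alpha = cronbach_alpha(pre_items)
retest_r = pearson_r(pre_totals, retest_totals)
mdc = mdc95(pre_totals, retest_r)
```

With archived trial data, the same functions could be applied to the pre-test items and to a stable follow-up time-point, provided the interval is short enough that true change is unlikely.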
Ⅳ. Statistical Methods in Big Data Research
The advantage of big data research is that researchers can explore hidden relationships among variables. For instance, the U.S. Medicare data include hundreds of patient-level variables, and numerous studies have reported how patient and clinical characteristics were associated with health outcomes (Graham et al., 2017; Howrey, Graham, Pappadis, Granger, & Ottenbacher, 2017; Reistetter et al., 2015). The most common statistical method is a regression model (general linear model or generalized linear model) that accounts for various covariates. However, a unique feature of this type of research is that patients are nested within facilities or regional areas. In this case, multilevel models, such as hierarchical generalized linear mixed models and hierarchical general linear mixed models, are needed to account for the variation in the upper-level factors (e.g., facilities and regions) and to accurately estimate the target outcomes (Goldstein, Browne, & Rasbash, 2002; Reistetter et al., 2015). In addition, path analysis, or structural equation modeling when latent variables are included, is a powerful statistical approach for examining direct and indirect relationships among the independent variables and target outcomes (Kline, 2015).
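As an illustration of the nesting described above, a two-level random-intercept logistic model can be written as follows. This is a generic textbook formulation, not the specification used in the cited studies; the symbols (patient i, facility j, covariate vector x, facility effect u) are introduced here for illustration.

```latex
% Patient i nested in facility j; binary outcome such as readmission
\[
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
\]

% Share of latent-scale outcome variance attributable to facilities
\[
\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
\]
```

Here the facility-specific intercept shifts the log-odds for every patient in facility j, and the intraclass correlation summarizes how much of the outcome variation lies between facilities rather than between patients. An ICC near zero suggests that an ordinary single-level regression would give similar results.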
Big data from real-world settings (e.g., EHRs) pose unique analytical challenges. Selection bias is a fundamental problem in observational research because patients are not randomly assigned to treatment groups (Rosenbaum, 2002; Rosenbaum & Rubin, 1983). Propensity score matching methods have been introduced in big data research to account for selection bias. This technique allows researchers to estimate the effect of a specific exposure on the target outcomes by comparing groups with similar measured characteristics. However, this approach alone does not account for confounders that are not included in a dataset. To address this issue, big data studies often use instrumental variable analysis to account for unmeasured confounders (Frogner, Harwood, Andrilla, Schwartz, & Pines, 2018; Kuo, Chen, Baillargeon, Raji, & Goodwin, 2015). This suggests that instrumental variable analysis can approximate causal relationships in observational studies. However, a recent study revealed that when there is a comprehensive set of variables, the analysis results are consistent across ordinary least squares regression, propensity score matching, and instrumental variable analysis (Reistetter et al., 2019). In short, with adequate statistical methods and study variables, researchers are able to use big data to conduct comparative effectiveness studies examining the effect of different health behaviors on health status (Hong, Aaron, Li, & Simpson, 2017).
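The logic of instrumental variable analysis can be sketched with a small simulation. Everything below is hypothetical: the data are simulated, the true exposure effect is set to 2.0 by construction, and the variable names are illustrative. The sketch uses the simple ratio (Wald) estimator, which coincides with two-stage least squares when there is one instrument and one exposure; it is not the estimation procedure of the cited studies.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Simulate an exposure-outcome pair distorted by an unmeasured confounder."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)    # unmeasured confounder (not available to the analyst)
        zi = rng.gauss(0, 1)   # instrument: affects exposure, independent of u
        xi = zi + u + rng.gauss(0, 1)                       # exposure
        yi = true_effect * xi + 3.0 * u + rng.gauss(0, 1)   # outcome
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()

# Naive regression slope of outcome on exposure: biased because u is omitted.
ols_slope = cov(x, y) / cov(x, x)

# Instrumental variable (ratio) estimate: uses only the variation in x driven by z.
iv_slope = cov(z, y) / cov(z, x)

print(f"naive slope: {ols_slope:.2f}, IV slope: {iv_slope:.2f}")
```

In this design the naive slope converges to 3.0 rather than 2.0, because the confounder raises both exposure and outcome, whereas the instrumental variable estimate converges to the true value. The result depends on the instrument being unrelated to the confounder and affecting the outcome only through the exposure, assumptions that cannot be fully verified in real claims or EHR data.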
Ⅴ. Applications in Occupational Therapy
Comprehensive content knowledge is a critical component in big data research. Occupational therapists are specialized health professionals primarily focused on maximizing patients’ physical or cognitive recovery. Based on holistic patient care perspectives, occupational therapy scientists could conduct a series of big data studies, such as 1) comparative effectiveness research across patient care settings, 2) health disparity studies across minority groups or geographical locations, 3) quality of life or patient satisfaction research in post-acute care settings, 4) health services research for current and developing Korean health policies (e.g., therapy intensity, minutes, or resource use), or 5) scale development studies using secondary data. In addition, occupational therapy scientists can enhance other health professionals’ (e.g., medical doctors, nurses, social workers) understanding of patient-centered care with their unique perspectives on functional performance and community engagement. This involvement in big data research would help other health professionals apply their findings to patients’ real-life issues.
Ⅵ. Workforce Development in Big Data Research
Rehabilitation researchers are traditionally trained in clinical research designs, such as RCTs or single-subject trials. Big data research requires a different skill set. For example, clinical trials focus on fair randomization and comparisons, human protections, and power or sample size analyses. In contrast, big data research requires data management and mining, complex statistical models, and data interpretation. Because big data research and clinical trial research call for different skills, workforce development in this emerging research area is needed. A good example of workforce development in big data research is the Rehabilitation Research Career Development Program (RRCD) supported by the NIH, which includes a partnership between the University of Texas Medical Branch (UTMB), the University of Florida, and the University of Southern California (Rehabilitation Research Career Development Program, 2019). Since 2007, the RRCD program has provided extensive rehabilitation research training to occupational and physical therapists. UTMB emphasizes big data rehabilitation research through the Center for Large Data Research & Data Sharing in Rehabilitation. Supportive infrastructure for occupational therapy scientists is an indispensable component of successful workforce development in this emerging field. Doctoral programs in rehabilitation sciences or occupational therapy need to include rigorous research methods and advanced statistics or data management techniques to prepare future big data research scientists. In addition, a rigorous research infrastructure would need governmental (e.g., research priorities or grants) and institutional resources to maintain and develop the rehabilitation workforce to become leaders in big data research.
Ⅶ. Conclusion
Advances in information technology have enabled the era of big data research in rehabilitation. Many academic fields in Korea have actively engaged in big data research. However, rehabilitation health professionals, including occupational therapists, have conducted relatively few studies with big data. This may be due to 1) unsupportive Korean governmental policies, 2) the current curriculum in rehabilitation doctoral programs, and/or 3) limited exposure to this type of research among educators, researchers, and clinicians. Since big data research provides rigorous research opportunities and can impact healthcare policies in Korea, it is recommended that rehabilitation professionals focus on developing a workforce for the emerging field of big data science and stay aware of research trends to advance the quality of big data research in Korea.