Full Report
Understanding the data we collect is essential: it allows us to identify trends and uncover answers about our world. However, the stories in our data frequently go untold. Large datasets are hard to share between research communities because of their size, security constraints, and complexity. Even when these datasets are accessible, the tools needed to query them often require deep technical knowledge. This is why Redivis partnered with Google Cloud to make research data from higher education institutions more accessible and easier to analyze. Redivis’s mission is to create a frictionless “data commons”: a place where researchers can discover, request access to, and query large datasets to support their studies. To make this goal possible, Redivis began to rethink the traditional data-distribution process.

Challenges to making data more accessible

When Redivis first started, its team interviewed dozens of researchers to understand their biggest problems. Most researchers described how difficult it is to find new datasets, and how many steps it takes to access and work with the data, often before they know whether the dataset contains information useful for their study. Data administrators, for their part, want their datasets to be used but are often concerned about data security.

Storing large amounts of sensitive data requires the right set of security controls. To help keep data secure, Redivis developed a transparent, tiered access system for datasets. Researchers can request access separately to a dataset’s documentation, variables, sample, and full data, which lets them assess whether a dataset is usable without first filing a full access application. Administrators, in turn, can set rules for how researchers use and combine different datasets depending on their level of access.
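The tiered model can be pictured as an ordered set of access levels. This is a minimal sketch, not Redivis’s implementation: the names `AccessLevel`, `can_access`, and `effective_tier`, the assumption that tiers are cumulative, and the rule that a query combining datasets runs at the lowest tier held are all hypothetical illustrations of the kind of rules administrators can set.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical access tiers, ordered from least to most privileged."""
    NONE = 0
    DOCUMENTATION = 1
    VARIABLES = 2
    SAMPLE = 3
    FULL_DATA = 4


def can_access(granted: AccessLevel, required: AccessLevel) -> bool:
    """True if the granted tier covers the required one (tiers assumed cumulative)."""
    return granted >= required


def effective_tier(grants: dict[str, AccessLevel], datasets: list[str]) -> AccessLevel:
    """Tier at which a query combining `datasets` may run.

    Hypothetical rule: a combined query is limited by the weakest grant,
    so full access to one dataset cannot expose rows from another dataset
    the researcher can only sample. Missing grants count as NONE.
    """
    return min(
        (grants.get(name, AccessLevel.NONE) for name in datasets),
        default=AccessLevel.NONE,
    )
```

For example, a researcher holding `FULL_DATA` on a fires table but only `SAMPLE` on a health-outcomes table would get `AccessLevel.SAMPLE` for a query joining the two. Taking the minimum is a conservative choice; a real platform could instead apply per-dataset rules set by each administrator.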
Redivis built its platform on top of Google Cloud’s security infrastructure, which lets the company encrypt data, manage security keys, and protect datasets with Google Cloud’s operational and physical security layers. Combined with detailed audit logs (supported by Google Cloud Logging) and robust application-level security controls, this gives data owners confidence that their data is accessed and used only as they’ve allowed.

Sharing data to build more compelling stories

When we join multiple sources of data, we can uncover a more complete story, as in the case of examining environmental conditions. By combining data about historic fires, air quality, and population health outcomes, researchers can offer policy guidance to protect the most at-risk populations. If the datasets stayed separate, we would likely lose insight into how these events affect each other. With the help of cloud solutions like Cloud Storage and BigQuery, Redivis found ways to securely connect public datasets hosted in BigQuery with private datasets, unlocking richer insights for its researchers.

Using Cloud Storage, Redivis makes it easy for administrators to upload large amounts of data to the platform. These records are then stored in BigQuery, Google Cloud’s serverless, scalable data warehouse. When researchers explore data in Redivis, they can easily see what steps they need to take to request access to existing records. Once authorized, users can query the data using SQL without needing deep database expertise. This gives them manageable data subsets that can be analyzed in the context of their current study. Finally, researchers can integrate a wide array of analytical tools into this data pipeline.
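To illustrate how a single SQL query can join two sources and hand back a manageable subset, the sketch below uses Python’s built-in `sqlite3` module as a local stand-in for BigQuery. The table names, columns, and values are hypothetical toy data, not real measurements. In BigQuery the tables would instead be referenced by fully qualified names such as `project.dataset.table`, and the query would run through a BigQuery client rather than SQLite.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables of toy values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE fires (county TEXT, year INTEGER, acres_burned REAL);
        CREATE TABLE air_quality (county TEXT, year INTEGER, mean_pm25 REAL);
        INSERT INTO fires VALUES ('A', 2018, 1200.0), ('B', 2018, 50.0);
        INSERT INTO air_quality VALUES
            ('A', 2018, 35.5), ('B', 2018, 8.2), ('C', 2018, 6.0);
        """
    )
    return conn


# Join the two sources on county and year, keeping only large fires,
# so the researcher receives a small subset instead of the full tables.
SUBSET_QUERY = """
    SELECT f.county, f.year, f.acres_burned, a.mean_pm25
    FROM fires AS f
    JOIN air_quality AS a
      ON f.county = a.county AND f.year = a.year
    WHERE f.acres_burned > ?
    ORDER BY f.county
"""


def fire_air_subset(conn: sqlite3.Connection, min_acres: float) -> list[tuple]:
    """Run the join with a bound parameter and return the matching rows."""
    return conn.execute(SUBSET_QUERY, (min_acres,)).fetchall()
```

With this toy data, `fire_air_subset(build_demo_db(), 100)` returns `[('A', 2018, 1200.0, 35.5)]`: county B is filtered out by the threshold, and county C has no fire record, so the inner join drops it. Binding `min_acres` as a parameter rather than formatting it into the SQL string avoids injection issues.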
Using BigQuery’s one-click export to Google Data Studio, Redivis can create interactive data visualizations, and it integrates with notebook environments through Python and R clients. With BigQuery managing infrastructure requirements, Redivis scaled to petabytes of data, 1,000 times more than the terabytes it had previously, without additional infrastructure work straining the company. Most importantly, BigQuery’s compute architecture supports real-time analysis across billions of records from both public and restricted datasets, unlocking new ways to discover insights. “Researchers are regularly coming to me to say that queries that once took hours are executing in seconds,” says Ian Mathews, CEO of Redivis. “One can only imagine how transformative this is in understanding new datasets and exploring novel hypotheses.”

The future of data accessibility

As more academic institutions and researchers join Redivis, the company will continue to identify ways to minimize friction at every step of the data-driven research process. To learn more about the steps Redivis is taking to make data more accessible and empower researchers, check out this video. And to learn more about BigQuery, visit our website.
Analysis Summary
# Industry News: Redivis Taps Google Cloud to Democratize Access to Large, Secure Research Datasets
## Summary
Redivis has partnered with Google Cloud to address significant friction points in academic research data accessibility, focusing on the challenges of large data volume, security concerns, and complexity barriers. By leveraging Google Cloud's BigQuery and security infrastructure, Redivis aims to create a 'data commons' where researchers can securely discover, access, and query massive datasets without deep technical expertise.
## Key Details
- Date: October 14, 2020 (Publication Date)
- Companies Involved: Redivis, Google Cloud
- Category: Partnership, Product Integration/Enhancement
## The Story
The traditional process of accessing and analyzing large research datasets in higher education is characterized by difficult discovery, slow access procedures, and high technical requirements for querying. Data administrators are often hesitant to share data because of the security risks associated with sensitive data.
Redivis is tackling this by creating a frictionless data commons platform built on Google Cloud’s infrastructure. Key innovations include:
1. **Tiered Access System:** Lets researchers request access separately to a dataset's documentation, variables, and sample before requesting the full data, while administrators set rules for how datasets are used and combined, reassuring them about usage control and security.
2. **Security Foundation:** Uses Google Cloud's security layers, including encryption, key management, and audit logging (via Google Cloud Logging), together with application-level controls, to give data owners peace of mind.
3. **Scalable Query Engine:** Uses Google Cloud Storage for uploads and BigQuery for scalable data warehousing. This architecture allowed Redivis to scale from terabytes to petabytes of data (roughly 1,000 times more), enabling near real-time analysis on combined public and private datasets.
4. **Simplified Access:** Authorized users can query data using standard SQL, receiving manageable subsets without requiring advanced database language knowledge.
## Business Impact
### For the Companies Involved
- **Redivis:** Secures a powerful, scalable, and trusted infrastructure partner (Google Cloud), substantially improving its platform's performance (the CEO reports that queries that once took hours now run in seconds) and scalability, supporting growth across academic institutions.
- **Google Cloud:** Gains a high-profile use case in the highly regulated and intellectually valuable research sector, validating BigQuery's suitability for demanding, security-conscious multi-source data integration and petabyte-scale analytics.
### For Competitors
- Increased pressure on competing cloud providers and specialized research data platforms to offer equally robust, secure, and scalable data access and querying environments. The focus on tiered, low-friction access sets a new benchmark for onboarding sensitive research data.
### For Customers
- **Researchers:** Benefit from dramatically faster query times, easier data discovery, a standard query language (SQL), and rapid access to high-value, previously siloed datasets, accelerating hypothesis testing.
- **Data Administrators:** Gain granular, auditable control over data usage and sharing via the tiered access model, mitigating security risks while promoting data utility.
### For the Market
- This signals a significant trend toward cloud-native solutions for academic and sensitive data management, emphasizing security-as-a-feature alongside scalability. It highlights the growing "data commons" concept, facilitated by major cloud providers, as the future of collaborative, secure data sharing outside traditional, slow-moving institutional repositories.
## Technical Implications
The solution integrates Cloud Storage for ingestion and BigQuery for warehousing and compute. A core technical strength is BigQuery’s support for real-time analysis across *public and restricted datasets* simultaneously, together with integrations that make results easy to use elsewhere: one-click export to Data Studio for visualization and Python/R clients for analytical notebooks.
## Strategic Analysis
- **Market Positioning:** Redivis positions itself as the secure, frictionless intermediary layer optimizing institutional data assets for cloud-native research. Google Cloud strengthens its position in the public sector and high-performance research computing market segment.
- **Competitive Advantage:** Redivis’s primary advantage lies in its tiered access model combined with BigQuery’s demonstrated performance gains at massive scale, effectively solving the dual problem of security gatekeeping and analytical complexity.
- **Challenges:** The ongoing challenge is adoption—convincing numerous independent academic institutions to standardize on a single platform for data governance and access.
## Industry Reactions
(Based on the article, direct industry reactions are not provided, but the context suggests a positive response.)
- **Expert Commentary:** The CEO’s report that queries went from hours to seconds points to a substantial performance gain, which is highly valued in data-intensive fields.
## Future Outlook
- Expect continued expansion of Redivis onto more institutional datasets, likely leading to deeper integration of advanced AI/ML tools accessible via the BigQuery pipeline. Watch for further announcements detailing how more complex governance rules are being automated for disparate, international datasets.
## For Security Professionals
This case study underscores the critical importance of modern cloud security primitives (encryption in transit and at rest, robust key management, and detailed audit logging) not just for traditional enterprise workloads but also for sensitive academic data. The tiered access control demonstrates a principle of least privilege applied at the *data access* layer, allowing security teams to enforce granular policies based on researcher needs while ensuring accountability through audit logs.
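Least privilege and accountability fit together when every access decision, whether granted or denied, is written to an audit trail. The sketch below is a hypothetical illustration, not Google Cloud Logging’s API or Redivis’s implementation: the tier names, the `authorize` function, and the JSON entry fields are assumptions, and a plain Python list stands in for a real log sink.

```python
import json
from datetime import datetime, timezone

# Hypothetical tier names, ordered from least to most privileged.
TIERS = ("documentation", "variables", "sample", "full_data")


def authorize(researcher, dataset, requested, granted, audit_log):
    """Allow a request only if the granted tier covers it; log every decision.

    `granted` is None when the researcher holds no access to the dataset.
    `audit_log` is any list-like sink (a stand-in for a real logging backend).
    """
    allowed = granted is not None and TIERS.index(granted) >= TIERS.index(requested)
    # Record denials as well as grants so reviewers can reconstruct who tried what.
    audit_log.append(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "researcher": researcher,
        "dataset": dataset,
        "requested_tier": requested,
        "granted_tier": granted,
        "allowed": allowed,
    }))
    return allowed
```

Writing one structured entry per decision, rather than only per successful read, is what lets a data owner later verify that restricted data was used only as they allowed.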