Modern analytics and the resulting business insights unlock new opportunities to optimize company performance and open new revenue streams. Since these initiatives also heighten the need for greater security and governance of company data, Identity and Access Management (IAM) needs to be a foundational component of any corporate security plan that covers company data.

Critical components of a strong Identity and Access Management policy are:

  • Ability to identify users and the roles they are assigned
  • Capability to establish and enforce different levels of data protection
  • Ability to identify sensitive data
  • Auditability of data access by individuals
  • Tracking lineage of data as it moves through the enterprise and lifecycle
  • Automation of access control and metadata tracking to support corporate GDPR compliance

Microsoft Azure enables customers to quickly provision compute, storage, and networking in the cloud. Cloudera Data Platform (CDP) is deeply integrated with Azure to provide advanced analytics and machine learning capabilities, while also supporting IAM policies. Central to CDP on Azure is Cloudera Shared Data Experience (SDX), which makes it easy to create a secure data lake (in tens of minutes instead of weeks) and to define policies once and have them applied everywhere.

Identity is a core aspect of Azure and CDP. Customers can extend their corporate Active Directory to the cloud with Azure Active Directory (AAD). Cloudera SDX uses AAD for single sign-on (SSO) to CDP, and the user and group identities defined in AAD are integrated with CDP.

For authorization, Cloudera SDX includes Apache Ranger to manage access control. Using the Apache Ranger console, security administrators can create and manage policies for access to files, folders, databases, tables, and columns. These policies can be set for individual users or groups and are then enforced consistently across all the analytics services in the CDP stack. Ranger provides fine-grained control over queries against data warehouses and operational data. Recently, Cloudera introduced a Tech Preview of file- and folder-level access controls with deep integration into Azure Data Lake Storage (ADLS).
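To make the idea of a column-level policy concrete, here is a minimal Python sketch that builds a policy document of the kind Ranger stores. The field names (service, resources, policyItems, accesses) follow Ranger's public v2 policy model as we understand it; the service name, database, table, columns, and group below are hypothetical placeholders, and the helper function itself is purely illustrative, not part of any Cloudera or Ranger library.

```python
import json


def build_ranger_policy(service, name, database, table, columns, groups, access_types):
    """Return a dict shaped like an Apache Ranger resource-based policy.

    Illustrative helper only: it assembles the JSON body and does not
    contact a Ranger server.
    """
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        # The resource hierarchy the policy applies to.
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        # Who is granted which access types on those resources.
        "policyItems": [
            {
                "groups": list(groups),
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
            }
        ],
    }


# All values below are hypothetical examples.
policy = build_ranger_policy(
    service="cm_hive",
    name="analysts-read-sales-summary",
    database="sales",
    table="orders",
    columns=["order_id", "order_date", "total"],
    groups=["analysts"],
    access_types=["select"],
)

print(json.dumps(policy, indent=2))
```

In practice, a document like this would typically be created through the Ranger console or submitted to Ranger's policy REST endpoint; the sketch only shows how users or groups, resources, and access types fit together in one policy.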

Diagram of dynamic classification-based security policies

Data governance is driven by metadata. Cloudera Data Platform with SDX uses Apache Atlas to capture that metadata, which supports agile data modeling with a custom metadata structure for all data sources and makes it easy to build a hierarchical data taxonomy. Atlas provides cleaner metadata for data modeling and exposes REST APIs that other applications can call. With Atlas, data administrators and stewards can also define, annotate, and automate the capture of relationships between data sets and underlying elements, including source, target, and derivation processes.
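As a rough illustration of what captured lineage makes possible, the Python sketch below walks a lineage graph backwards to list everything a data set was derived from. It assumes a response shaped like the output of Atlas's lineage REST API, namely a list of relations with fromEntityId and toEntityId fields; the sample GUIDs and the helper function are hypothetical, and the data is held in memory rather than fetched from a live Atlas server.

```python
from collections import defaultdict, deque


def upstream_entities(lineage, guid):
    """Return the GUIDs of every entity that feeds into `guid`, nearest first."""
    # Index the edges by target so we can walk from a data set to its sources.
    parents = defaultdict(list)
    for relation in lineage["relations"]:
        parents[relation["toEntityId"]].append(relation["fromEntityId"])

    # Breadth-first traversal from the starting entity toward its origins.
    seen = {guid}
    ordered = []
    queue = deque([guid])
    while queue:
        current = queue.popleft()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                ordered.append(parent)
                queue.append(parent)
    return ordered


# Hypothetical lineage: raw table -> process -> staged table -> process -> report table.
sample_lineage = {
    "baseEntityGuid": "report_table",
    "relations": [
        {"fromEntityId": "raw_table", "toEntityId": "ingest_process"},
        {"fromEntityId": "ingest_process", "toEntityId": "staged_table"},
        {"fromEntityId": "staged_table", "toEntityId": "aggregate_process"},
        {"fromEntityId": "aggregate_process", "toEntityId": "report_table"},
    ],
}

print(upstream_entities(sample_lineage, sample_lineage["baseEntityGuid"]))
```

The same traversal run in the other direction (indexing by source rather than target) would answer the impact question: which downstream data sets are affected if a source changes.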

CDP delivers all of this functionality in Azure as a cloud-native service that can be deployed into the customer’s Azure subscription. Because CDP is architected for the cloud, customers can use ADLS for data storage and quickly spin up self-service experiences such as Cloudera Machine Learning, Data Warehouse, and Data Engineering on Azure Kubernetes Service, following a cost-effective “consume only what you need” strategy. All of this is delivered on a safe and secure platform powered by Cloudera SDX.

Hive table diagram

Next Steps

Resources

  • CDP on Azure Quickstart
  • Configuring Azure Active Directory identity federation in CDP documentation
  • Access control for Azure ADLS cloud object storage on the Cloudera blog
  • Cloudera Data Warehouse on Azure Provides Fast, Cost-Effective and Highly Scalable Analytics on the Cloudera blog

Visit us at Microsoft’s Open Azure Day Virtual Event

  • Attend the fireside chat with Arun Murthy, Cloudera CPO, and Sarah Novotny, Microsoft Azure Office of the CTO, during the keynote session.
  • Hear Ram Venkatesh, Cloudera VP of Engineering, discuss the benefits of Cloudera Data Platform during a digital deep dive session.
  • Visit us at our sponsor page and enter to win an Oculus Quest 2 VR Headset.