
Understanding Azure Data Lake Storage Gen2 Pricing For Big Data Analytics Workloads

Posted by Roger Padgett on Aug, 23, 2021 12:08

Microsoft made Azure Data Lake Storage Gen2 generally available on February 7, 2019, and the service has continued to evolve since then. Before Data Lake Storage Gen2 was introduced, organizations that needed cloud storage in Azure had to choose between Azure Data Lake Store (Gen1) and Azure Blob Storage. The new service, offered under the Azure Data Lake Storage pricing structure, combines the benefits of a file system model with the advantages of an object store.

Security in ADLS is based on several levels of control over access to data, which makes data-level security easier to maintain.

What is Azure Data Lake Storage Gen2? A brief explanation


The term ‘Azure Data Lake Storage Gen2’ refers to a collection of capabilities that Microsoft developed to improve big data analytics. The service is designed primarily to give the user organization large cloud storage capacity at the low cost of Azure Blob Storage. Data Lake Storage Gen2 brings together the advantages previously offered separately by Azure Data Lake Store and Azure Blob Storage. Its benefits include file system semantics, file-system-level security, and high scalability.

Because the services within the Azure Data Lake Storage pricing structure are built on Blob Storage, they come with cloud storage capabilities such as warehousing and a hierarchical directory structure. Its ability to improve overall work performance makes Data Lake Storage Gen2 one of the preferred analytics platforms on the market. Along with running analytics over many petabytes of information, ADLS is capable of metadata storage.

What is the Azure Data Lake Storage Gen2 pricing model?

The Azure Storage pricing structure can be grouped under the following broad heads for a hierarchical namespace with LRS redundancy:

Data Storage –

                     Premium         Hot               Cool            Archive
First 50 TB/month    $0.15 per GB    $0.0184 per GB    $0.01 per GB    $0.00099 per GB
Next 450 TB/month    $0.15 per GB    $0.0177 per GB    $0.01 per GB    $0.00099 per GB
Over 500 TB/month    $0.15 per GB    $0.0169 per GB    $0.01 per GB    $0.00099 per GB
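The tiered rates are easier to reason about with a small calculation. The following is a minimal sketch that estimates a monthly storage charge from the Hot column of the table above. It assumes that each rate applies only to the portion of data falling inside its band and that 1 TB equals 1,024 GB; the function name `monthly_storage_cost` is illustrative and not part of any Azure tool.

```python
# Hot-tier rates copied from the table above (USD per GB per month).
# Each entry is (band size in TB, price per GB); None marks the open-ended last band.
HOT_TIERS = [(50, 0.0184), (450, 0.0177), (None, 0.0169)]

GB_PER_TB = 1024  # assumption: binary terabytes


def monthly_storage_cost(stored_tb, tiers=HOT_TIERS):
    """Estimate the monthly storage charge in USD for `stored_tb` terabytes.

    Assumes marginal pricing: each band's rate applies only to the data
    that falls inside that band.
    """
    remaining = stored_tb
    total = 0.0
    for size_tb, price_per_gb in tiers:
        if remaining <= 0:
            break
        # Bill either the whole band or whatever data is left, whichever is smaller.
        billed_tb = remaining if size_tb is None else min(remaining, size_tb)
        total += billed_tb * GB_PER_TB * price_per_gb
        remaining -= billed_tb
    return round(total, 2)


if __name__ == "__main__":
    for tb in (10, 50, 120):
        print(f"{tb} TB in Hot: ${monthly_storage_cost(tb):,.2f} per month")
```

Passing a different list of bands, for example the Cool or Archive column, gives the estimate for that tier.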

Storage Capacity Reservations –

                 1-Year Reserved                  3-Year Reserved
                 Hot        Cool      Archive     Hot        Cool      Archive
100 TB/month     $1,545     $840      $91         $1,244     $676      $84
1 PB/month       $15,050    $8,179    $883        $11,963    $6,502    $810

Transaction –

                                    Premium     Hot        Cool      Archive
Write operations                    $0.0228     $0.065     $0.13     $0.13
Read operations                     $0.00182    $0.0052    $0.013    $6.50
Query Acceleration Data Scanned     N/A         $0.002     $0.002    N/A
Query Acceleration Data Returned    N/A         $0.0007    $0.01     N/A

Other operations and metadata storage meters –

                                                                    Premium     Hot        Cool       Archive
Iterative Read Operations (per 10,000)                              $0.0228     $0.065     $0.065     $0.065
Iterative Write Operations (100’s)                                  $0.0228     $0.065     $0.13      $0.13
All other operations (per 10,000), except Delete, which is free     $0.00182    $0.0052    $0.0052    $0.0052
Data Retrieval (per GB)                                             N/A         Free       $0.01      $0.02
Data Write (per GB)                                                 N/A         Free       Free       Free
Metadata Storage (GB/month)                                         $0.15       $0.0263    N/A        N/A

Azure Data Lake Storage Gen2 Pricing For Data Transfer

The data transfer prices within the Azure Data Lake Storage pricing structure can be categorized in the following manner:

Data storage prices pay-as-you-go

All prices are per GB per month.

                      Premium         Hot               Cool            Archive
First 50 TB/month     $0.15 per GB    $0.018 per GB     $0.01 per GB    $0.00099 per GB
Next 450 TB/month     $0.15 per GB    $0.0173 per GB    $0.01 per GB    $0.00099 per GB
Over 500 TB/month     $0.15 per GB    $0.0166 per GB    $0.01 per GB    $0.00099 per GB

Azure Storage Reserved Capacity

                 1-Year Reserved                  3-Year Reserved
                 Hot        Cool      Archive     Hot        Cool      Archive
100 TB/month     $1,545     $840      $91         $1,244     $676      $84
1 PB/month       $15,050    $8,179    $883        $11,963    $6,502    $810
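To see how much a longer commitment is worth, the reserved-capacity figures above can be compared directly. The sketch below assumes the listed amounts are monthly prices in USD and computes the percentage saved by choosing the 3-year term over the 1-year term. The dictionary layout and the function name `three_year_saving_pct` are illustrative only.

```python
# Reserved-capacity prices in USD, copied from the table above.
# Assumption: each figure is the monthly price for the stated capacity.
RESERVED = {
    "100 TB": {
        "1-year": {"Hot": 1545, "Cool": 840, "Archive": 91},
        "3-year": {"Hot": 1244, "Cool": 676, "Archive": 84},
    },
    "1 PB": {
        "1-year": {"Hot": 15050, "Cool": 8179, "Archive": 883},
        "3-year": {"Hot": 11963, "Cool": 6502, "Archive": 810},
    },
}


def three_year_saving_pct(size, tier):
    """Return the percentage saved by the 3-year term relative to the 1-year term."""
    one_year = RESERVED[size]["1-year"][tier]
    three_year = RESERVED[size]["3-year"][tier]
    return round(100 * (one_year - three_year) / one_year, 1)


if __name__ == "__main__":
    for size in RESERVED:
        for tier in ("Hot", "Cool", "Archive"):
            saving = three_year_saving_pct(size, tier)
            print(f"{size} {tier}: 3-year term saves {saving}% per month")
```

On these figures the longer term saves roughly a fifth on Hot and Cool capacity and noticeably less on Archive.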

Operations and Data transfer pricing –

                                                                        Premium     Hot       Cool      Archive
Write operations (per 10,000)                                           $0.0228     $0.065    $0.13     $0.13
Read operations (per 10,000)                                            $0.0019     $0.005    $0.013    $6.50
Iterative Read Operations (per 10,000)                                  N/A         $0.005    $0.013    $6.50
Iterative Write Operations (100’s)                                      N/A         $0.065    $0.13     $0.13
Data Retrieval (per GB)                                                 N/A         N/A       $0.01     $0.02
Data Write (per GB)                                                     Free        Free      Free      Free
Index (GB/month)                                                        N/A         $0.026    N/A       N/A
All other operations (per 10,000), except for Delete, which is free     $0.0019     $0.005    $0.013    $6.50
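Operation charges depend heavily on the access tier, which a short calculation makes visible. The sketch below uses the per-10,000 write and read rates and the per-GB retrieval rates listed above. It assumes charges scale proportionally with the number of operations and treats tiers marked N/A for retrieval as carrying no retrieval charge; the function name `operations_cost` is illustrative.

```python
# Rates in USD copied from the operations table above.
PER_10K = {
    "write": {"Premium": 0.0228, "Hot": 0.065, "Cool": 0.13, "Archive": 0.13},
    "read": {"Premium": 0.0019, "Hot": 0.005, "Cool": 0.013, "Archive": 6.50},
}
# Tiers listed as N/A in the table are left out and treated as no charge here.
RETRIEVAL_PER_GB = {"Cool": 0.01, "Archive": 0.02}


def operations_cost(tier, writes=0, reads=0, retrieved_gb=0.0):
    """Estimate operation and retrieval charges in USD for one access tier.

    Assumes billing is proportional to the operation count
    (no rounding up to whole blocks of 10,000).
    """
    cost = writes / 10_000 * PER_10K["write"][tier]
    cost += reads / 10_000 * PER_10K["read"][tier]
    cost += retrieved_gb * RETRIEVAL_PER_GB.get(tier, 0.0)
    return round(cost, 2)


if __name__ == "__main__":
    # Hypothetical workload: 1M writes, 5M reads, 200 GB read back.
    for tier in ("Hot", "Cool", "Archive"):
        cost = operations_cost(tier, writes=1_000_000, reads=5_000_000, retrieved_gb=200)
        print(f"{tier}: ${cost:,.2f}")
```

The same workload costs far more on Archive because of its read rate, which is why cold tiers suit data that is rarely read back.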

What are the features and advantages of Data Lake Storage Gen2?

The features included in ADLS are as follows:

  • Azure Storage is scalable by design, and the data can be accessed through either the Data Lake Storage Gen2 or the Blob Storage interfaces.
  • Data security in ADLS is strong because it inherits the security features of Blob Storage.
  • The features within the Azure Data Lake Storage pricing structure are very cost-effective.

Tackling Enterprise-grade analytics workloads: Big Data Analytics Approach


The services under the ADLS pricing structure are designed to combine the innovations of Azure Data Lake Store with the scalability and features of Azure Blob Storage. Together, these features provide a file system interface that is compatible with Hadoop. This capability works to the user company's advantage when paired with the Hot, Cool, and Archive tiers of the Azure Data Lake Storage pricing structure.

Hierarchical namespace and Hierarchical file storage: Meaning

One of the fundamental parts of Data Lake Storage Gen2 is the addition of a hierarchical namespace to Blob Storage. This feature organizes the objects or files in an account into a hierarchy of directories, in the same way that files are organized on a computer.

When a user organization enables the hierarchical namespace feature, the storage account keeps the scalability and cost-effectiveness of object storage while gaining file system semantics. Hierarchical files, in turn, are the files and objects organized within the hierarchical namespace.
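The idea of turning flat object names into directories can be illustrated locally. The sketch below is not the Azure SDK; it is a plain Python model that groups slash-separated object names into nested directories and lists the contents of one of them. The paths and the function names `build_tree` and `list_directory` are hypothetical.

```python
def build_tree(paths):
    """Group '/'-separated object names into nested dicts.

    Directories are dicts; files are stored as keys with a None value.
    """
    root = {}
    for path in paths:
        parts = [part for part in path.split("/") if part]
        node = root
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = None
    return root


def list_directory(tree, directory):
    """Return the sorted names directly inside `directory` ('' means the root)."""
    node = tree
    for part in [p for p in directory.split("/") if p]:
        node = node[part]
    return sorted(node)


# Hypothetical object names, as they might appear in a flat object store.
OBJECTS = [
    "raw/2021/08/events.csv",
    "raw/2021/09/events.csv",
    "curated/sales.parquet",
]

if __name__ == "__main__":
    tree = build_tree(OBJECTS)
    print(list_directory(tree, ""))
    print(list_directory(tree, "raw/2021"))
```

In a flat store the slashes are only part of each name; with a hierarchical namespace the directories exist as objects in their own right, which is what this nested structure models.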


Query acceleration in Azure Data Lake Storage: A brief Explanation

‘Query acceleration’ refers to a capability that lets analytics applications optimize data processing by retrieving only the data required to complete a given operation. In simple words, query acceleration reduces the time and compute resources needed to gain critical insights from stored data.

The process accepts filtering predicates and column projections, which let applications filter rows and columns while the data is read from disk. Only the data that satisfies the predicates is transferred to the application, which tends to reduce network latency and the cost of ownership.
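The effect of predicates and projections can be simulated without any cloud service. The sketch below is a local illustration only and does not call the Azure SDK: it applies a row predicate and a column projection while reading CSV text, so that only matching rows and requested columns reach the caller. The sample data and the function name `filtered_read` are hypothetical.

```python
import csv
import io


def filtered_read(csv_text, columns, predicate):
    """Yield only the requested columns of the rows that satisfy `predicate`.

    This mimics filtering at the storage layer: rows that fail the predicate
    and columns that were not requested are never handed to the application.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if predicate(row):
            yield {name: row[name] for name in columns}


# Hypothetical sample data standing in for a stored CSV file.
SAMPLE = """id,region,amount
1,EU,120
2,US,75
3,EU,310
"""

if __name__ == "__main__":
    # Projection: id and amount. Predicate: region equals EU.
    for record in filtered_read(SAMPLE, ["id", "amount"], lambda r: r["region"] == "EU"):
        print(record)
```

Here two of three rows and two of three columns are returned, which is the kind of reduction in transferred data that the Data Scanned and Data Returned meters in the pricing tables account for.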

Security benefits of Data Lake Storage Gen2:


Azure Data Lake Storage Gen2 is an efficient service that runs enterprise-grade analytics applications with high levels of security. Its security system is implemented on two levels, both of which also applied to Azure Data Lake Store. Although these two levels are not new, they remain crucial for maintaining an efficient data access system. They are as follows:

  • Role-based access control – This covers the built-in Azure roles such as reader, contributor, and owner, as well as custom roles. This security level is usually assigned for two main reasons. The first is to specify who can manage the service. The second is to grant permission to use the built-in Data Explorer tools, which need reader permissions. Role-based access control can therefore be considered the coarse-grained layer of security, applied at the account or container level rather than to individual files.
  • Access control lists – These lists specify which data objects a user can read, write, or execute. Because access control lists are specified on every object, they provide POSIX-style permissions. While every object needs its own access control list, each list is limited to a maximum of 32 entries. This makes it crucial to manage data-level security in ADLS through groups in Azure Active Directory rather than through individual users.
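The access control list level described above can be sketched in a few lines. The example below is a simplified local model, not the Azure SDK and not the full POSIX evaluation order: it parses a short-form ACL string of the kind `user::rwx,group::r-x,other::---`, enforces the 32-entry limit mentioned above, and checks a single permission. The group name `analysts` and the function names are hypothetical.

```python
MAX_ACL_ENTRIES = 32  # per-object limit mentioned in the text above


def parse_acl(acl_text):
    """Parse 'scope:name:perms' entries into a {(scope, name): perms} dict."""
    entries = {}
    for entry in acl_text.split(","):
        scope, name, perms = entry.strip().split(":")
        entries[(scope, name)] = perms
    if len(entries) > MAX_ACL_ENTRIES:
        raise ValueError(f"an ACL may hold at most {MAX_ACL_ENTRIES} entries")
    return entries


def is_allowed(entries, scope, name, action):
    """Check whether `action` ('r', 'w' or 'x') is granted.

    Simplification: falls back to the 'other' entry when no entry matches,
    and ignores owners and masks that a real evaluation would consider.
    """
    perms = entries.get((scope, name), entries.get(("other", ""), "---"))
    return action in perms


if __name__ == "__main__":
    # One group entry covers every member, which keeps the list short.
    acl = parse_acl("user::rwx,group::r-x,other::---,group:analysts:r--")
    print(is_allowed(acl, "group", "analysts", "r"))
    print(is_allowed(acl, "group", "analysts", "w"))
```

Granting access to a group uses one entry regardless of how many people belong to it, which is how group-based management stays within the entry limit.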

Azure Consultation: An EPC Group perspective

EPC Group has over 22 years of experience consulting for organizations, helping them use cloud storage platforms and implement innovative services for data-related compute resources. The company is also a Microsoft Gold Certified Partner and helps enterprises implement Microsoft products and services through training with global exposure. Its team consists of highly qualified experts who hold Microsoft certifications.

In its customized Azure consulting sessions for Azure Data Lake Storage, the experts train professionals from various organizations on a wide range of services within the Azure package. These include basic to advanced Azure Data Factory, the services included in the Azure Data Lake Storage pricing structure, and other concepts of Azure Analysis Services. The training courses also introduce the warehousing capabilities of the Azure platform and Azure Data Lake Analytics.

Conclusion:

Because Azure Data Lake Storage Gen2 is built on certain capabilities of Azure Data Lake Store (Gen1), several Gen1 operations are still supported in Gen2. However, as the new service is more advanced than its predecessor and continues to evolve, organizations can choose to migrate from Azure Data Lake Store to Data Lake Storage Gen2.

The easiest way to migrate data from Data Lake Storage Gen1 to Gen2 is to use Azure Data Factory. In today's market, organizations can use the facilities offered by the services under ADLS to achieve global recognition and exposure.
