
Key Differences Between Data Lakes and Data Warehouses – Which Is Best?

Posted by Kevin Booth on Oct, 15, 2019 04:10


The ability to capture and analyze any type of data has become a vital business capability in today’s competitive environment.

Legacy data warehouses and other analytical systems can be slow and difficult to adapt. Using a data lake to modernize your data architecture can be an effective way to continue leveraging existing investments, begin harnessing different types of data, and ultimately gain insights faster. IT executives are looking for proven techniques to deliver accurate information in a timely and cost-effective manner. Adopting a data lake is not a one-size-fits-all answer, but it can bring consistent value to the organization if implemented effectively.

Those of us who are data and analytics practitioners are familiar with the terminology of data lakes and data warehouses. As we begin to discuss big data solutions with our clients, however, we typically discover that many of them haven’t heard these terms or don’t have a good understanding of what they actually mean.

Data warehouses and enterprise data lakes are both used for storing big data; however, they mean different things and have different capabilities and benefits. A data warehouse is a repository for structured data that has already been processed for a specific purpose. A data lake is a vast pool of raw data whose purpose is specific to the organization in which it lives. The only significant similarity between the two is that they are both used to store your data. While a data warehouse may work for one company, a data lake may be a much better fit for another.

Data Lakes

So, what is an enterprise data lake? One of the most significant differences between data lakes and data warehouses is that a data lake acts as a centralized repository, or pool, of raw data where you can store all your data “as-is”, in a leaf-level or untransformed state. It is generally created without a specific purpose in mind. It can handle structured as well as unstructured data from a wide variety of data sources, which makes this strategy much more flexible across a variety of use cases. A data lake typically costs less because it is built on commodity hardware, and it can store much larger amounts of data.
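As a rough illustration of storing data “as-is”, the sketch below lands three differently shaped payloads in a local folder that stands in for a data lake. The `land_raw` helper, the `raw/<source>/` folder layout, and the sample payloads are all assumptions made for this example; a real lake would normally sit on distributed or cloud object storage.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, payload: bytes, name: str) -> Path:
    """Write a payload into the lake unchanged, grouped by source system."""
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(payload)  # no parsing, no validation, no schema applied
    return target


# A temporary local directory stands in for the lake (assumption for the demo).
lake = Path(tempfile.mkdtemp())

# Structured, semi-structured and unstructured data all land the same way.
csv_path = land_raw(lake, "crm", b"id,name\n1,Acme\n", "customers.csv")
json_path = land_raw(
    lake, "web", json.dumps({"event": "click", "page": "/pricing"}).encode(), "event.json"
)
text_path = land_raw(
    lake, "support", "Customer called about a late delivery.".encode(), "note.txt"
)

# List everything that was landed, relative to the lake root.
landed = sorted(p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file())
print(landed)
```

Nothing in this sketch decides what the data is for; that decision is deferred until someone reads it back, which is the flexibility described above.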

Data Warehouse

A data warehouse is a data storage system that aggregates structured data from various internal sources for the purpose of comparison and analysis, typically in the field of business intelligence. Data warehouses store current and historical data and are often used to create trend reports for senior management, such as annual and quarterly comparisons.

A data warehouse is a repository of data that is highly modeled. In other words, any data you find in a data warehouse is going to be carefully related to the other data in that data warehouse. In addition, data in a warehouse tends to be highly standardized and cleansed. Typically, data is never loaded into a data warehouse until the use for that data has been clearly identified.

The Pros & Cons of Data Lakes vs Data Warehouses

Data Lakes:

Since enterprise data lakes primarily store raw, unprocessed data, this data can be used for any purpose, which makes it ideal for artificial intelligence (AI), machine learning and data science. However, unprocessed data does require a large storage capacity and there can also be data governance issues with this strategy.

One of the largest benefits of a data lake is that it is designed as an inexpensive storage option. However, with cheap raw storage, the cons lie in the handling of the data. What is the strategy for metadata, security, and governance in a data lake? This is where unpredictable costs can arise.

Data lakes can yield results quicker because more data is already there and ready to be disseminated. However, data lakes place more responsibility on the user to explore the data and find the use cases.

Data Warehouses:

As for data warehouses, since the stored data is structured and already processed, it’s much easier for organizations to find and understand this data. Data warehouses are great environments for exploring data relationships across your organization. For example, if customer, product and facility information are all in the data warehouse, it becomes much easier to see how customer satisfaction and returns relate to the different facilities at which those products are made.
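The facility example can be sketched with a small in-memory SQLite database standing in for the warehouse. The table names, columns, and sample rows below are hypothetical and cover only the returns half of the example; the point is that once the data is modeled and related, the question becomes a single query.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE facility (facility_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    facility_id INTEGER NOT NULL REFERENCES facility(facility_id)
);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_return (
    return_id   INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id)
);
""")

# Hypothetical sample rows, loaded parents first so the foreign keys resolve.
conn.executemany("INSERT INTO facility VALUES (?, ?)", [(1, "Austin"), (2, "Houston")])
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Widget", 1), (2, "Gadget", 2), (3, "Gizmo", 2)],
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany(
    "INSERT INTO product_return VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 2), (3, 1, 3), (4, 2, 1)],
)

# Count returns per facility by following product -> facility relationships.
returns_by_facility = conn.execute("""
    SELECT f.name, COUNT(r.return_id) AS returns
    FROM facility f
    LEFT JOIN product p ON p.facility_id = f.facility_id
    LEFT JOIN product_return r ON r.product_id = p.product_id
    GROUP BY f.facility_id, f.name
    ORDER BY returns DESC, f.name
""").fetchall()
print(returns_by_facility)
```

The foreign keys are what make the query trustworthy: a return cannot reference a product or customer that the warehouse does not know about.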

But this significant advantage comes with little flexibility and requires a great deal of labor. Data warehouses take serious effort to build and maintain. Changes also take a long time to implement because, when new data is added, it has to be reconciled with all of the other data living in that data warehouse.

In data lakes, adding data is relatively straightforward since the data does not need to be reconciled with existing data.

Data Lake or Data Warehouse: Do I Need Both?

There are definite differences between data lakes and data warehouses, and each strategy has its pros and cons. However, most organizations can benefit from adopting both. Businesses can first consolidate data from many sources into their data lake, where they can perform a variety of workloads, including preparing data for the data warehouse, running batch analytical workloads, running machine learning workloads and more.
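One of those workloads, preparing lake data for the warehouse, might look roughly like the sketch below. The raw order lines, the `prepare` function, and the `orders` table are invented for illustration, and SQLite again stands in for the warehouse. Unusable records are skipped at load time but would stay untouched in the lake.

```python
import json
import sqlite3

# Raw lines as they might sit in the lake: inconsistent and partly unusable.
raw_lines = [
    '{"order_id": "1001", "amount": "19.99", "country": "us"}',
    '{"order_id": "1002", "amount": "5", "country": "DE ", "coupon": "SPRING"}',
    '{"order_id": "1003", "amount": "n/a", "country": "US"}',
    "not json at all",
]


def prepare(line):
    """Return a cleaned (order_id, amount_cents, country) tuple, or None if unusable."""
    try:
        record = json.loads(line)
        order_id = int(record["order_id"])
        amount_cents = round(float(record["amount"]) * 100)
        country = record["country"].strip().upper()
    except (ValueError, KeyError, TypeError, AttributeError):
        return None
    return (order_id, amount_cents, country)


# The warehouse table enforces types and constraints at load time.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("""
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY,
        amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0),
        country      TEXT NOT NULL CHECK (length(country) = 2)
    )
""")

cleaned = [row for row in map(prepare, raw_lines) if row is not None]
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)

loaded = warehouse.execute(
    "SELECT order_id, amount_cents, country FROM orders ORDER BY order_id"
).fetchall()
rejected = len(raw_lines) - len(cleaned)
print(loaded, rejected)
```

Because the raw lines are never modified, a later change to the cleaning rules can be replayed against the same lake data.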

Adopting a hybrid approach that integrates both a data warehouse and a data lake ensures data can be used effectively and has integrity and context. On their own, data lakes are merely dumping grounds for source data, which makes them a great source of data for your data warehouse. In fact, long before big data arrived, many senior architects were building data lakes, or “staging areas”, as a best practice for storing data needed by a data warehouse. We want to clarify this because all too often data lakes are assumed to be a kind of magic-bullet replacement for data warehouses, and this can be far from the truth.

Which Approach Should I Choose?

That can be a challenging question indeed. If you already have a well-established data warehouse, I certainly don’t advocate abandoning that work and starting over from scratch. However, like many other data warehouses, yours may suffer from some of the issues I have described. If this is the case, you may choose to implement a data lake alongside your data warehouse.

Your data warehouse can continue to operate as it has, and you can start filling your data lake with new data sources. You can also use the data lake as a type of archive repository for data that you roll off your data warehouse, keeping it available to give users access to more data than ever before. As your warehouse ages, you may consider moving it to the data lake, or you may continue with a hybrid approach.

If you are just starting down the path of building a centralized data platform strategy, I would recommend that you take the time to consider a hybrid approach.

If you are looking for an experienced consulting firm that understands data lakes, data warehouses and the importance of being able to quickly analyze and decipher all of your organization’s data, please contact EPC Group to discuss your options.
