Building (data) lake houses

22 September 2021

Most people can only dream of one day owning a lake house: a cozy cottage by a vast lake, a place to fish or simply to sail your boat. However, this article is not about that type of lake house. Digital lake houses once seemed like a dream too, but they have now come into existence.

The term ‘lake house’ is a portmanteau of ‘data warehouse’ and ‘data lake’, both of which are forms of data storage (Databricks, 2021). The former, the data warehouse, holds well-structured historical corporate data, such as business sales over the past few years. This data can be used to analyze past performance and to estimate and evaluate future changes through so-called business intelligence, or BI. The latter, the data lake, holds in essence the rest of the data: unstructured or semi-structured raw data that is cheaper to store.

This data management architecture is still in its infancy but is quickly gaining ground in the software industry. Databricks, the company behind the design of data lake houses, is one of the fastest-growing software companies worldwide and is expected to launch the largest IPO of a software firm ever, reaching a valuation of around 36 billion US dollars according to Forbes (2021).

Databricks offers its customers an open-source AI platform to which BI is added, increasing efficiency and simplifying business analytics. You could see it as a data lake with a warehouse layer built on top. This warehouse layer ensures quality control and gives the data a basis for BI, which can then be used to report on future business changes. Because AI and BI are combined, the data can be unstructured, semi-structured, or fully structured and still be useful for business decision-making.
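To make the layered idea concrete, here is a minimal sketch in Python. It is not how Databricks implements a lake house; the names `RAW_LAKE`, `SalesRow`, `warehouse_layer`, and `sales_by_region`, as well as the sample records, are invented purely for illustration. The "lake" keeps every record exactly as it arrived, while the "warehouse layer" applies a simple quality check and exposes only the records that fit a fixed structure, which a BI-style report can then use.

```python
import json
from dataclasses import dataclass

# Hypothetical raw "lake": records are stored as-is, in whatever shape they arrived.
RAW_LAKE = [
    '{"region": "EU", "year": 2020, "sales": 1200.0}',
    '{"region": "US", "year": 2020, "sales": "950.5"}',  # number stored as text
    '{"region": "EU", "year": 2021}',                     # sales figure missing
    'free-text note: customer called about an invoice',   # unstructured
]


@dataclass(frozen=True)
class SalesRow:
    """The fixed structure that the illustrative warehouse layer guarantees."""
    region: str
    year: int
    sales: float


def warehouse_layer(raw_records):
    """Quality control: return only the records that fit the SalesRow structure."""
    rows = []
    for raw in raw_records:
        try:
            record = json.loads(raw)
            rows.append(
                SalesRow(
                    region=str(record["region"]),
                    year=int(record["year"]),
                    sales=float(record["sales"]),
                )
            )
        except (ValueError, KeyError, TypeError):
            # The record stays in the lake; it is just not exposed to BI.
            continue
    return rows


def sales_by_region(rows):
    """A tiny BI-style report built on the validated rows."""
    totals = {}
    for row in rows:
        totals[row.region] = totals.get(row.region, 0.0) + row.sales
    return totals


if __name__ == "__main__":
    clean_rows = warehouse_layer(RAW_LAKE)
    print(sales_by_region(clean_rows))
```

Note that the records rejected by the warehouse layer are not deleted. They remain in the lake, where other tools (for example, text analysis on the free-text note) could still make use of them.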

Advantages include lower costs, as only one data platform has to be maintained, and reduced complexity, as employees can draw from a single data platform (Xplenty, 2021).

A disadvantage, as mentioned before, is that data lake houses are still at an early stage of development. The tool is therefore not yet fully mature and still lacks the capacity to cater to all the needs of large corporate firms.

It will be an interesting development to follow in the years to come, as data lake houses are adopted more widely and developed further for use by large multinationals. This will certainly upgrade BI and its use in the business decision-making process.

3 thoughts on “Building (data) lake houses”

  1. Interesting subject Verie, I had not heard about data lakehouses before. Curious to see how this technology will develop and mature over the coming years.

  2. Very interesting topic. Your descriptions of data lakes and warehouses make the topic easy to grasp. Working with different IT stakeholders in the past, I found that many are building their own data lakes/warehouses from scratch, which takes up a tremendous amount of time. Interestingly, organisations are building ‘separate’ warehouses. For example, I had a client who was building a data warehouse specifically for Human Resources (i.e. People Analytics), although data was widely available in other departments. While organisations often tried to centralise their data in one hub, large corporations in particular struggled. One of the main reasons was simply their size. Making progress on digitisation meant that all departments and sub-units within them were aiming to transform as fast as possible to keep up with competition, external pressures, etc. The problem was that no organisation (at least none of the large corporates I came across) was able to roll out all changes coherently across the organisation, not even with the help of costly outside consultants and the like. Oftentimes, implementing solutions like SAP SuccessFactors consumed the majority of the time of the task forces in charge of org-wide digitisation. Smaller digitisation efforts, on the other hand, were coordinated within departments. Hence the building of siloed data lake houses. Interestingly, I had never heard of Databricks. It seems like a solution that aims to address this very issue. Thanks for the insights! Great post.

    1. Hi Lennart, thank you for sharing the story of your customer. It is indeed quite inefficient for each department to try to build its own data warehouse or data lake, and to try to combine structured and unstructured data on its own. This is exactly the problem that Databricks is trying to solve. Thank you for the example!
