Data Lakes | Application Consulting

A Data Lake (DL) is a concept for integrating traditional transactional data with unstructured data from external data sources. DLs handle massive amounts of high-velocity data via parallel implementations of analytical functions:

Prescriptive Analytics

Prescriptive Analytics is the next evolutionary step in Business Intelligence (BI).

For more than 20 years, BI systems have been built on Data Warehouses (DWH). DWHs became a necessity in the 1980s to overcome the data integration problems of their underlying ERP (or OLTP) systems. Over time, however, these analytical BI applications have themselves become data silos: developers who wanted to stay independent of existing infrastructure found it easier to create new objects for the same information items.
As BI systems have matured, more and more decision makers are looking to automate recurring decisions such as space, time, personnel or financial allocations. These automated decisions need to be managed consistently because they gather information from various analytical apps. Once consistency is achieved, the functional integration of those apps is no longer left to the user. Instead, predictive algorithms can determine the future value of a decision and act upon fixed rules (anticipatory analytics).

BIPortal™ develops Prescriptive Analytics solutions that integrate within the Enterprise Decision Management (EDM) system.

Predictive Modeling

Predictive Modeling (PM) develops pattern recognition and classification algorithms that assign a probability to future events. It draws on long-term trends from the past to forecast the likely causal relationships behind future events and combines them with current news to predict the immediate future.
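
As a minimal sketch of how a long-term trend and current news can be combined, the following Python example uses Bayes' rule: the historical frequency of an event serves as the prior, and a current signal updates it. The event (a supplier delay), the signal (a strike warning) and all figures are hypothetical and chosen for illustration only. This is one simple way to frame the idea, not a description of any particular product's method.

```python
def base_rate(history):
    """Long-term share of periods in which the event occurred."""
    return sum(history) / len(history)


def update_with_signal(prior, p_signal_given_event, p_signal_given_no_event):
    """Bayes' rule: probability of the event once the signal has been observed."""
    evidence = (p_signal_given_event * prior
                + p_signal_given_no_event * (1 - prior))
    return p_signal_given_event * prior / evidence


# Hypothetical history: 1 = a supplier delay occurred in that month, 0 = none.
history = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
prior = base_rate(history)

# Hypothetical current news: a strike warning that preceded 60% of past
# delays but also appeared in 10% of the months without a delay.
posterior = update_with_signal(prior, 0.6, 0.1)
print(f"prior={prior:.2f} posterior={posterior:.2f}")
```

With these assumed figures, the strike warning raises the estimated probability of a delay from 0.20 to 0.60.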

Whereas forecasts rely on huge amounts of data, the actual prediction is based on patterns and the current situation. Time constraints and the complexity of the predictive model usually require in-memory databases. Their velocity is combined with the ability of distributed file systems like HDFS to handle the volume and variety of external data sources.

Predictive models can be exchanged across applications that support the Predictive Model Markup Language (PMML). BIPortal uses GNU R, SAS, SAP Predictive Analytics, and Apache Spark MLlib.

Predictive models draw on a wide range of methods collectively referred to as Data Science.

Information Management

Information Management (IM) entails organizing, retrieving, acquiring, securing and maintaining corporate information.

Typically, the value of an Information System (IS) increases with its degree of integration, that is, with the identification of all relationships among its underlying data objects. A structured data model creates and implies the context and meaning of the data. The better an information model reflects reality, the more queries can be automated and the less interpretation effort is required of its users.

In a DL setting it is no longer possible to define a static relational model, as was customary in BI projects. Instead, schema-less distributed file systems handle the ever-changing data structures. Contextualisation is therefore moved to the knowledge level by means of translating ontologies, which translate the central queries into implementation-specific languages so that computational benefits are retained.
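
The translation step can be illustrated with a small Python sketch. It assumes a hypothetical ontology that maps one business concept ("customer") to two physical representations, a relational table and a document collection; all table, collection and field names are invented for this example. The same central query is then rendered once as parameterized SQL and once as a document-store filter.

```python
# Hypothetical ontology: one business concept, two physical representations.
ONTOLOGY = {
    "customer": {
        "sql": {
            "table": "dim_customer",
            "fields": {"country": "country_code"},
        },
        "document": {
            "collection": "customers",
            "fields": {"country": "address.country"},
        },
    }
}


def to_sql(concept, filters):
    """Render the central query as a parameterized SQL statement."""
    mapping = ONTOLOGY[concept]["sql"]
    statement = f"SELECT * FROM {mapping['table']}"
    if filters:
        conditions = " AND ".join(f"{mapping['fields'][k]} = ?" for k in filters)
        statement += f" WHERE {conditions}"
    return statement, list(filters.values())


def to_document_filter(concept, filters):
    """Render the same query as a filter for a document store."""
    mapping = ONTOLOGY[concept]["document"]
    physical = {mapping["fields"][k]: v for k, v in filters.items()}
    return mapping["collection"], physical


query = {"country": "DE"}  # the central, implementation-independent query
print(to_sql("customer", query))
print(to_document_filter("customer", query))
```

Because the mapping lives in the ontology rather than in the query, a change in the physical data structure only requires updating the mapping, while each target system still receives a query in its native form.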

Data Unification

Data Unification (DU) deals with the alignment of transactional and master data objects. Unstructured data is parsed, scraped and mined for content so that it becomes actionable. Typical data unifications include, but are not limited to:

• integration of transaction keys
• formatting of data objects
• semantic integration of master data objects

The alignment of transaction keys is a primary task in Enterprise Data Warehousing (EDW). An EDW implements a consistent data repository and is therefore seen as the corporation's single point of truth. In contrast to the EDW, a Data Lake (DL) holds multiple points of truth with varying degrees of veracity.

Master data unification often requires some form of semantic intelligence and thus is usually semi-automated through Master Data Management (MDM) systems.
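
The unification tasks listed above can be sketched in a few lines of Python. The example below is a simplified illustration, not an MDM implementation: `unify_key` formats transaction keys into one canonical form, and `match_master_record` uses string similarity from the standard library as a stand-in for semantic matching. The function names, the thresholds and the sample records are assumptions made for this sketch. Matches above the upper threshold are accepted automatically, borderline matches are routed to a human reviewer, which mirrors the semi-automated approach described above.

```python
from difflib import SequenceMatcher


def unify_key(raw_key):
    """Format a transaction key: drop separators, upper-case, strip leading zeros."""
    cleaned = "".join(ch for ch in str(raw_key) if ch.isalnum()).upper()
    return cleaned.lstrip("0") or "0"


def match_master_record(name, master_names, auto_threshold=0.9, review_threshold=0.6):
    """Return (best match, decision); decision is 'auto', 'review' or 'new'."""
    if not master_names:
        return None, "new"

    def score(candidate):
        return SequenceMatcher(None, name.lower(), candidate.lower()).ratio()

    best = max(master_names, key=score)
    best_score = score(best)
    if best_score >= auto_threshold:
        return best, "auto"      # confident enough to merge automatically
    if best_score >= review_threshold:
        return best, "review"    # plausible match, needs a human decision
    return None, "new"           # no plausible match, treat as a new record


masters = ["Acme Corp", "Apex Ltd"]  # hypothetical master data
print(unify_key("00-4711"))
print(match_master_record("ACME Corp.", masters))
print(match_master_record("Acme Corpration", masters))
print(match_master_record("Zenith", masters))
```

In practice the similarity measure would be replaced by domain-specific matching rules, but the three-way decision (merge, review, create) is the part that makes the process semi-automated.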

User Productivity

User productivity (UP) increases when users can find relevant information quickly and share departmental data effortlessly through intuitive User Interfaces (UI).

Information access and retrieval used to be key to Business Intelligence (BI) systems. However, with the advent of relational databases (RDBMS) it grew into its own application area, Information Retrieval (IR), and is now returning to BI via NewSQL databases.

In contrast to master data (MD), user data (UD) does not originate from a single central system. Instead, it is owned by individual users and often complements master data with situation-dependent information. A typical application area is planning, where the master data (e.g. a planned new component) does not yet exist in the OLTP systems. Qualitative information such as risk perceptions is also often supplied as UD.

BIPortal deploys design thinking techniques to optimize the user experience (UX) and transforms it into a technical specification. This process ensures the analytical application's fit for the business and includes efficient change management (CM).

Application Management

Application Management (AM) ensures the system's availability throughout its life cycle.

AM entails Project Planning, IT Governance, Team Leadership, Requirements Engineering, Service Level Management, Quality Audits, and Training.

In a DL setting the level of control an organization needs to exercise over its systems determines its hosting requirements.

• Software as a Service (SaaS) providers like BigQuery offer a DL solution for immediate use.
• Platform as a Service (PaaS) providers like AWS or HCP offer a scalable platform on which users build apps.
• Infrastructure as a Service (IaaS) providers like Rackspace offer scalable servers that users equip with their own software.
• Private cloud solutions like OpenStack are fully controlled by their owners.

BIPortal's Application Management is complemented by support after the realization phase.