Technological Thoughts by Jerome Kehrli

Entries tagged [architecture]

Modern Information System Architectures

by Jerome Kehrli

Posted on Monday Dec 13, 2021 at 12:04PM in Big Data

For forty years we have been building Information Systems in corporations in the same way, with the same architecture, with very little innovation and few changes in paradigm:

  • On one side the Operational Information System, which sustains day-to-day operations and business activities. In Operational Information Systems, the 3-tier architecture and the relational database model (RDBMS - Relational Database Management System / SQL) have ruled for nearly 40 years.
  • On the other side the Decision Support Information System - or Business Intelligence or Analytical Information System - where the Data Warehouse architecture pattern has ruled for 30 years.

Legacy Information Systems architecture for 40 years

Of course the technologies involved in building these systems have evolved over all these decades: in the 80s COBOL on IBM hosts used to rule the Information Systems world, whereas Java quickly emerged as a standard in the 2000s, etc.
But while the technologies used in building these information systems evolved fast, their architecture on the other hand, the way we design and build them, didn't change at all. The relational model ruled for 40 years along with the 3-tier model in the Operational world, and in the analytical world the Data Warehouse pattern was the only way to go for decades.

The relational model is interesting and has been helpful for many decades. Its fundamental objective is to optimize storage space by ensuring an entity is stored only once (3rd normal form / normalization). It comes from a time when storage was very expensive.
But then, by imposing normalization and ACID transactions, it prevents horizontal scalability by design. An Oracle database, for instance, is designed to run on a single machine; it simply can't implement relational references and ACID transactions on a cluster of nodes.
Today storage is anything but expensive, but Information Systems still have to deal with RDBMS limitations, mostly because ... that's the only way we used to know.
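
To make the normalization trade-off a bit more concrete, here is a minimal sketch in Python. The customer and order records, and all field names, are invented for illustration only. It shows the same data stored once per entity (normalized) and stored redundantly (denormalized), and the join needed to go from one to the other.

```python
# Hypothetical data: two orders placed by the same customer.
# Denormalized form: the customer details are repeated on every order.
denormalized_orders = [
    {"order_id": 1, "customer_name": "Alice", "customer_city": "Lausanne", "amount": 120},
    {"order_id": 2, "customer_name": "Alice", "customer_city": "Lausanne", "amount": 80},
]

# Normalized form: the customer is stored only once and referenced by key.
customers = {10: {"name": "Alice", "city": "Lausanne"}}
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 120},
    {"order_id": 2, "customer_id": 10, "amount": 80},
]


def join_orders(orders, customers):
    """Rebuild the denormalized view by following each reference (a join)."""
    return [
        {
            "order_id": o["order_id"],
            "customer_name": customers[o["customer_id"]]["name"],
            "customer_city": customers[o["customer_id"]]["city"],
            "amount": o["amount"],
        }
        for o in orders
    ]
```

The normalized form saves space, but answering a question about an order requires resolving the reference to the customer, so both datasets have to be reachable together. That is the part that becomes harder once the data is spread over several nodes.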

On the Decision Support Information System (BI / Analytical System), the situation is even worse. In Data Warehouses, data is pushed along the way and transformed, one step at a time, first in a staging database, then in the Data Warehouse database and finally in Data Marts, highly specialized towards specific use cases.
For a long time we didn't have much of a choice, since implementing such analytics in a pull way (data lake pattern) was impossible: we simply didn't have the proper technology. The only way to support high volumes of data was to push daily increments through these complex transformation steps every night, when the workload on the system is lower.
The problem with this push approach is that it's extremely inflexible. One can't change one's mind along the way and quickly come up with a new type of data. Working with daily increments means waiting 6 months to have a 6-month history. Not to mention that the whole process is amazingly costly to develop, maintain and operate.
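
The difference between the push and the pull approach can be sketched in a few lines of Python. This is a toy illustration under my own assumptions (the event tuples, `nightly_push` and `pull_query` are hypothetical names), not a description of any real ETL tool.

```python
from collections import defaultdict

# Hypothetical raw events: (day, customer, amount)
raw_events = [
    ("2021-12-01", "alice", 100),
    ("2021-12-01", "bob", 40),
    ("2021-12-02", "alice", 60),
]


def nightly_push(day, events, mart):
    """Push approach: transform one daily increment into a pre-defined mart."""
    staged = [e for e in events if e[0] == day]      # staging step
    total = sum(amount for _, _, amount in staged)   # transformation step
    mart[day] = total                                # the mart only keeps a total per day
    return mart


def pull_query(events):
    """Pull approach: compute a new aggregate on demand from the raw data."""
    per_customer = defaultdict(int)
    for _, customer, amount in events:
        per_customer[customer] += amount
    return dict(per_customer)
```

Here the mart was designed to hold a total per day. A question such as "how much per customer?" cannot be answered from it: in the push model it needs a new pipeline, and its history only starts growing the night that pipeline goes live. In the pull model the same question is simply a new query over the raw events.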

So for a long time, RDBMSes and Data Warehouses were all we had.

It took the Internet revolution, and the web giants hitting the limits of these traditional architectures, for something different to finally be considered. The Big Data revolution has been the cornerstone of all the evolutions in Information System architecture we have been witnessing over the last 15 years.

The latest step in this software architecture evolution (or revolution) would be micro-services, where the benefits that originally came out of the analytical information system evolution finally end up overflowing to the operational information system.
Where Big Data was originally a lot about scaling the computing along with the data topology - bringing the code to where the data is (data tier revolution) - we're today scaling everything, from individual components requiring heavy processing to message queues, etc.

Example of modern IS architecture: Microservices

In this article, I would like to present and discuss how Information System architectures evolved from the universal 3-tier (operational) / Data Warehouse (analytical) approach to the micro-services architecture, covering Hadoop, NoSQL, Data Lakes, Lambda architecture, etc., and introducing all the fundamental concepts along the way.


Powerful Big Data analytics platform fights financial crime in real time

by Jerome Kehrli

Posted on Friday Sep 03, 2021 at 11:17AM in Big Data

(Article initially published on NetGuardians' blog)

NetGuardians overcomes the problems of analyzing billions of pieces of data in real time with a unique combination of technologies to offer unbeatable fraud detection and efficient transaction monitoring without undermining the customer experience or the operational efficiency and security in an enterprise-ready solution.

When it comes to data analytics, the more data the better, right? Not so fast. That’s only true if you can crunch that data in a timely and cost-effective way.

This is the problem facing banks looking to Big Data technology to help them spot and stop fraudulent and/or non-compliant transactions. With a window of no more than a hundredth of a millisecond to assess a transaction and assign a risk score, banks need accurate and robust real-time analytics delivered at an affordable price. Furthermore, they need a scalable system that can score not one but many thousands of transactions within a few seconds and grow with the bank as the industry moves to real-time processing.

AML transaction monitoring might be simple on paper but making it effective and ensuring it doesn’t become a drag on operations has been a big ask. Using artificial intelligence to post-process and analyze alerts as they are thrown up is a game-changing paradigm, delivering a significant reduction in the operational cost of analyzing those alerts. But accurate fraud risk scoring is a much harder game. Some fraud mitigation solutions based on rules engines focus on what the fraudsters do, which entails an endless game of cat and mouse, staying up to date with their latest scams. By definition, this leaves the bank at least one step behind.

At NetGuardians, rather than try to keep up with the fraudsters, we focus on what we know and what changes very little – customers’ behavior and that of bank staff. By learning “normal” behavior, such as typical time of transaction, size, beneficiary, location, device, trades, etc., for each customer and internal user, and comparing each new transaction or activity against those of the past, we can give every transaction a risk score.
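
As a rough illustration of the idea of scoring a transaction against a customer's own past, here is a deliberately simplistic sketch in Python. It is not NetGuardians' scoring method: it looks at a single feature (the amount), uses a plain standard-deviation distance, and the function name `risk_score` and the `max_sigma` parameter are assumptions made for this example.

```python
from statistics import mean, pstdev


def risk_score(history, new_amount, max_sigma=3.0):
    """Return a score in [0, 1]: 0 = typical for this customer, 1 = highly unusual."""
    if len(history) < 2:
        # Assumption: without a baseline, treat the transaction as maximally uncertain.
        return 1.0
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # All past amounts are identical: anything different is unusual.
        return 0.0 if new_amount == mu else 1.0
    # Distance from the customer's usual amount, in standard deviations.
    z = abs(new_amount - mu) / sigma
    # Scale to [0, 1], saturating at max_sigma standard deviations.
    return min(z / max_sigma, 1.0)


# Hypothetical past transaction amounts for one customer.
history = [100, 110, 90, 105, 95]
```

With this history, an amount of 100 scores 0.0 and an amount of 5000 scores 1.0. A real system would combine many features (time, beneficiary, location, device, etc.) rather than the amount alone.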


Lambda Architecture with Kafka, ElasticSearch and Spark (Streaming)

by Jerome Kehrli

Posted on Friday May 04, 2018 at 12:32PM in Big Data

The Lambda Architecture, first proposed by Nathan Marz, attempts to provide a combination of technologies that together provide the characteristics of a web-scale system that satisfies requirements for availability, maintainability, fault-tolerance and low-latency.

Quoting Wikipedia: "Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods.
This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data. The two view outputs may be joined before presentation.
The rise of lambda architecture is correlated with the growth of big data, real-time analytics, and the drive to mitigate the latencies of map-reduce."
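
The principle of joining a batch view and a real-time view can be sketched in plain Python. This is only a conceptual toy under my own assumptions (in-memory lists, hypothetical function names `batch_view`, `speed_view` and `serve`); it does not use Kafka, ElasticSearch or Spark.

```python
from collections import Counter


def batch_view(events):
    """Batch layer: recompute a complete view from all events up to the last batch run."""
    return Counter(e["account"] for e in events)


def speed_view(events):
    """Speed layer: incrementally count the events that arrived since the last batch run."""
    view = Counter()
    for e in events:
        view[e["account"]] += 1
    return view


def serve(batch, speed, account):
    """Serving layer: join both views at query time."""
    return batch[account] + speed[account]


# Hypothetical data: events already covered by the batch run, and newer ones.
master_events = [{"account": "A"}, {"account": "A"}, {"account": "B"}]
recent_events = [{"account": "A"}]
```

Querying account "A" through `serve(batch_view(master_events), speed_view(recent_events), "A")` combines the two transactions known to the batch view with the one seen only by the real-time view. At the next batch run, the recent events become part of the batch view and the real-time view starts again from empty.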

In my current company - NetGuardians - we detect banking fraud using several techniques, among which real-time scoring of transactions to compute a risk score.
The deployment of the Lambda Architecture has been a key step in helping us move towards real-time scoring at large scale.

In this article, I intend to present how we do Lambda Architecture in my company using Apache Kafka, ElasticSearch and Apache Spark with its extension Spark-Streaming, and what it brings to us.


Blockchain explained

by Jerome Kehrli

Posted on Friday Oct 07, 2016 at 12:01AM in Computer Science

I have recently taken a deep interest in the blockchain topic, and this is the first article of an upcoming series around the blockchain.

This article presents an introduction to the blockchain: what it is in the light of its initial deployment in the Bitcoin project, as well as the technical details and architecture concerns behind it.
We won't focus here on business applications aside from what is required to present the purpose of the blockchain; more concrete business applications and evolutions will be the topic of another post in the coming days / weeks.

This article presents and explains all the key techniques and mechanisms behind the blockchain technology.

The blockchain principles and fundamentals initially come from the design work on Bitcoin. Most of this article focuses on the design and the principles of the blockchain put in place in the Bitcoin system.
Some more recent (Blockchain 2.0) implementations differ slightly while still sharing most genes with the original blockchain, making all that is presented below valid from a conceptual perspective in these other implementations as well.
