niceideas.ch
Technological Thoughts by Jerome Kehrli

Blockchain 2.0 - From Bitcoin Transactions to Smart Contract applications

by Jerome Kehrli


Posted on Tuesday Nov 22, 2016 at 09:29PM in Computer Science


Since Satoshi's White paper came online, other cryptocurrencies have proliferated on the market. But irrespective of the actual currency and the frequently debated deflation issues, the actual revolution here is the Blockchain protocol and the distributed computing architecture is supports.

Just as thirty years ago the open communications protocol created profitable business services by catapulting innovation, the blockchain protocol has the potential of being the same kind of breakthrough, by offering a just as disruptive foundation on which businesses start to emerge. Using the integrity lattice of the transactions, a whole suite of value trading innovations are beginning to enter the market.

The key innovation here are Smart Contracts. This relatively new concept involves the development of programs that can be entrusted with money.

Smart Contracts are autonomous computer programs that, once started, execute automatically and mandatorily the conditions defined beforehand, such as the facilitation, verification or enforcement of the negotiation or performance of a contract.
They are most of the time defined in a Programming Language, which in the case of the Ethereum Blockchain 2.0 technology form a Turing Complete Programming Language.
Smart Contracts are implemented as any other software program, using conditions, loops, function calls, etc.

If blockchains give us distributed trustworthy storage, then smart contracts give us distributed trustworthy computations.

To illustrate a possible use of smart contracts, let's take the example of travel insurance: finding that 60% of the passengers insured against the delay of their flight never claimed their money, a team created during a hackathon in London in 2015 an Automated Insurance system based on smart contracts.
With this service, passengers are automatically compensated when their flight is delayed, without having to fill out any form, and thus without the company having to process the requests. The blockchain's contribution here consists in generating the confidence and security necessary to automate the declarative phases without resorting to a third party.

The main advantage of putting Smart Contracts in a blockchain is the guarantee provided by the blockchain that the contract terms cannot be modified. The blockchain makes it impossible to tamper or hack the contract terms.
By developing ready to use programs that function on predetermined conditions between the supplier and the client, smart programs ensure a secure escrow service in real time at near zero marginal cost.

Smart Contracts enable to reduce the costs of verification, execution, arbitration and fraud prevention. They enable to overcome the moral hazard problem. The american cryptograph Nick Szabo is deemed to be the inventor of the concept, whom he spoked about in 1995 already. He used to mention the example of a rented car, whose smart contract could return the control to the owner in case the renter forgives the paiements.
Interestingly, as a sidenote, Nick Szabo is also believed by some to be one of the person behind the Satoshi Nakamoto identity.

In a general way, Smart Contracts from the heart of the Ethereum blockchain. Even if Rootstock aims at enabling the implementation of Smart Contracts on the bitcoin blockchain, the development of Smart Contracts technology is really rather related to Ethereum. The next versions of Ethereum are increasingly targeted to offering end users an App-Store-like User Experience to Smart Contracts.

This article intents to be a pretty complete introduction to Blockchain 2.0 technology and Smart Contract applications, detailing both of them as well as list the state of the state of the art of possible use cases being currently studied or discussed.
A big part of this article focuses on the Ethereum blockchain.

Read More

Comet: having fun with the Java HTTP Stack

by Jerome Kehrli


Posted on Tuesday Nov 01, 2016 at 11:54PM in Web Devevelopment


Wikipedia's definition does a pretty great job in introducing comet:

Comet is a web application model in which a long-held HTTP request allows a web server to push data to a browser, without the browser explicitly requesting it.

This says it all. For a long time, Comet - which is more an umbrella term regrouping several techniques - was pretty much the only way to get as close as possible to Server Push in Web application, plain HTML/Javascript applications on top of the HTTP protocol.
All methods have in common that they rely on browser-native technologies such as JavaScript, rather than on proprietary plug-ins.

History

In the early days of the World Wide Web, the browser would make multiple server requests: one for the page content and one for each page component. Examples of page components include images, CSS files, scripts, Java applets, and any other server-hosted resource referenced in the page.

Ajax - Asynchronous JavaScript and XML - went a long way towards making this model evolve. It allowed far greater control over page content by providing the ability to send server requests for as little, or as much data as the browser needed to update. In addition, its asynchronous nature supported multiple simultaneous calls - even while other elements downloaded.
One problem that Ajax did not adequately solve was the issue of data synchronization between the client and server. Since the browser would not know if something had changed on the server, Web applications typically polled the server on a periodic basis to ask if new information was available. The only possible way as to use Polling where the browser would poll the serve rat regular intervals to find out about new events and updated data.

To circumvent this very limitation, developers started to imagine techniques aimed at getting closer to server push, either using the Forever Hidden iframe technique or the Long Polling XMLHttpRequest technique. Both these techniques are grouped under the umbrella term Comet or Bayeux Protocol.
Now of course these techniques have respective advantages and drawbacks that I will be discussing later in this article.

Comet

For many reasons, mostly robustness and universality of the solution, I am favoring the Forever Hidden iframe approach and I have designed quite some time ago a Comet framework making use of this technique to provide Server-Push to Plain HTML/javascript web applications.

The forever hidden iframe technique is the one I found most seducing for one very good and essential reason : it's the most robust one from a technical perspective. It has drawbacks of course in comparison with other techniques, but I still deem it the most solid one, and in some situations it was even the only one I could make work.
Its principle is straightforward to understand: an iframe is opened on a server resource and the HTTP response stream is never closed by the backend. The iframe keeps loading more javascript instructions as the server pushes them in the response stream.
Having said that, I have to admit ... It blocks a freaking amount of threads in the java backend, it shows the annoying loading icon on the browser, managing errors is a nightmare ... but it always works, in every situation, and at the end of the day considering the kind of business critical applications I usually work on, this is what matters the most to me.

Now of course, WebSockets tend to rend all Comet tecnniques kind of legacy.
But still, I end up deploying comet techniques instead of WebSockets in many circumstances. You may wonder why ?

WebSockets are no magic silver bullet

  • Most importantly, when we have to support plain HTTP connection and bad HTTP proxies which let a WebSocket being opened but don't let anything pass through
  • Implementing WebSockets efficiently on the server side is more complicated than Comet and requires most of the time specific libraries, sometimes pretty incompatible with some Application Servers
  • With WebSockets you are forced to run TCP proxies as opposed to HTTP proxies for load balancing
  • When I have to support old version of Internet Explorer, such as IE9, in banking institutions that have a bad tendency to use pretty old version of software
  • Many other reasons I am detailing below ...

The first problem above is the darkest one regarding WebSockets in my opinion. This proxy server issue is quite widespread.
Nonetheless, "You must have HTTPS" is the weakest argument against WebSocket adoption since I want all the web to be HTTPS, it is the future and it is getting cheaper every day. But unfortunately there are some context where we have to integrate our web application on plain HTTP and one should know that weird stuff will definitely happen you deploy WebSockets over unsecured HTTP.

lightweight, simple and robust Comet framework

This is the main rational behind the Lightweight and simple Comet Framework for Java that I am introducing here. I am using this framework, that I've designed long ago and somewhat maintained over the years, each and every time I encounter issues with WebSockets.
To be honest, I always consider it somewhat a failure since the standardization of the WebSocket specification should prevent me from reverting to this Comet Framework so often, but surprisingly it doesn't and I end up returning there pretty often.
Again, the forever hidden iframe Comet technique always works!

Anyway, this Lightweight and simple Comet Framework for Java is pretty handy and I am presenting it here and making the sources available for download.

Read More

Snake ! Reloaded ...

by Jerome Kehrli


Posted on Saturday Oct 22, 2016 at 02:06PM in OpenGL


I needed to have a little fun recently when slacking on my computer at night so, as is often the case in such situation, I spent some time on my snake project again.
I implemented a few evolutions as explained below and am now working on making the auto-pilot algorithm a little better.
It's funny how this snake application is something on which I get over and over again years after years, trying to make it better, reworking it, etc.

The first post on this blog about this app was there : www.niceideas.ch/roller2/badtrash/entry/snake-tt-0-2-alpha
Yeah, well, that doesn't make me younger, does it ?

The snake project

Snake is a little C++ / OpenGL / Real-Time project which shows a snake eating apples on a two dimensional board. It really is very much like the famous Nokia phone game except the snake finds its way on its own.

No nice textures, no sweet drawings yet, the world elements are mostly simple spheres. Trivial OpenGL features such as fog, lightning and shadows are implemented though.

The auto-pilot is still pretty stupid at the moment and the snake ends up eating itself quite fast.
This is the thing on which I am working now.

Snake Application Screenshot

It still is a game as the "user" can at any moment deactivate the auto-pilot and take the control of the snake himself. At any moment, the auto-pilot can be re-engaged again or deactivated.

The project is open-source'd under GNU LGPL license and if you're interested in OpenGL, Vector Geometry, Real-time C++ Programming, or simply Algorithms you might very well enjoy having a look at the source code provided here.

Read More

Agile Software Development, lessons learned

by Jerome Kehrli


Posted on Wednesday Oct 19, 2016 at 02:51PM in Agile


After almost two years as Head of R&D in my current company, I believe I succeeded in bringing Agility to Software Development here by mixing what I think makes most sense out of eXtreme Programing, Scrum, Kanban, DevOps practices, Lean Startup practices, etc.
I am strong advocate of Agility at every level and all the related practices as a whole, with a clear understanding of what can be their benefits. Leveraging on the initial practices already in place to transform the development team here into a state of the art Agile team has been - and still is - one of my most important initial objectives.
I gave myself two years initially to bring this transformation to the Software Development here. After 18 months, I believe we're almost at the end of the road and its a good time to take a step back and analyze the situation, trying to clarify what we do, how we do it, and more importantly why we do it.

As a matter of fact, we are working in a full Agile way in the Software Development Team here and we are having not only quite a great success with it but also a lot of pleasure.
I want to share here our development methodology, the philosophy and concepts behind it, the practices we have put in place as well as the tools we are using in a detailed and precise way.
I hope and believe our lessons learned can benefit others.
As a sidenote, and to be perfectly honest, while we may not be 100% already there in regards to some of the things I am presenting in this article, at least we have identified the gap and we're moving forward. At the end of the day, this is what matters the most to me.

This article presents all the concepts and practices regarding Agile Software Development that we have put (or are putting) in place in my current company and gives our secrete recipe which makes us successful, with both a great productivity / short lead time on one side and great pleasure and efficiency in our every day activities on the other side.

Read More

Ethical hacking : a glimpse on local program vulnerabilities exploitation techniques

by Jerome Kehrli


Posted on Saturday Oct 08, 2016 at 12:19AM in Computer Science


Ethical hacking is a very interesting field, and a pretty funny hobby. Well, ya all penetration tester out there, don't get me wrong: I am well aware that Penetration testing and Ethical Hacking is a full and challenging Software Engineering Field and an actual profession, don't get upset.

I am rather saying that studying vulnerabilities exploitation techniques in one's free time is pretty fun and intellectually rewarding. With the all time and everywhere connection of everything for all kind of usages (understand Internet of Things), current focus in the field of vulnerabilities exploitation is really on Web application, networks, distributed systems, etc.
In addition, most recent progresses in CPU-level protections and compiler-level protections have made local programs exploitation techniques somewhat outdated and such techniques are not very much presented or discussed anymore.

During my master studies, I followed an extended set of lectures on Ethical Hacking and Software Security in general and got quite interested in the field. I wrote a paper for a study in the context of the university at that time which I am reporting today in this blog.
The following article presents various classical vulnerabilities exploitation techniques on local programs.

Read More

Blockchain explained

by Jerome Kehrli


Posted on Friday Oct 07, 2016 at 12:01AM in Computer Science


I interested myself deeply in the blockchain topic recently and this is the first article of a coming whole serie around the blockchain.

This article presents an introduction on the blockchain, presents what it is in the light of its initial deployment in the Bitcoin project as well as all technical details and architecture concerns behind it.
We won't focus here on business applications aside from what is required to present the blockchain purpose, more concrete business applications and evolutions will be the topic of another post in the coming days / weeks.

This article presents and explains all the key techniques and mechanisms behind the blockchain technology.

The blockchain principles and fundamentals are really coming initially from the design work on the Bitcoin. Most of this article focuses on the design and the principle of the blockchain put in place in the Bitcoin system.
Some more recent (Blockchain 2.0) implementations differ slightly while still sharing most genes with the original blockchain, making all that is presented below valid from a conceptual perspective in these other implementations as well.

Read More

The Blockchain ...

by Jerome Kehrli


Posted on Wednesday Oct 05, 2016 at 05:17PM in Computer Science


The blockchain and blockchain related topics are becoming increasingly discussed and studied. There is not one single day where I don't hear about it, that being on linkedin or elsewhere.

I kept myself busy on other topics these last years, mostly large scale information systems and analytic systems architecture in the finance business so I really missed the Bitcoin and blockchain hype.
I've been to an OCTO Technology event recently on the Blockchain. To be honest I went there more for the pleasure of seeing my former colleagues than for any specific interest on the topic. Yet I listen carefully to OCTO's presentation ... and I didn't imagine I would be so much intrigued and soon passionated by the subject.

I strongly believe the blockchain technology has the potential to be one of the most disruptive progress in computer sciences of these 10 last years. I studied and keep studying all the technical details, evolutions and business implications of this technology and will post various blog articles in the coming days / weeks about this topic:


Big Data and private banking, what for ?

by Jerome Kehrli


Posted on Wednesday Oct 05, 2016 at 10:50AM in Big Data


Big Data technologies are increasingly used in retail banking institutions for customer profiling or other marketing activities. In private banking institutions, however, applications are less obvious and there are only very few initiatives.
Yet, as a matter of fact, there are opportunities in such institutions and they can be quite surprising.

Big Data technologies, initiated by the Web Giants such as Google or Amazon, enable to analyze very massive amount of data (ranging from Terabytes to Petabytes). Apache Hadoop is the de-facto standard nowadays when it comes to considering Open Source Big Data technologies but it is increasingly challenged by alternatives such as Apache Spark or others providing less constraining programming paradigms than Map-Reduce.

These Big Data Processing Platform benefits from the NoSQL genes : the CAP Theorem when it comes to storing data, the usage of commodity hardware, the capacity to scale-out (almost) linearly (instead of scaling up your Oracle DB) and a much lower TCO (Total Cost of Ownership) than standard architectures.

Most essential applications for such technologies in retail banking institutions consist in gathering knowledge and insights on the customer base, customer's profiles and their tendencies by using cutting-edge Machine Learning techniques on such data.

In contrary to retail banking institutions that are exploiting such technologies for many years, private banking institution, with their very low amount of transactions and their limited customer base are considering these technologies with a lot of skepticism and condescension.

However, in contrary to preconceived ideas, use case exist and present surprising opportunities, mostly around three topics :

  • Enhance proximity with customers
  • Improve investment advisory services
  • Reduce computation costs

Read More

Sudoku Laboratoy

by Jerome Kehrli


Posted on Tuesday Aug 30, 2016 at 09:02AM in Computer Science


I have written a little Sudoku program for which I provide here the source code and Windows pre-built executables. Current version is Sudoku 0.2-beta.

It supports the following features:

  • A GUI to display and manipulate a Sudoku board
  • A Sudoku generator
  • A Sudoku solver
  • A Sudoku solving tutorial (quite limited at the moment)
Sudoku Laboratory - Screenshot 1 Sudoku Laboratory - Screenshot 2

At the moment there are two resolution methods supported, one using human-like resolution techniques and a second using backtracking. The resolution of a solvable Sudoku board takes a few milliseconds only.
A solvable Sudoku board is a Sudoku board than has one and only one solution.

The Sudoku board generator generates solvable Sudoku boards. It usually generates boards between 18 and 22 pre-filled cells. (which is quite better than most generators I could find online).
Currently it generates the best (i.e. most difficult) board it possibly can provided the random initial situation (with all cells filled) of the board.
The problem I'm facing in this current version is that it can take from a few seconds only up to several minutes to generate the board (this is discussed in the algorithms section below).
In addition, the difficulty of the resulting board is not tweakable at the moment. In some cases it generates an extremely difficult board (only solvable with nishio) in a few seconds while some other times it needs two minutes to generate an easy board.

The software is written in C++ on Linux with the help of the wxWidgets GUI library.
It is cross-platform to the extent of the wxWidgets library, i.e. it can be compiled on Windows, OS X, BSD, etc.

A makefile is provided to build it, no configure scripts, i.e. you need to adapt the makefile to your own Linux distribution (or on Windows Mingw, TDM, etc.).
Happily only few adaptations should be required.

Read More

Data Management

by Jerome Kehrli


Posted on Thursday Aug 07, 2014 at 04:22PM in Computer Science


Data management comprises all the disciplines related to managing data as a valuable resource.
Data Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise.

During my MS studies, I followed two interesting lectures related to Data Management.

  • Introduction to Data Mining (Summary)
  • Introduction to Information Retrieval (Summary)

Data Mining is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. In the context of Data Mining, Data Warehouses (DW) form an important aspect. Data Warehouses generalize and consolidate data in multidimensional space. The construction of DW is an important pre-processing step for data mining involving data cleaning, data integration, data transformation.
I have summarized all the notes I have taken during the Introduction to Data Mining lecture as well as some of my solutions to the exercises within the following document : Summary of the Data Mining Lecture.

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within a large collections (usually stored on computers). Information Retrieval is a field concerned with the structure, analysis, organisation, storage, searching and retrieval of information.
Here as well I have summarized the notes taken during the lecture within the following document : Summary of the Information Retrieval Lecture.


100 hard software engineering interview questions

by Jerome Kehrli


Posted on Friday Dec 06, 2013 at 04:11PM in Computer Science


For some reasons that I'd rather keep private, I got interested in the kind of questions google, microsoft, amazon and other tech companies are asking to candidate during the recruitment process. Most of these questions are oriented towards algorithmics or mathematics. Some other are logic questions or puzzles the candidate is expected to be able to solve in a dozen of minutes in front of the interviewer.

If found various sites online providing lists of typical interview questions. Other sites are discussing topics like "the ten toughest questions asked by google" or by microsoft, etc.
Then I wondered how many of them I could answer on my own without help. The truth is that while I can answer most of these questions by myself, I still needed help for almost as much as half of them.

Anyway, I have collected my answers to a hundred of these questions below.
For the questions for which I needed some help to build an answer, I clearly indicate the source where I found it.

Read More

Introduction to mathematical optimization

by Jerome Kehrli


Posted on Monday Mar 26, 2012 at 12:30AM in General


During my MSc studies, I followed an extended set of very interesting lectures related to Mathematical Optimization using basic mathematic concepts and simple algorithms such as the Newton (and/or Newton-based) methods or the simplex algorithm (and/or simplex based such as branch-and-bound, branch-and-cut, etc.).

Quoting Wikipedia:

"In the simplest case, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. The generalization of optimization theory and techniques to other formulations comprises a large area of applied mathematics.
More generally, optimization includes finding best available values of some objective function given a defined domain, including a variety of different types of objective functions and different types of domains.
"

I have these days a (very) little more than usual free time and I've compiled a resume of these lectures from my various notes and individual chapters resumes. So I decided to put this document online as it might help some of the future MSc students following any lecture related to Mathematical Optimization by providing them with an introduction to the field.

The resume is available here : resume_optim.pdf.


About linkedin, software architects and a little disappointment

by Jerome Kehrli


Posted on Sunday Oct 30, 2011 at 07:45PM in General


I am really amazed and astonished by a few updates I've been seeing on linkedin recently.

I've been working these ten last years with incredibly gifted people. You know, the kind of guys you discuss with wondering whether you yourself will ever be as good, clever and keen as them. I really think being that good is nothing to be ashamed of so let's assume I can name these guys. The very first one I remember is Thomas Beck (Geneva, Switzerland) . I've been working two years under his supervision (he was the software architect on our project) and I have learn more about the job discussing with him than I ever did reading whatever software architecture or design related book (agile, DDD, whatever). Happily I have learn a lot more since I left him yet I'm quite sure he did even more so I believe I'm still far from reaching his level of mastering of the software architecture business.
Other people I would also mention here are Sebastien Ursini, Sebastien Marc and Thomas Caprez (Geneva and Lausanne / Switzerland). I haven't seen these folks since several years for some of them yet I can still pretty clearly remember what they taught me and there's not one single day where I don't benefit from these teachings in my job.

On the other hand, just as everybody, I really had much more often the occasion to work with terrible software engineers. I principally encountered two categories.

The first one is this kind of people that went to great engineering schools or universities and assume the time they invest in their studies is well enough and exempts them from providing any little additional effort to keep learning since they graduated. These people are fools believing they're great only because of some piece of paper assessing they have once been able to learn something. I hope all my very good french colleagues won't hate me for this but I have to say that specifically french engineers are subject to this bad tendency.
Unfortunately, life doesn't make any gift to anyone and most of them are sooner or later taught the hard way how they're wrong and start kicking their buts to actually start learning the job and make some progress.

The second category is way more dangerous. This is the kind of people that sell themselves as software architects without any real software development experience. These folks read lots of books, follow lots of software architecture blogs and assume that this exempts them from building their own experience before claiming being software architects. I'm not saying reading is not good, but I am pretty sure that it is in no way comparable to experience. Unfortunately, due to poor recruitment processes one one side, and the lack of good software engineers on the market on the other side, these guys manage to find a software architect job and end up taking architecture-level decisions.

I am involved in the recruitment process in my current company (just as I was in my former companies). I take care of the technical assessment. I myself am usually a nice guy (well I think) and yet I show no mercy to candidates. I am pretty well aware that a mistake I make in this process might well lead me to work with bad engineers a few months later and this is a risk I'm not willing to take at all.
I am the guy killing those people. When I see someone coming in front of me with a resume claiming several years of experience in software architecture and not able to answer correctly the very first questions I'm asking him, it usually puts me in such a bad mood that I still keep the guy for the two hours that were planned and bury him 7 feet under ground. Hopefully the guy will work on a resume a little more humble before applying to another position (in another company, needless to say).
Just a word on "answering correctly": there is usually not only one good answer to a design problem or an architectural question, neither do I expect one. But I expect the candidate at least to build a proper conceptual model of the issue I'm presenting and to be able to outline a few solutions.

Now why am I putting all this online ?

Read More

Java - Create enum instances dynamically

by Jerome Kehrli


Posted on Saturday Nov 13, 2010 at 09:08PM in Java


I remember the introduction of the brand new enum type in Java 5 (1.5) was a very exciting announce. However, when I finally switched from 1.4 to 1.5 and actually tried Java's flavoured enum types, I was a bit disappointed.

Before that, I was using Josh Bloch's "Typesafe enum" pattern (effective java) for quite a long time and I didn't really see what was so much better with the new Java native enum construction. Ok, fine, there was the ability to use enum instances in switch - case statements which seemed fine, but what else ?

Besides, what I used to find great with the "typesafe enum" pattern is that it could be tricked and changed the way I wanted, for instance to be able to dynamically (at runtime) add enum instances to a specific typesafe enum class. I found it very disappointing not to be able to do the very same thing easily with the native Java enum construction.

And now you might wonder "Why the hell could one ever need to dynamically add enum values ?!?". You do, right ? Well, let's imagine this scenario:

You have a specific column in a DB table which contains various codes as values. There are more than hundred different codes actually in use in this column. Related to this, you have a business logic which performs different operations on the rows coming from this table, the actual kind of operation applied on the row depends on the value of this code. So there are chance you end up with a lot of if - elseif statements checking the actual value of the code.
I myself am allergic to using string comparison in conditions so I want to be able to map the values from this column to an enum type in Java. This way I can compare enum values instead of strings in my conditions and reduce my dependency on the format of the string value.

Now when there are more than a hundred different possible codes in the DB I really don't have any intent to define them all manually in my enum type. I want to define only the few I am actually using the Java code and let the system add the other ones dynamically, at runtime, when it (the ORM system or whatever I am using for reading the DB rows) encounters a new value from the DB.

Hence my need for dynamically added enum values.

So recently I faced this need once again and took a few hours to build a little solution which enables one to dynamically add values to a Java enum type. The solution is the following :

Read More

Java rocks !

by Jerome Kehrli


Posted on Wednesday Nov 03, 2010 at 08:40AM in Java


I've been facing an interesting problem with string manipulation in Java lately at work. The requirement was the following :

We have a field on some screen where the user can type in a comment. The comment can have any length the user wants, absolutely any. Should he want to type in a comment of a million characters, he should be able to do so.

Now the right way to store this comment in a database is using a CLOB, a BLOB or a LONGVARCHAR or whatever feature the database natively provides to do so. Unfortunately that's not the way it was designed. Due to legacy integration needs, all these advance DB types are prohibited within our application. So the way we have to store the comment consists of using several rows with a single comment field of a maximum length of 500 characters. That means the long comment has to be split in several sub-strings of 500 characters and each of them is stored in a separate row in the DB table. The table has a counter as part of the primary key which is incremented for each new row belonging to the same comment. This way we can easily spot every row part of the same comment.

Now another problem we have is that under DB2 a field defined as VARCHAR(500) can contain 500 bytes max even though the strings are encoded in UTF-8 in the database. That means we might not be able to store 500 characters if the string contains one or more 2 bytes UTF-8 characters. Working in a french environment, this happens a lot.
So we had to write a little algorithm taking care of the splitting of the string in 500 bytes sub-strings.

The very first version of our algorithm was quite stupid and ended up in splitting the string in a quite naive way: we converted the string to a byte array following an UTF-8 encoding and split the byte array instead of the string. Then each of the 500 bytes arrays was converted back to a string before being inserted in the database.
Happily, we figured out quite soon that this doesn't work as it ends up quite often splitting the string right in the middle of a 2 bytes character. The byte arrays being then converted back to strings, the split 2 bytes character was corrupted and could not be corrected any more.

Before writing as smarter version of the algorithm which would manually test the byte length of the character right at the position of the split, we took a leap backward and wondered : "Can it be that Java doesn't offer natively a simple way to do just that ?"

And the answer is yes of course.

Read More