Bytecode manipulation with Javassist for fun and profit part I: Implementing a lightweight IoC container in 300 lines of code
by Jerome Kehrli
Posted on Monday Feb 13, 2017 at 09:30PM in Java
Java bytecode is the form of instructions that the JVM executes.
A Java programmer, normally, does not need to be aware of how Java bytecode works.
Understanding the bytecode, however, is essential to the areas of tooling and program analysis, where the applications can modify the bytecode to adjust the behavior according to the application's domain. Profilers, mocking tools, AOP, ORM frameworks, IoC Containers, boilerplate code generators, etc. require to understand Java bytecode thoroughly and come up with means of manipulating it at runtime.
Each and every of these advanced features of what is nowadays standard approaches when programming with Java require a sound understanding of the Java bytecode, not to mention completely new languages running on the JVM such as Scala or Clojure.
Bytecode manipulation is not easy though ... except with Javassist.
Of all the libraries and tools providing advanced bytecode manipulation features, Javassist is the easiest to use and the quickest to master. It takes a few minutes to every initiated Java developer to understand and be able to use Javassist efficiently. And mastering bytecode manipulation, opens a whole new world of approaches and possibilities.
The goal of this article is to present Javassist in the light of a concrete use case: the implementation in a little more than 300 lines of code of a lightweight, simple but cute IoC Container: SCIF - Simple and Cute IoC Framework.Read More
by Jerome Kehrli
Posted on Saturday Jan 28, 2017 at 11:05AM in Agile
A few years ago, I worked intensively on a pet project : AirXCell.
What was at first some framework and tool I had to write to work on my Master Thesis dedicated to Quantitative Research in finance, became after a few months somewhat my most essential focus in life.
Initially it was really intended to be only a tool providing me with a way to have a Graphical User Interface on top of all these smart calculations I was doing in R. After my master thesis, I surprised myself to continue to work on it, improving it a little here and a little there. I kept on doing that until the moment I figured I was dedicated several hours to it every day after my day job.
Pretty soon, I figured I was really holding an interesting software and I became convinced I could make something out of it and eventually, why not, start a company.
And of course I did it all wrong.
Instead of finding out first if there was a need and a market for it, and then what should I really build to answer this need, I spent hours every day and most of my week-ends developing it further towards what I was convinced was the minimum set of feature it should hold before I actually try to meet some potential customers to tell them about it.
So I did that for more than a year and a half until I came close to burn-out and send it all to hell.
Now the project hasn't evolve for three years. The thing is that I just don't want to hear about it anymore. I burnt myself and I am just disgusted about it. Honestly it is pretty likely that at the time of reading this article, the link above is not even reachable anymore.
When I think of the amount of time I
invested wasted in it, and the fact that even now, three years after, I still just don't want to hear anything about this project anymore, I feel so ashamed. Ashamed that I didn't take a leap backwards, read a few books about startup creation, and maybe, who knows, discover The Lean Startup movement before.
Even now, I still never met any potential customer, any market representative. Even worst: I'm still pretty convinced that there is a need and a market for such a tool. But I'll never know for sure.
Such stories, and even worst, stories of startups burning millions of dollars for nothing in the end, happen every day, still today.
Some years ago, Eric Ries, Steve Blank and others initiated The Lean Startup movement. The Lean Startup is a movement, an inspiration, a set of principles and practices that any entrepreneur initiating a startup would be well advised to follow.
Projecting myself into it, I think that if I had read Ries' book before, or even better Blank's book, I would maybe own my own company today, around AirXCell or another product, instead of being disgusted and honestly not considering it for the near future.
In addition to giving a pretty important set of principles when it comes to creating and running a startup, The Lean Startup also implies an extended set of Engineering practices, especially software engineering practices.
This article focuses on presenting and detailing these Software Engineering Practices from the Lean Startup Movement since, in the end, I believe they can benefit from any kind company, from initiating startup to well established companies with Software Development Activities.
By Software Engineering practices, I mean software development practices of course but not only. Engineering is also about analyzing the features to be implemented, understanding the customer need and building a successful product, not just writing code.
by Jerome Kehrli
Posted on Wednesday Jan 04, 2017 at 09:56PM in Agile
So ... I've read a lot of things recently on DevOps, a lot of very interesting things ... and, unfortunately, some pretty stupid as well. It seems a lot of people are increasingly considering that DevOps is resumed to mastering
puppet or docker containers. This really bothers me. DevOps is so much more than any tool such as puppet or docker.
This could even make me angry. DevOps seems to me so important. I've spent 15 years working in the engineering business for very big institutions, mostly big financial institutions. DevOps is a very key methodology bringing principles and practices that address precisely the biggest problem, the saddest factor of failure of software development projects in such institutions : the wall of confusion between developers and operators.
Don't get me wrong, in most of these big institutions being still far from a large and sound adoption of an Agile Development Methodology beyond some XP practices, there are many other reasons explaining the failure or slippage of software development projects.
But the wall of confusion is by far, in my opinion, the most frustrating, time consuming, and, well, quite stupid, problem they are facing.
So yeah... Instead of getting angry I figured I'd rather present here in a concrete and as precise as possible article what DevOps is and what it brings. Long story short, DevOps is not a set of tools. DevOps is a methodology proposing a set of principles and practices, period. The tools, or rather the toolchain - since the collection of tools supporting these practices can be quite extended - are only intended to support the practices.
In the end, these tools don't matter. The DevOps toolchains are today very different than they were two years ago and will be very different in two years. Again, this doesn't matter. What matters is a sound understanding of the principles and practices.
Presenting a specific toolchain is not the scope of this article, I won't mention any. There are many articles out there focusing on DevOps toolchains. I want here to take a leap backwards and present the principles and practices, their fundamental purpose since, in the end, this is what seems most important to me.
DevOps is a methodology capturing the practices adopted from the very start by the web giants who had a unique opportunity as well as a strong requirement to invent new ways of working due to the very nature of their business: the need to evolve their systems at an unprecedented pace as well as extend them and their business sometimes on a daily basis.
While DevOps makes obviously a critical sense for startups, I believe that the big corporations with large and old-fashioned IT departments are actually the ones that can benefit the most from adopting these principles and practices. I will try to explain why and how in this article.
by Jerome Kehrli
Posted on Tuesday Nov 22, 2016 at 09:29PM in Computer Science
Since Satoshi's White paper came online, other cryptocurrencies have proliferated on the market. But irrespective of the actual currency and the frequently debated deflation issues, the actual revolution here is the Blockchain protocol and the distributed computing architecture is supports.
Just as thirty years ago the open communications protocol created profitable business services by catapulting innovation, the blockchain protocol has the potential of being the same kind of breakthrough, by offering a just as disruptive foundation on which businesses start to emerge. Using the integrity lattice of the transactions, a whole suite of value trading innovations are beginning to enter the market.
The key innovation here are Smart Contracts. This relatively new concept involves the development of programs that can be entrusted with money.
Smart Contracts are autonomous computer programs that, once started, execute automatically and mandatorily the conditions defined beforehand, such as the facilitation, verification or enforcement of the negotiation or performance of a contract.
They are most of the time defined in a Programming Language, which in the case of the Ethereum Blockchain 2.0 technology form a Turing Complete Programming Language.
Smart Contracts are implemented as any other software program, using conditions, loops, function calls, etc.
If blockchains give us distributed trustworthy storage, then smart contracts give us distributed trustworthy computations.
To illustrate a possible use of smart contracts, let's take the example of travel insurance: finding that 60% of the passengers insured against the delay of their flight never claimed their money, a team created during a hackathon in London in 2015 an Automated Insurance system based on smart contracts.
With this service, passengers are automatically compensated when their flight is delayed, without having to fill out any form, and thus without the company having to process the requests. The blockchain's contribution here consists in generating the confidence and security necessary to automate the declarative phases without resorting to a third party.
The main advantage of putting Smart Contracts in a blockchain is the guarantee provided by the blockchain that the contract terms cannot be modified. The blockchain makes it impossible to tamper or hack the contract terms.
By developing ready to use programs that function on predetermined conditions between the supplier and the client, smart programs ensure a secure escrow service in real time at near zero marginal cost.
Smart Contracts enable to reduce the costs of verification, execution, arbitration and fraud prevention. They enable to overcome the moral hazard problem. The american cryptograph Nick Szabo is deemed to be the inventor of the concept, whom he spoked about in 1995 already. He used to mention the example of a rented car, whose smart contract could return the control to the owner in case the renter forgives the paiements.
Interestingly, as a sidenote, Nick Szabo is also believed by some to be one of the person behind the Satoshi Nakamoto identity.
In a general way, Smart Contracts from the heart of the Ethereum blockchain. Even if Rootstock aims at enabling the implementation of Smart Contracts on the bitcoin blockchain, the development of Smart Contracts technology is really rather related to Ethereum. The next versions of Ethereum are increasingly targeted to offering end users an App-Store-like User Experience to Smart Contracts.
This article intents to be a pretty complete introduction to Blockchain 2.0 technology and Smart Contract applications, detailing both of them as well as list the state of the state of the art of possible use cases being currently studied or discussed.
A big part of this article focuses on the Ethereum blockchain.
by Jerome Kehrli
Posted on Tuesday Nov 01, 2016 at 11:54PM in Web Devevelopment
Wikipedia's definition does a pretty great job in introducing comet:
In the early days of the World Wide Web, the browser would make multiple server requests: one for the page content and one for each page component. Examples of page components include images, CSS files, scripts, Java applets, and any other server-hosted resource referenced in the page.
One problem that Ajax did not adequately solve was the issue of data synchronization between the client and server. Since the browser would not know if something had changed on the server, Web applications typically polled the server on a periodic basis to ask if new information was available. The only possible way as to use Polling where the browser would poll the serve rat regular intervals to find out about new events and updated data.
To circumvent this very limitation, developers started to imagine techniques aimed at getting closer to server push, either using the Forever Hidden iframe technique or the Long Polling XMLHttpRequest technique. Both these techniques are grouped under the umbrella term Comet or Bayeux Protocol.
Now of course these techniques have respective advantages and drawbacks that I will be discussing later in this article.
The forever hidden iframe technique is the one I found most seducing for one very good and essential reason : it's the most robust one from a technical perspective. It has drawbacks of course in comparison with other techniques, but I still deem it the most solid one, and in some situations it was even the only one I could make work.
Having said that, I have to admit ... It blocks a freaking amount of threads in the java backend, it shows the annoying loading icon on the browser, managing errors is a nightmare ... but it always works, in every situation, and at the end of the day considering the kind of business critical applications I usually work on, this is what matters the most to me.
Now of course, WebSockets tend to rend all Comet tecnniques kind of legacy.
But still, I end up deploying comet techniques instead of WebSockets in many circumstances. You may wonder why ?
WebSockets are no magic silver bullet
- Most importantly, when we have to support plain HTTP connection and bad HTTP proxies which let a WebSocket being opened but don't let anything pass through
- Implementing WebSockets efficiently on the server side is more complicated than Comet and requires most of the time specific libraries, sometimes pretty incompatible with some Application Servers
- With WebSockets you are forced to run TCP proxies as opposed to HTTP proxies for load balancing
- When I have to support old version of Internet Explorer, such as IE9, in banking institutions that have a bad tendency to use pretty old version of software
- Many other reasons I am detailing below ...
The first problem above is the darkest one regarding WebSockets in my opinion. This proxy server issue is quite widespread.
Nonetheless, "You must have HTTPS" is the weakest argument against WebSocket adoption since I want all the web to be HTTPS, it is the future and it is getting cheaper every day. But unfortunately there are some context where we have to integrate our web application on plain HTTP and one should know that weird stuff will definitely happen you deploy WebSockets over unsecured HTTP.
lightweight, simple and robust Comet framework
This is the main rational behind the Lightweight and simple Comet Framework for Java that I am introducing here. I am using this framework, that I've designed long ago and somewhat maintained over the years, each and every time I encounter issues with WebSockets.
To be honest, I always consider it somewhat a failure since the standardization of the WebSocket specification should prevent me from reverting to this Comet Framework so often, but surprisingly it doesn't and I end up returning there pretty often.
Again, the forever hidden iframe Comet technique always works!
Anyway, this Lightweight and simple Comet Framework for Java is pretty handy and I am presenting it here and making the sources available for download.Read More
by Jerome Kehrli
Posted on Saturday Oct 22, 2016 at 02:06PM in OpenGL
I needed to have a little fun recently when slacking on my computer at night so, as is often the case in such situation,
I spent some time on my snake project again.
I implemented a few evolutions as explained below and am now working on making the auto-pilot algorithm a little better.
It's funny how this snake application is something on which I get over and over again years after years, trying to make it better, reworking it, etc.
The first post on this blog about this app was there :
Yeah, well, that doesn't make me younger, does it ?
The snake project
Snake is a little C++ / OpenGL / Real-Time project which shows a snake eating apples on a two dimensional board. It really is very much like the famous Nokia phone game except the snake finds its way on its own.
No nice textures, no sweet drawings yet, the world elements are mostly simple spheres. Trivial OpenGL features such as fog, lightning and shadows are implemented though.
The auto-pilot is still pretty stupid at the moment and the snake ends up eating itself quite fast.
This is the thing on which I am working now.
It still is a game as the "user" can at any moment deactivate the auto-pilot and take the control of the snake himself. At any moment, the auto-pilot can be re-engaged again or deactivated.
The project is open-source'd under GNU LGPL license and if you're interested in OpenGL, Vector Geometry, Real-time C++ Programming, or simply Algorithms you might very well enjoy having a look at the source code provided here.Read More
by Jerome Kehrli
Posted on Wednesday Oct 19, 2016 at 02:51PM in Agile
After almost two years as Head of R&D in my current company, I believe I succeeded in bringing Agility to Software Development here by mixing what I think makes most sense out of eXtreme Programing, Scrum, Kanban, DevOps practices, Lean Startup practices, etc.
I am strong advocate of Agility at every level and all the related practices as a whole, with a clear understanding of what can be their benefits. Leveraging on the initial practices already in place to transform the development team here into a state of the art Agile team has been - and still is - one of my most important initial objectives.
I gave myself two years initially to bring this transformation to the Software Development here. After 18 months, I believe we're almost at the end of the road and its a good time to take a step back and analyze the situation, trying to clarify what we do, how we do it, and more importantly why we do it.
As a matter of fact, we are working in a full Agile way in the Software Development Team here and we are having not only quite a great success with it but also a lot of pleasure.
I want to share here our development methodology, the philosophy and concepts behind it, the practices we have put in place as well as the tools we are using in a detailed and precise way.
I hope and believe our lessons learned can benefit others.
As a sidenote, and to be perfectly honest, while we may not be 100% already there in regards to some of the things I am presenting in this article, at least we have identified the gap and we're moving forward. At the end of the day, this is what matters the most to me.
This article presents all the concepts and practices regarding Agile Software Development that we have put (or are putting) in place in my current company and gives our secrete recipe which makes us successful, with both a great productivity / short lead time on one side and great pleasure and efficiency in our every day activities on the other side.Read More
by Jerome Kehrli
Posted on Saturday Oct 08, 2016 at 12:19AM in Computer Science
Ethical hacking is a very interesting field, and a pretty funny hobby. Well, ya all penetration tester out there, don't get me wrong: I am well aware that Penetration testing and Ethical Hacking is a full and challenging Software Engineering Field and an actual profession, don't get upset.
I am rather saying that studying vulnerabilities exploitation techniques in one's free time is pretty fun and intellectually rewarding. With the all time and everywhere connection of everything for all kind of usages (understand Internet of Things), current focus in the field of vulnerabilities exploitation is really on Web application, networks, distributed systems, etc.
In addition, most recent progresses in CPU-level protections and compiler-level protections have made local programs exploitation techniques somewhat outdated and such techniques are not very much presented or discussed anymore.
During my master studies, I followed an extended set of lectures on Ethical Hacking and Software Security in general and got quite interested in the field. I wrote a paper for a study in the context of the university at that time which I am reporting today in this blog.
The following article presents various classical vulnerabilities exploitation techniques on local programs.
by Jerome Kehrli
Posted on Friday Oct 07, 2016 at 12:01AM in Computer Science
I interested myself deeply in the blockchain topic recently and this is the first article of a coming whole serie around the blockchain.
This article presents an introduction on the blockchain, presents what it is in the light of its initial deployment in the Bitcoin project as well as all technical details and architecture concerns behind it.
We won't focus here on business applications aside from what is required to present the blockchain purpose, more concrete business applications and evolutions will be the topic of another post in the coming days / weeks.
This article presents and explains all the key techniques and mechanisms behind the blockchain technology.
The blockchain principles and fundamentals are really coming initially from the design work on the Bitcoin. Most of this article focuses on the design and the principle of the blockchain put in place in the Bitcoin system.
Some more recent (Blockchain 2.0) implementations differ slightly while still sharing most genes with the original blockchain, making all that is presented below valid from a conceptual perspective in these other implementations as well.
by Jerome Kehrli
Posted on Wednesday Oct 05, 2016 at 05:17PM in Computer Science
The blockchain and blockchain related topics are becoming increasingly discussed and studied. There is not one single day where I don't hear about it, that being on linkedin or elsewhere.
I kept myself busy on other topics these last years, mostly large scale information systems and analytic systems architecture in the finance business so I really missed the Bitcoin and blockchain hype.
I've been to an OCTO Technology event recently on the Blockchain. To be honest I went there more for the pleasure of seeing my former colleagues than for any specific interest on the topic. Yet I listen carefully to OCTO's presentation ... and I didn't imagine I would be so much intrigued and soon passionated by the subject.
I strongly believe the blockchain technology has the potential to be one of the most disruptive progress in computer sciences of these 10 last years. I studied and keep studying all the technical details, evolutions and business implications of this technology and will post various blog articles in the coming days / weeks about this topic:
- First article is : Blockchain explained. I am giving a clear explanation of all the technical nuts and bolts behind the blockchain technologies.
- Second is : Blockchain 2.0 - from bitcoin transactions to blockchain applications
- Third one will be : Blockchain - various business opportunities where I will discuss how the blockchain can (and will) disrupt several fields.
by Jerome Kehrli
Posted on Wednesday Oct 05, 2016 at 10:50AM in Big Data
Big Data technologies are increasingly used in retail banking institutions for customer profiling or other marketing activities. In private banking institutions, however, applications are less obvious and there are only very few initiatives.
Yet, as a matter of fact, there are opportunities in such institutions and they can be quite surprising.
Big Data technologies, initiated by the Web Giants such as Google or Amazon, enable to analyze very massive amount of data (ranging from Terabytes to Petabytes). Apache Hadoop is the de-facto standard nowadays when it comes to considering Open Source Big Data technologies but it is increasingly challenged by alternatives such as Apache Spark or others providing less constraining programming paradigms than Map-Reduce.
These Big Data Processing Platform benefits from the NoSQL genes : the CAP Theorem when it comes to storing data, the usage of commodity hardware, the capacity to scale-out (almost) linearly (instead of scaling up your Oracle DB) and a much lower TCO (Total Cost of Ownership) than standard architectures.
Most essential applications for such technologies in retail banking institutions consist in gathering knowledge and insights on the customer base, customer's profiles and their tendencies by using cutting-edge Machine Learning techniques on such data.
In contrary to retail banking institutions that are exploiting such technologies for many years, private banking institution, with their very low amount of transactions and their limited customer base are considering these technologies with a lot of skepticism and condescension.
However, in contrary to preconceived ideas, use case exist and present surprising opportunities, mostly around three topics :
- Enhance proximity with customers
- Improve investment advisory services
- Reduce computation costs
by Jerome Kehrli
Posted on Tuesday Aug 30, 2016 at 09:02AM in Computer Science
I have written a little Sudoku program for which I provide here the source code and Windows pre-built executables. Current version is Sudoku 0.2-beta.
It supports the following features:
- A GUI to display and manipulate a Sudoku board
- A Sudoku generator
- A Sudoku solver
- A Sudoku solving tutorial (quite limited at the moment)
At the moment there are two resolution methods supported, one using human-like
resolution techniques and a second using backtracking. The resolution of a solvable
Sudoku board takes a few milliseconds only.
A solvable Sudoku board is a Sudoku board than has one and only one solution.
The Sudoku board generator generates solvable Sudoku boards. It usually generates boards
between 18 and 22 pre-filled cells. (which is quite better than most generators I could
Currently it generates the best (i.e. most difficult) board it possibly can provided the random initial situation (with all cells filled) of the board.
The problem I'm facing in this current version is that it can take from a few seconds only up to several minutes to generate the board (this is discussed in the algorithms section below).
In addition, the difficulty of the resulting board is not tweakable at the moment. In some cases it generates an extremely difficult board (only solvable with nishio) in a few seconds while some other times it needs two minutes to generate an easy board.
The software is written in C++ on Linux with the help of the
wxWidgets GUI library.
It is cross-platform to the extent of the wxWidgets library, i.e. it can be compiled on Windows, OS X, BSD, etc.
A makefile is provided to build it, no configure scripts, i.e. you
need to adapt the makefile to your own Linux distribution (or on Windows Mingw, TDM, etc.).
Happily only few adaptations should be required.
by Jerome Kehrli
Posted on Thursday Aug 07, 2014 at 04:22PM in Computer Science
Data management comprises all the disciplines related to managing data as a valuable resource.
Data Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise.
During my MS studies, I followed two interesting lectures related to Data Management.
Data Mining is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. In the context of Data Mining, Data Warehouses (DW) form an important aspect. Data Warehouses generalize and consolidate data in multidimensional space. The construction of DW is an important pre-processing step for data mining involving data cleaning, data integration, data transformation.
I have summarized all the notes I have taken during the Introduction to Data Mining lecture as well as some of my solutions to the exercises within the following document : Summary of the Data Mining Lecture.
Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within a large collections (usually stored on computers). Information Retrieval is a field concerned with the structure, analysis, organisation, storage, searching and retrieval of information.
Here as well I have summarized the notes taken during the lecture within the following document : Summary of the Information Retrieval Lecture.
by Jerome Kehrli
Posted on Friday Dec 06, 2013 at 04:11PM in Computer Science
For some reasons that I'd rather keep private, I got interested in the kind of questions google, microsoft, amazon and other tech companies are asking to candidate during the recruitment process. Most of these questions are oriented towards algorithmics or mathematics. Some other are logic questions or puzzles the candidate is expected to be able to solve in a dozen of minutes in front of the interviewer.
If found various sites online providing lists of typical interview questions. Other sites are discussing topics like "the ten toughest questions asked by google" or by microsoft, etc.
Then I wondered how many of them I could answer on my own without help. The truth is that while I can answer most of these questions by myself, I still needed help for almost as much as half of them.
Anyway, I have collected my answers to a hundred of these questions below.
For the questions for which I needed some help to build an answer, I clearly indicate the source where I found it.
by Jerome Kehrli
Posted on Tuesday Mar 27, 2012 at 01:10AM in Computer Science
I want to share an interesting project that has appeared on the Web recently : the AirXCell project.
(As some of you already know, I am somewhat involved in this project :-)
AirXCell is an online R application framework currently supporting a programmable spreadsheet and an R development environment.
AirXCell is based on R - The GNU R Project for Statistical Computing. Current version is still somewhat limited yet fully functional.
Quoting the AirXCell User documentation :
AirXCell intents to revolution the world of spreadsheet applications and computational software by providing a product that:
- merges the world of spreadsheet application (e.g. Microsoft Excel, GNUmeric, etc.) and the world of computational software (e.g. Mathematica, Mathlab, etc.) and
- revolutions the usual approach in spreadsheet applications.