I’m back from vacation … just to discover again the cruel reality:
"Once we enter the Expert Mind trap we think we know everything and are no longer open to learning new things."
The NetBeans IDE 6.5 Beta release
"think MDA should be enriched to what I define as MDE. In that case MDE and DSL's are complementary and necessary for a model-driven approach."
Oracle is delivering the Eclipse tools for Oracle WebLogic Server 10gR3 with a simpler packaging and free.
"The power of APM tools comes from showing individual aspects of applications and correlating these with business measurements."
"Everything breaks, regularly. With that as a baseline, now you can design your system to compensate for failures. Through this approach you can achieve new levels of availability."
"Do not believe in any single process or methodology because each works only in a particular and limited context."
VisualVM is a free opensource visual tool integrating several commandline JDK tools and lightweight performance and memory profiling capabilities.
XMPP PubSub slides
"This is a dictionary of algorithms, algorithmic techniques, data structures, archetypal problems, and related definitions."
Tim Bray on Rest
"Ms. Barnett positions the Jazz initiative within the software development landscape; shares best practices learned from early technology adopters, and identifies key benefits for practitioners, project managers and CIOs."
A recent post from Markus Voelter, replying to some articles by Martin Fowler, has resulted in an excellent discussion about repositories for MDSD and textual vs graphical DSLs. I was thinking about joining the discussion in the original post, but as my answer is a bit long to be added as a comment, I’ll reply to him here.
In his post, Markus raised a question that recurrently appears in my thoughts: where is it better to store models? As files in a VCS (like CVS/SVN) or as structured data in databases?
Unlike Markus, my first answer is usually: without any doubt, in databases. I suppose that this answer has much to do with my professional background, since in the past I have worked extensively with CASE tools. Yes, I believe I’m one of those people that M. Fowler describes in his post about MDSD:
The MDSD vision evolved from the development of graphical design notations and CASE tools. Proponents of these techniques saw graphical design notations as a way to raise the abstraction level above programming languages – thus improving development productivity. While these techniques and tools never caught on too far, the basic core ideas still live on and there is an ongoing community of people still developing them.
But it is also because I’m used to working with very large repositories, where there are lots of relationships between models. If you work with lots of components but few relationships, you usually do not have problems working with local files (as Eclipse does). VCSs deal quite well with large numbers of files. But what happens when you have, for example, your entire transactional system modeled (> 30,000 components), with a very high degree of reuse between models (lots of links), and each model can belong to a different owner (only the owner can modify it)? As Markus points out in the comments:
Also, text files tend to be a bit hard to scale. Often the minimum you need is some kind of “cross-indexer” via a database so you can efficiently cross-ref, search, etc. In a “real repository” that’s easier.
Consider the Xtext case. What do you do once you have hundreds of Xtext resources? Each linking into each other. How do you efficiently load, unload, search, find-refs, etc? You need some kind of (in memory or persistent) index.
Exactly. When you work with huge amounts of data and links, you need to provide some kind of impact analysis or cross-reference functionality, you must be able to run complex queries, you must version not only the components but also the relationships, and you must be able to link to other models without the need to download them locally. Yes, I know that there are some solutions out there that provide some of those facilities for files too, such as text search engine libraries (Apache Lucene) and query languages (for example, XQuery for XML or SPARQL for RDF). But IMHO all of these solutions, although they work very well with few components, are not as powerful as what you can get *for free* using relational databases.
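To make the "for free" claim a bit more concrete, here is a minimal sketch of an impact analysis query. It is only an illustration, not the schema of any real repository: the `component` and `link` tables, the sample data and the `impacted_by` helper are all hypothetical, and I use SQLite (through Python's standard `sqlite3` module) as a stand-in for a full RDBMS.

```python
import sqlite3


def build_repository():
    """Create a tiny in-memory repository (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE component (id TEXT PRIMARY KEY, owner TEXT NOT NULL);
        -- One row per relationship: 'source' uses (depends on) 'target'.
        CREATE TABLE link (
            source TEXT NOT NULL REFERENCES component(id),
            target TEXT NOT NULL REFERENCES component(id),
            PRIMARY KEY (source, target)
        );
    """)
    conn.executemany("INSERT INTO component VALUES (?, ?)", [
        ("Customer", "alice"), ("Account", "bob"),
        ("Transfer", "carol"), ("Report", "dave"), ("Audit", "erin"),
    ])
    conn.executemany("INSERT INTO link VALUES (?, ?)", [
        ("Account", "Customer"),
        ("Transfer", "Account"),
        ("Report", "Transfer"),
    ])
    conn.commit()
    return conn


def impacted_by(conn, component_id):
    """Return every component that directly or indirectly uses component_id."""
    rows = conn.execute("""
        WITH RECURSIVE impacted(id) AS (
            SELECT source FROM link WHERE target = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT link.source
            FROM link JOIN impacted ON link.target = impacted.id
        )
        SELECT id FROM impacted ORDER BY id
    """, (component_id,)).fetchall()
    return [row[0] for row in rows]
```

With this sample data, `impacted_by(build_repository(), "Customer")` walks the links backwards and finds Account, Report and Transfer. Nothing has to be downloaded or parsed locally; the relationships are just rows.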
But I must also say that my opinion has changed somewhat over the years, due to my experience working with repositories. The database approach also has some problems. One of them is that in order to work with the tool you must always be connected to the database, and, although this sounds silly, it may limit the productivity of some developers. With a VCS, you only need to be connected while you check out the component; after that, you can work with it locally. Another pain point is that in relational databases you must create a fixed schema (even if you use a metamodel, you still have to create it), and that can become a mess when you need to modify the data structure, since RDBMSs don’t provide schema versioning facilities. Fortunately, some new approaches have appeared on the market in recent years, such as schema-free databases, that will help with this task. Another side effect is that if you want to preserve the integrity of the models and the relationships, you have to deal with locking mechanisms, so the scenario becomes worse and the system usually tends to be over-engineered. And finally, there are also some functionalities not provided by databases, such as versioning, accountability (who, when and what) and in some cases traceability (why), so you must develop them yourself (yay! we love to reinvent the wheel!). Almost all VCSs provide these facilities.
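Just to show what "develop them yourself" means in practice, here is a rough sketch of the kind of hand-made versioning you end up writing on top of a database. The `model_version` table and the `save` and `history` helpers are hypothetical names made up for this example (again using SQLite via Python's `sqlite3`); a real repository would need much more, for instance locking and versioned relationships.

```python
import sqlite3
from datetime import datetime, timezone


def open_store():
    """Create an in-memory store with an append-only version table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE model_version (
            model_id   TEXT    NOT NULL,
            version    INTEGER NOT NULL,
            content    TEXT    NOT NULL,  -- what
            author     TEXT    NOT NULL,  -- who
            changed_at TEXT    NOT NULL,  -- when
            reason     TEXT    NOT NULL,  -- why
            PRIMARY KEY (model_id, version)
        )
    """)
    return conn


def save(conn, model_id, content, author, reason):
    """Never update in place: append a new version row and return its number."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM model_version WHERE model_id = ?",
        (model_id,),
    ).fetchone()
    version = latest + 1
    conn.execute(
        "INSERT INTO model_version VALUES (?, ?, ?, ?, ?, ?)",
        (model_id, version, content, author,
         datetime.now(timezone.utc).isoformat(), reason),
    )
    conn.commit()
    return version


def history(conn, model_id):
    """Return (version, author, reason) tuples, oldest first."""
    return conn.execute(
        "SELECT version, author, reason FROM model_version "
        "WHERE model_id = ? ORDER BY version",
        (model_id,),
    ).fetchall()
```

This is roughly what a VCS commit log gives you out of the box, which is exactly the point: with a database you have to write and maintain it.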
Let’s go back again to the original post. Markus talks about some conditions where repositories could fit well in this scenario:
My point is that a repository is not per se a bad thing, provided the following criteria: (1) you store all your relevant stuff in it (2) it provides versioning facilities (3) supports diff/merge on a meaningful abstraction level.
Ummm, I agree with almost everything, but I’ve some concerns:
- Not sure. If he is talking about storing all of the data that belongs to a model in the repository, then I agree. But if he is talking about storing the model and the code together, then I disagree. There are some scenarios where this is not convenient. For example, when you want to be platform independent (and I’m not talking about all the MDA stuff). The various parsers/generators/interpreters could run and store the code/binaries on several platforms, and not always on the same platform where you store the model.
- I agree, versioning is an essential facility.
- Diff/Merge works well with textual DSLs, which have a concrete syntax. But, although this is a great feature, is it mandatory? I have worked a long time without this feature and I assure you that you can survive without it. And what happens with graphical DSLs?
Before concluding, I would also like to comment on one of the latest projects where we applied MDSD. In this project, we decided to use an Oracle XML DB to store our models in XML (something like what Eurocontrol-CFMU has done for its UML models), but we also added some metadata. By storing the XML directly in the database, we avoid the need to decompose the XML into a relational schema, and we allow developers to download the XML and work locally without being connected to the database. We can also use all the SQL query facilities, and for those situations where performance could be a problem, we use the metadata to store some relevant data and relationships. Oh, and this RDBMS also provides us with versioning facilities. At this moment, we don’t have enough data in the repository to tell you whether this approach will be a success or not. Let’s see!
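The idea of keeping the XML whole and adding metadata next to it can be sketched as follows. Please take it as an illustration of the principle only: it does not use Oracle XML DB or its API (SQLite and Python's standard library are stand-ins), and the XML format (`<model name="...">` with `<ref target="..."/>` children), the tables and the function names are all assumptions invented for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_repo():
    """Create an in-memory repository: whole XML documents plus metadata."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE model (name TEXT PRIMARY KEY, xml TEXT NOT NULL);
        -- Metadata extracted at store time, so queries never parse XML.
        CREATE TABLE model_ref (source TEXT NOT NULL, target TEXT NOT NULL);
        CREATE INDEX model_ref_target ON model_ref(target);
    """)
    return conn


def store_model(conn, xml_text):
    """Store the XML as-is and refresh the relationship metadata for it."""
    root = ET.fromstring(xml_text)
    name = root.get("name")
    conn.execute("INSERT OR REPLACE INTO model VALUES (?, ?)", (name, xml_text))
    conn.execute("DELETE FROM model_ref WHERE source = ?", (name,))
    conn.executemany(
        "INSERT INTO model_ref VALUES (?, ?)",
        [(name, ref.get("target")) for ref in root.iter("ref")],
    )
    conn.commit()
    return name


def referenced_by(conn, target):
    """Answer 'who links to this model?' from the metadata alone."""
    rows = conn.execute(
        "SELECT source FROM model_ref WHERE target = ? ORDER BY source",
        (target,),
    ).fetchall()
    return [row[0] for row in rows]


def checkout(conn, name):
    """Return the original XML so a developer can work on it offline."""
    row = conn.execute("SELECT xml FROM model WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None
```

The design choice is the trade-off described above: the document stays intact for local editing, while the frequently queried relationships live in plain indexed columns.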
To sum up (or not!). I believe you should never reject the database approach (nor the VCS option). I cannot give you a “Golden Rule”, but my advice is that if you are not going to have lots of relationships between models, then use the VCS approach. Otherwise, first analyze what a VCS approach could offer you, and if it doesn’t fit your requirements well, then use the database approach. But please, be careful not to make the metamodel too complicated, or you could have lots of performance problems.
As the post is quite long, I will leave for another post my thoughts about textual vs graphical DSLs. In the meantime, what is your opinion? I would love to hear stories from other folks on what people are doing in their companies.
Comment by Steven Kelly on 2008-08-07 16:02:52 +0000
Great to see people writing on this topic based on extensive experience of actual real world projects and tools! I’ve added my own 2c worth on the topic of RepositoryBasedCode in my blog – like you, I have over 15 years experience, so like you, it’s too long for a comment :-)!
Comment by Ferdy on 2008-08-08 01:20:33 +0000
Steven, glad to see you here.
I’ve read your post and I agree with you. I believe that working with a repository usually is easier than a VCS when multiple users are involved, but it also depends on the number of relationships between models. On the other hand, I believe that locking mechanisms can become very dangerous, so I definitely agree with your last sentence: “the best kind of conflict resolution is to avoid conflict in the first place!” 🙂
"They argue that with models and code generation everything gets automated and forget that especially modeling in a graphical tool takes more time than writing the same information into a text document."
Fundamentals of Software Architecture MindMap
TRIBALIZATION OF BUSINESS STUDY: How to Achieve Transformational Change through Communities and Social Networks
MDSD best practices
Curt Cotner explains why he believes Data Studio is not like AD/Cycle
The Pragmatic Programmers screencasts
In the last week of June (as usual), the Eclipse Foundation delivered the new release of Eclipse, called Ganymede. This year the updated version is a coordinated release of 23 different projects and represents 18 MLOC. There are lots of articles and posts out there explaining the new features, so I’m not going to bore you with the rehashed details. I would just like to mention two interesting features.
The first one is a really cool feature introduced in the Eclipse Communication Framework project that enables distributed teams to reap the benefits of pair programming. Based on a Google Summer of Code proposal, Mustafa Isik developed Real-Time Shared Editing, dubbed Cola (collaborate), a mechanism that allows two developers to work collaboratively in real time to edit source code and/or documents. He has put together a short screencast showing the usage of this technology. Check it out! Digging further into this amazing feature, Mustafa pointed me to a Google Tech Talk he gave at EclipseDay at the Googleplex, where he explained how this plugin resolves any change conflicts in real time. The video is worth a visit. And if you want to add this feature to other editors (by default it has been added to the JDT Java Source Code editor and Eclipse’s Default Text Editor), Scott Lewis has written some easy instructions … simply by adding a little bit of markup to plugin.xml.
The second one is the Usage Data Collector, a piece of technology that generates statistics on how the various components of the Eclipse workbench (loaded bundles, commands and actions, perspective changes, view usage, …) are being used by developers. The Eclipse Foundation’s intent is to use this data to help committers and organizations better understand how developers are using Eclipse, in order to improve the overall user experience. Privacy should not be a problem, as this feature is opt-in (there is an option on the “Usage Data Collector” preferences page labeled “Enable Capture”) and it is completely anonymous. Although the data collected so far is not yet representative, you can already see some statistics (I see lots of Cut-and-Paste Programming). I hope that these statistics will be public and that the Eclipse Foundation will publish reports regularly (I have not seen any notice about this). But besides the benefits that these statistics may have for the Eclipse Foundation, I believe they can also be attractive to organizations that have developed internal plugins. And I say this from my own experience. One of the problems we had in the past was how to measure the use of the different plugins we developed, and also what the response time was (we had several complaints about the client performance). We finally had to create an infrastructure to collect and analyze this data. So I look with interest at the possibility of extending the official UDC API (both listeners and monitors). Let’s see how it evolves in the future.