Repositories for Models: VCS or Databases?

A recent post from Markus Voelter replying to some articles by Martin Fowler has resulted in an excellent discussion about repositories for MDSD and textual vs graphical DSLs. I was thinking about joining the discussion in the original post, but as my answer is a bit long to be added as a comment, I’ll reply to him here.

In his post, Markus raised a question that recurrently appears in my thoughts: where is it better to store models, as files in a VCS (like CVS/SVN) or as structured data in a database?

Unlike Markus, my first answer is usually: without any doubt, in databases. I suppose that this answer has much to do with my professional background, since in the past I have worked extensively with CASE tools. Yes, I believe I’m one of those people that Martin Fowler describes in his post about MDSD:

The MDSD vision evolved from the development of graphical design notations and CASE tools. Proponents of these techniques saw graphical design notations as a way to raise the abstraction level above programming languages - thus improving development productivity. While these techniques and tools never caught on too far, the basic core ideas still live on and there is an ongoing community of people still developing them.

But it is also because I’m used to working with very large repositories, where there are lots of relationships between models. If you work with lots of components but few relationships, you usually do not have problems working with local files (as Eclipse does). VCSs are able to deal quite well with large numbers of files. But what happens when you have, for example, your entire transactional system modeled (> 30,000 components), with a very high degree of reuse between models (lots of links), and each model can belong to a different owner (only the owner can modify it)? As Markus points out in the comments:

Also, text files tend to be a bit hard to scale. Often the minimum you need is some kind of “cross-indexer” via a database so you can efficiently cross-ref, search, etc. In a “real repository” that’s easier.

Consider the Xtext case. What do you do once you have hundreds of Xtext resources? Each linking into each other. How do you efficiently load, unload, search, find-refs, etc? You need some kind of (in memory or persistent) index.

Exactly. When you work with huge amounts of data and links, you need to provide some kind of impact analysis or cross-reference functionality, you must be able to run complex queries, you must version not only the components but also the relationships, and you must be able to link to other models without the need to download them locally. Yes, I know that there are some solutions out there that provide some of those facilities for files too, such as text search engine libraries (Apache Lucene) and query languages (for example, XQuery for XML or SPARQL for RDF). But IMHO all of these solutions, although they work very well with few components, are not as powerful as what you can get *for free* using relational databases.
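To make the impact analysis point a bit more concrete, here is a minimal sketch in Python using SQLite as a stand-in for whatever relational database you use. The schema (`component`, `link`), the sample data and the `impacted_by` helper are all hypothetical, invented for illustration; they are not taken from any real repository product.

```python
import sqlite3

# Hypothetical schema: one row per model component, one row per link.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT UNIQUE, owner TEXT);
    CREATE TABLE link (
        source_id INTEGER REFERENCES component(id),  -- the component that uses...
        target_id INTEGER REFERENCES component(id)   -- ...this component
    );
""")
conn.executemany("INSERT INTO component VALUES (?, ?, ?)", [
    (1, "Customer", "team-a"),
    (2, "Account", "team-b"),
    (3, "Transfer", "team-b"),
    (4, "Statement", "team-c"),
])
conn.executemany("INSERT INTO link VALUES (?, ?)", [
    (2, 1),  # Account uses Customer
    (3, 2),  # Transfer uses Account
    (4, 2),  # Statement uses Account
])


def impacted_by(conn, name):
    """Return the names of all components that directly or transitively use `name`."""
    rows = conn.execute("""
        WITH RECURSIVE impacted(id) AS (
            SELECT source_id FROM link
            WHERE target_id = (SELECT id FROM component WHERE name = ?)
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT l.source_id FROM link l JOIN impacted i ON l.target_id = i.id
        )
        SELECT c.name FROM component c JOIN impacted i ON c.id = i.id
        ORDER BY c.name
    """, (name,)).fetchall()
    return [row[0] for row in rows]


print(impacted_by(conn, "Customer"))
```

With this sample data, changing `Customer` affects `Account` directly and `Transfer` and `Statement` through it. The query never needs the models themselves to be downloaded; it only walks the link table.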

But I must also say that my opinion has changed somewhat over the years, due to my experience working with repositories. The approach of working with databases also has some problems. One of them is that in order to work with the tool you must always be connected to the database, and, although this situation sounds silly, this may limit the productivity of some developers. With a VCS, you only need to be connected while you check out the component, but after that, you can work with it locally. Another pain point is that in relational databases you must create a fixed schema (no matter if you use a metamodel, you must always create it), and that can be a mess when you need to modify the data structure, since RDBMSs don’t provide schema versioning facilities. Fortunately, some new approaches have appeared on the market in recent years, such as schema-free databases, that will help with this task. Another side effect is that if you want to preserve the integrity of the models and the relationships, you have to deal with locking mechanisms, so the scenario becomes worse, and usually the system tends to be over-engineered. And finally, there are also some features that databases do not provide, such as versioning, accountability (who, when and what) and in some cases traceability (why), so you must develop them yourself (yay! we love to reinvent the wheel!). Almost all VCSs provide these facilities.
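As a rough idea of what "developing it yourself" means for versioning and accountability, here is a minimal Python sketch on top of SQLite. The `model_version` table and the `save` and `history` helpers are hypothetical names made up for this example; a real repository would also need locking, branching and much more.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical append-only table: every save creates a new row, nothing is updated.
conn.execute("""
    CREATE TABLE model_version (
        name       TEXT,
        version    INTEGER,
        content    TEXT,
        author     TEXT,     -- who
        changed_at TEXT,     -- when
        reason     TEXT,     -- why
        PRIMARY KEY (name, version)
    )
""")


def save(conn, name, content, author, reason):
    """Store a new version of a model and return its version number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM model_version WHERE name = ?",
        (name,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO model_version VALUES (?, ?, ?, ?, ?, ?)",
        (name, version, content, author,
         datetime.now(timezone.utc).isoformat(), reason),
    )
    return version


def history(conn, name):
    """Return (version, author, reason) for every stored version, oldest first."""
    return conn.execute(
        "SELECT version, author, reason FROM model_version "
        "WHERE name = ? ORDER BY version",
        (name,),
    ).fetchall()
```

Even this toy version shows the cost: the who, when, what and why that a VCS commit log gives you out of the box become columns and code that you have to design and maintain.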

Let’s go back again to the original post. Markus talks about some conditions where repositories could fit well in this scenario:

My point is that a repository is not per se a bad thing, provided the following criteria: (1) you store all your relevant stuff in it (2) it provides versioning facilities (3) supports diff/merge on a meaningful abstraction level.

Ummm, I agree with almost everything, but I’ve some concerns:

  1. Not sure. If he means storing all of the data that belongs to a model in the repository, then I agree. But if he means storing the model and the code together, then I disagree. There are some scenarios where this is not convenient. For example, when you want to be platform independent (and I’m not talking about all the MDA stuff). The various parsers, generators and interpreters could run and store the code or binaries on several platforms, and not always on the same platform where you store the model.
  2. I agree, versioning is an essential facility.
  3. Diff/Merge works well with textual DSLs, which have a concrete syntax. But, although this is a great feature, is it mandatory? I have worked a long time without this feature and I assure you that you can survive without it. And what happens with graphical DSLs?

Before concluding, I would also like to comment on one of the latest projects where we applied MDSD. In this project, we decided to use Oracle XML DB to store our models in XML (something similar to what Eurocontrol-CFMU has done for their UML models), but we also added some metadata. By storing the XML directly in the database, we avoid the need to decompose the XML into a relational schema, and we allow developers to download the XML and work locally without the need to be connected to the database. We can also use all the SQL query facilities, and for those situations where performance could be a problem, we use the metadata to store some relevant data and relationships. Oh, and this RDBMS also provides us with versioning facilities. At this moment, we don’t have enough data in the repository to tell you if this approach will be a success or not. Let’s see!
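The general idea of "XML stored as-is, plus a bit of metadata for the expensive queries" can be sketched as follows. This is not Oracle XML DB code: it uses Python with SQLite as a stand-in, and the XML layout (`<model name="...">` with `<ref target="..."/>` children), the tables and the helper functions are all assumptions invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model (name TEXT PRIMARY KEY, xml TEXT NOT NULL);  -- XML kept whole
    CREATE TABLE model_ref (source TEXT, target TEXT);              -- metadata: links
""")


def store_model(conn, xml_text):
    """Store the XML unchanged and refresh the metadata rows for its references."""
    root = ET.fromstring(xml_text)
    name = root.get("name")
    conn.execute("INSERT OR REPLACE INTO model VALUES (?, ?)", (name, xml_text))
    conn.execute("DELETE FROM model_ref WHERE source = ?", (name,))
    conn.executemany(
        "INSERT INTO model_ref VALUES (?, ?)",
        [(name, ref.get("target")) for ref in root.iter("ref")],
    )
    return name


def referrers(conn, target):
    """Answer 'who links to this model?' from metadata only, without parsing XML."""
    rows = conn.execute(
        "SELECT source FROM model_ref WHERE target = ? ORDER BY source", (target,)
    ).fetchall()
    return [row[0] for row in rows]


def checkout(conn, name):
    """Return the stored XML so it can be edited locally, offline."""
    row = conn.execute("SELECT xml FROM model WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


store_model(conn, '<model name="Account"><ref target="Customer"/></model>')
store_model(conn, '<model name="Transfer"><ref target="Account"/><ref target="Customer"/></model>')
print(referrers(conn, "Customer"))
```

The trade-off is that the metadata is redundant with the XML, so it has to be refreshed on every save, which is what `store_model` does before returning.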

To sum up (or not!). I believe you should never reject the database approach (nor the VCS option). I cannot give you a “Golden Rule”, but my advice is that if you are not going to have lots of relationships between models, then use the VCS approach. Otherwise, first analyze what a VCS approach could offer you, and if it doesn’t fit your requirements well, then use the database approach. But please, be careful and don’t make the metamodel too complicated, or you could run into lots of performance problems.

As the post is quite long, I will leave my thoughts about textual vs graphical DSLs for another post. In the meantime, what is your opinion? I would love to hear stories from other folks on what people are doing in their companies.

Eclipse Ganymede hidden treasures

The last week of June (as usual), the Eclipse Foundation delivered the new release of Eclipse, called Ganymede. This year the updated version is a coordinated release of 23 different projects and represents 18 MLOC. There are lots of articles and posts out there explaining the new features, so I’m not going to bore you with the rehashed details. I would just like to mention two interesting features.

The first one is a really cool feature introduced in the Eclipse Communication Framework project that enables distributed teams to reap the benefits of pair programming. Based on a Google Summer of Code proposal, Mustafa Isik developed Real-Time Shared Editing, dubbed Cola (collaborate), a mechanism that allows two developers to work collaboratively in real time to edit source code and/or documents. He has put together a short screencast showing the usage of this technology. Check it out! When I dug further into this amazing feature, Mustafa pointed me to a Google Tech Talk he gave at EclipseDay at the Googleplex, where he explained how this plugin resolves any change conflict in real time. The video is worth a visit. And if you want to add this feature to other editors (by default it has been added to the JDT Java Source Code editor and Eclipse’s Default Text Editor), Scott Lewis has written some easy instructions … simply by adding a little bit of markup to plugin.xml.

The second one is the Usage Data Collector, a piece of technology that will generate statistics on how the various components of the Eclipse workbench (loaded bundles, commands and actions, perspective changes, view usage, …) are being used by developers. The Eclipse Foundation’s intent is to use this data to help committers and organizations better understand how developers are using Eclipse, in order to improve the overall user experience. Privacy should not be a problem, as this feature is opt-in (there is an option on the “Usage Data Collector” preferences page labeled “Enable Capture”) and it is completely anonymous. Although the data collected is not quite representative, you can see some statistics right now (I see lots of Cut-and-Paste Programming). I hope that these statistics will be public and that the Eclipse Foundation will publish some reports regularly (I have not seen any notice about this). But besides the benefits that these statistics may have for the Eclipse Foundation, I believe they can also be attractive to organizations which have developed internal plugins. And I say this from my own experience. One of the problems we had in the past was how to measure the use of the different plugins we developed, and also what the response time was (we had several complaints about the client performance). We finally had to create an infrastructure in order to collect and analyze these data. So, I see with interest the possibility of extending the official UDC API (both listeners and monitors). Let’s see how it evolves in the future.

RSDC 2008 Summary

Almost a month after RSDC 2008, there is not much more to add to what has already appeared in the press. Mike MacDonagh, from Ivar Jacobson International, has also written a great post covering all the 9+11 product announcements and some specific posts covering RQM and RRC in detail. And Coté, from Redmonk, has published some nice video interviews. So … nothing more to add … except some silly thoughts, some photos of the playful part of the event and some curious facts I discovered, just for the record.

Thoughts:

Good news: The Jazz Platform is here! Bad news: The Jazz Platform is here!

Jazz is the next generation platform for Rational products, whose goal is to be for collaboration tools what Eclipse is for the desktop. With this platform, there will be better integration between tools and, most importantly, better integration in the application lifecycle. For those of us who started our careers using or developing CASE tools, this concept is not new (BTW, I was introduced to this world using Softlab Maestro II). But after the failure of traditional CASE tools, only a few vendors continued to develop integrated solutions for the entire application lifecycle. Now it seems that Rational is changing its strategy and is going to embrace ALM 2.0, and this is good news for Rational customers.

But I also think that hard times are coming for existing Rational customers. The Jazz Platform will change most of the Rational portfolio, so expect a new wave of product releases based on Jazz in the next few years (I heard something about a 10-year plan, but I cannot confirm this). During the conference, I talked with several customers and I noticed a lot of excitement among them but also some worries, mainly because there are still unanswered questions about the future of some products (same issue for Telelogic customers). For some large IT shops with a huge investment in Rational products the transition could be traumatic, especially if Rational doesn’t plan the roadmap very well.

About open-sourcing Jazz, it is now clear to me that the Jazz core infrastructure is not going to be released as an open source project, at least in the short to medium term. There have been some inaccuracies in the press, partly fueled by some statements at RSDC 2006. At this point, I want to thank Dave Thomson, from whom I learned a lot and who kindly discussed OCD and open source with me (without any complaint) on 3 different occasions. I think I earned the “terrible pain” title.

Instead, they announced the Open Services for Lifecycle Collaboration initiative, an integration architecture for tools and software development processes. Quoting directly from the Open Services FAQ: “Our goal is to enable teams to use disparate tools and share lifecycle resources in delivering software, whether the tools are from IBM, other vendors, open source projects, or in-house development. We aim to do so in a way that is open and non-proprietary and that will encourage all industry members to participate”. Ummm, I’m sure I’ve heard this statement in the past, and it is not déjà vu. Luckily, Martin Nally, CTO of Rational, clarified this statement, telling us that this is not an AD/Cycle resurgence. But I’m not really sure: for me it’s the same concept as AD/Cycle but off the mainframe, without a central repository / data model and using some modern protocols aimed at breaking the usual vendor silos. Let’s see if this time the effort will reach the necessary consensus. And talking about Open Services, I also want to thank two really brilliant guys, Pat Mueller and Simon Johnston. They explained JRS, the embryo of Open Services, to me in detail.

Regarding the Jazz development process, Bill Higgins was kind enough to introduce me to most of the Jazz team leads, some of whom I follow on Twitter. What I discovered talking with them is that there is lots of innovation and experimentation in the Jazz development process. BTW, Bill and Erich Gamma also stoically endured my not so innovative presentation about my company and our plans for application development tools.

And finally, I also want to mention Mainsoft, one of the Rational business partners. Philippe and Jenna spent some time with me introducing a great Microsoft SPS Jazz integration. Thanks for your time! As I told Danielle, it was really interesting.

Special events:

On Monday, there was the Telelogic Welcome Celebration event, featuring The Wallflowers. To be honest, it was the first time I had heard of this band. They sound great, but it’s definitely not my preferred music style. Anyway, I listened to the concert next to Kelly, so it was a very enjoyable evening.

The Wallflowers, originally uploaded by Ferran Rodenas.

On Wednesday, there was the Universal Studios special event. I found the Fear Factor Live show really disgusting, but the Revenge of the Mummy was really fun. And quoting Mitch Fatel, here is how Telelogic was acquired by the IBM/Rational shark:

Universal Studios - Jaws, originally uploaded by Ferran Rodenas.

And finally, some curious facts that I discovered during the conference:

  • Orlando is really hot, but inside the hotels you could freeze to death; they are really, really cold. This is something I’ve observed at several US conferences.
  • Smoker’s corner and Dolphin Bar are great places to meet new friends. Bad habits that help you socialize.
  • Despite the general belief on this side of the pond, US people love Europe. I talked to several people who know Spain very well, and someone asked me about bike touring routes in the south of Spain.
  • Boston and Canadian accents are easier to understand than the North Carolina accent. Sorry guys, but I had to pay close attention in order to understand you!
  • When you spend all day listening to and speaking a language in which you are not very fluent, you usually finish the day with a big headache. Anyway, this time I felt more fluent than on previous occasions, and I believe Twitter has something to do with it.
  • The Dolphin fountain is not only there for decoration. You can swim in it. I saw more than 10 people inside!

dolphin fountain, originally uploaded by kellypuffs.

Definitely, it was a great experience! Hope to be there next year.

Visual Studio will feature UML support

As I wrote in a previous post, one of the main problems I saw in Microsoft Visual Studio DSL Tools was the lack of support for UML. I’m not a big fan of UML, but I must admit that a common modeling language could be helpful in some scenarios.

Now, it seems that Microsoft has changed its view. Bill Gates announced at Tech·Ed 2008 for Developers that UML will be part of Visual Studio 10. Great news. But this announcement does not mean that Microsoft is moving away from DSLs. As Cameron Skinner wrote in a post, Microsoft will be using a hybrid model, a combination of both approaches: UML at the “logical” layer and DSLs at the “physical” layer. Not as powerful as openArchitectureWare, but a great step forward.