May 07, 2013
Murray Cumming - May 07, 2013 - 13:49

I didn’t get around to blogging about gtkmm 3.8 when I released it last month, partly because we had to fix a crasher bug and do a gtkmm .1 release before people noticed.

Anyway, just a little while after GNOME 3.8 was released, we managed to release gtkmm 3.8 and glibmm 2.36. There is quite a bit of work in these, almost all by José Alburquerque and Kjell Ahlstedt, as well as the usual few days of last-minute work by me to wrap remaining new API.

I spend very little time on glibmm and gtkmm these days, and don’t have much motivation to change that.

There’s also a change that we expect in glib 2.38 that will break many installed applications that use gtkmm. We successfully begged the glib developers to add a special case for us in glib 2.36, but we have not found any way to avoid this for 2.38. So far our best option seems to be to do a new parallel-installed gtkmm (and glibmm) ABI, leaving the old (broken with glib 2.38) one behind, at least allowing applications to be changed slightly and then rebuilt.

Personally, I have no great incentive to go through that pain.

Jens Georg - May 07, 2013 - 09:55

While we were busy fixing the server and rendering side of DLNA with Rygel, the guys at Intel OTC have been fixing the client side of DLNA with something called dLeyna. It is a nice set of APIs to access and manipulate UPnP-AV and DLNA servers and renderers (such as Rygel, of course), so you can easily add DLNA support to your applications. That includes the obvious server browsing and renderer remote control, but also less obvious things like media pushing, synchronization and server-side playlists. They have already prepared a cool set of demos (for example a Firefox extension to send images from your browser to your TV).

So why is this better than using GUPnP for this? Let me show you some examples.

Controlling a renderer

Not much code to see here: you get the usual suspects of player control functions such as start, stop, etc., as well as methods to query the device’s capabilities, as there are a lot of optional things on UPnP devices.

Uploading

Well, say you want to upload a file to a server. The code for doing that with GUPnP is available in gupnp-tools, and it’s not exactly pretty. With dLeyna, on the other hand, it takes just a few lines:

#!/usr/bin/env python
import sys
import mediaconsole as mc
 
u = mc.UPNP()
d = u.server_from_udn(sys.argv[1])
d.upload_to_any(sys.argv[2], sys.argv[3])

In DLNA land, this is called “+UP+”.

Playing a file

Or do you want to show a media file from your device or app on a DLNA-capable TV? Korva shows how you can do that with plain GUPnP, again with lots of lines of code. dLeyna provides a nice and clean solution:

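#!/usr/bin/env python
import sys
import rendererconsole as rc

# argv[1]: UDN of the renderer, argv[2]: path of the local file to show
m = rc.Manager()
d = m.renderer_from_udn(sys.argv[1])
# Host the file over HTTP and get a URI that the renderer can fetch
uri = d.host_file(sys.argv[2])
d.stop()
d.open_uri(uri)
d.play()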

And this is called “+PU+” in DLNA land.

Behind the scenes, this is all GUPnP of course. Currently it consists of two DBus services, dleyna-renderer-service and dleyna-server-service, although other IPC mechanisms are on their way. These two services scan the network for available devices and make them available through a set of DBus interfaces. That relieves you from the need to search for devices yourself (and, by providing a device cache, relieves the network from UDP packet bursts), to introspect the devices for supported capabilities and methods, and so on.

If you execute the push script from above you get a Python wrapper for the com.intel.dLeynaRenderer.Manager DBus interface, which then looks locally for the DBus path matching the given UPnP UDN and returns a Python object implementing the com.intel.dLeynaRenderer.PushHost and com.intel.dLeynaRenderer.RendererDevice interfaces.

Then we temporarily host the file given on the command line on dLeyna’s internal HTTP server, stop the currently running playback (which translates to an AVTransport:Stop SOAP call), send the URI to the renderer (AVTransport:SetAVTransportURI) and, last but not least, start the playback (AVTransport:Play), which in the end starts the HTTP streaming from dLeyna’s internal HTTP server to (Rygel’s) renderer.
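The order of calls in that push-and-play sequence can be sketched in plain Python. This is only an illustration under stated assumptions: `RecordingRenderer` is a hypothetical stand-in that logs calls instead of talking to DBus, and the URL it returns is made up. Only the method names `host_file`, `stop`, `open_uri` and `play` are taken from the script above.

```python
class RecordingRenderer:
    """Hypothetical stand-in for a dLeyna renderer object; it only logs calls."""

    def __init__(self):
        self.calls = []

    def host_file(self, path):
        self.calls.append(("host_file", path))
        # Made-up URL: the real service decides where the file is hosted.
        return "http://192.0.2.1:8080/hosted/" + path.rsplit("/", 1)[-1]

    def stop(self):
        self.calls.append(("stop",))

    def open_uri(self, uri):
        self.calls.append(("open_uri", uri))

    def play(self):
        self.calls.append(("play",))


def push_and_play(renderer, path):
    """Host a local file, then stop, load and start playback, in that order."""
    uri = renderer.host_file(path)
    renderer.stop()
    renderer.open_uri(uri)
    renderer.play()
    return uri


if __name__ == "__main__":
    renderer = RecordingRenderer()
    push_and_play(renderer, "/tmp/holiday.jpg")
    for call in renderer.calls:
        print(call)
```

Because `push_and_play()` only relies on those four method names, the same function should work with any object that offers them, which keeps the sequence easy to test without a network.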

And it doesn’t stop at the application level, there’s even integration with HTML5 through cloudeebus and cloud-dLeyna.

As a sidenote: You might ask how that relates to Grilo’s UPnP-AV support or Korva. This is a very valid question. Grilo and Korva are doing very specific tasks while dLeyna aims to be a more complete SDK. It should be quite easy, for example, to port Grilo’s UPnP-AV support to dLeyna.

April 26, 2013
Tristan Van Berkom - April 26, 2013 - 07:20

I’ve been meaning to write a short post showing what we’ve been able to do with Glade since we introduced composite widget templates in GTK+. The post will be as brief as possible since I’m preoccupied with other things, but here’s a rundown of what’s changed in the Dogg Food release.

Basically, after finally landing the composite template machinery (thanks to Openismus for giving me the time to do that), I couldn’t resist going the extra mile in Glade, over the weekends and such, to leverage the great new features and do some redesign work in Glade itself.

So please enjoy ! or don’t and yell very loudly about how you miss the old editor design, and make suggestions :)

Glade Preferences Dialog

Preferences Dialog Before

Preferences Dialog After

The old preferences dialog was a sort of lazy combo box. Now that we have composite templates and create the interface using GtkBuilder, it was pretty easy to add the treeview and create a nicer interface for adding customized catalog paths.

Also there are some new features and configurations in the dialog. Since the new Dogg Food release we have an Autosave feature, and we optionally save your old file.ui to a file.ui~ backup every time you save. There are also some settings for what kind of warnings to show when saving your glade file (since it can be annoying to have the dialog pop up every time you save if you already know there are deprecated or unrecognized widgets).
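The backup-on-save behaviour can be sketched roughly as follows. This is a minimal Python illustration, not Glade’s actual implementation (Glade is written in C); the function name `save_with_backup` and its `keep_backup` parameter are hypothetical.

```python
import os
import shutil
import tempfile


def save_with_backup(path, data, keep_backup=True):
    """Write data to path, first copying any existing file to path + '~'."""
    if keep_backup and os.path.exists(path):
        # Keep the previous version around as file.ui~
        shutil.copy2(path, path + "~")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(data)


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    ui_file = os.path.join(directory, "file.ui")
    save_with_backup(ui_file, "<interface/>")            # first save, no backup yet
    save_with_backup(ui_file, "<interface></interface>") # second save creates file.ui~
    print(sorted(os.listdir(directory)))
```

Copying before writing means the backup always holds the last version that was saved successfully, even if the new write is interrupted.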

Glade Project Properties

Project Properties Dialog Before

Project Properties Dialog After

Refactoring the project properties dialog out into a separate source file and implementing the UI with Glade makes the GladeProject code more readable. The UI benefits again too: notice that the not so clear “Execute” button has been moved to be a secondary dialog button (with a tooltip explaining what it does).

Also, new project attributes have been added to allow one to set the project’s translation domain or Composite Template toplevel widget.

Now that’s just the beginning, let’s walk through the new custom editors.

Button Editor

GtkButton Editor After

GtkButton Editor Before

Here’s where the fun starts. While we did have some custom editors before, they all had to be hand written; now I’ve added some base classes that make it much easier to design the customized property editors with Glade.

First thing to notice is that we have these check button property editors for some boolean properties, which we can place wherever we like in the customized property editor layout (check buttons previously didn’t make any sense in a layout where one always expects to see the title label on the left and the property control on the right, in a table or grid layout).

Entry Editor Before

GtkEntry Editor Before (top portion)

GtkEntry Editor Before (bottom portion)

Entry Editor After

GtkEntry Editor After (top portion)

GtkEntry Editor After (bottom portion)

An all around better layout, I think. We also save space by playing tricks with the tooltip-text / tooltip-markup properties for the icons: while in reality GTK+ has separate properties, we just add a “Use Markup” check to the tooltip editor and use that to decide whether to edit the normal tooltip text property or the tooltip markup property.

Image Editor

GtkImage Editor Before

GtkImage Editor After

Here we economize on space a bit by putting the GtkMisc alignment and padding details down at the bottom. We also group the “use-fallback” property with the icon name setting, since the fallback property can only apply to images that are set by icon name.

Label Editor

GtkLabel Editor Before

GtkLabel Editor After

Like the GtkImage Editor, we’ve grouped the GtkMisc properties together near the bottom. We also have generally better grouping of properties all around; hopefully this will help the user find what they are looking for more quickly. Another interesting thing is that the mnemonic widget editor is insensitive if “use-underline” is FALSE; when “use-underline” becomes selected, the mnemonic widget property can be set directly to the right of the “use-underline” property.

Widget Editor / Common Tab

Last but not least (of what we’ve done so far) is a completely new custom editor for the “Common” tab (perhaps we can do away with the “Common” tab altogether… use expanders where we currently have bold heading labels, now that we do it all with GtkBuilder script, sky is the limit really)

GtkWidget Editor Before (top portion)

GtkWidget Editor Before (bottom portion)

Widget Editor After

GtkWidget Editor After

Here again we play some tricks with the tooltip, so that we don’t have two separate property entries for “tooltip-text” and “tooltip-markup” but instead a simple switch deciding whether to set the tooltip as markup or not. The little “Custom” check button in there makes the tooltip editors insensitive and instead sets the “has-tooltip” property to TRUE, so that you can hook into the “query-tooltip” signal to create a custom tooltip window.
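The decision the tooltip editor makes can be sketched as a small function. This is a hedged illustration in Python rather than Glade’s C code: `tooltip_properties` is a hypothetical helper, and only the property names “tooltip-text”, “tooltip-markup” and “has-tooltip” come from GTK+.

```python
def tooltip_properties(text, use_markup=False, custom=False):
    """Return the property values a tooltip editor might write for a widget."""
    if custom:
        # Custom tooltip: no text is stored, the application is expected to
        # handle "query-tooltip" itself, so only "has-tooltip" is enabled.
        return {"tooltip-text": None, "tooltip-markup": None, "has-tooltip": True}
    if use_markup:
        # The single "Use Markup" switch selects the markup property.
        return {"tooltip-text": None, "tooltip-markup": text}
    return {"tooltip-text": text, "tooltip-markup": None}


if __name__ == "__main__":
    print(tooltip_properties("Save the file"))
    print(tooltip_properties("<b>Save</b> the file", use_markup=True))
    print(tooltip_properties("ignored", custom=True))
```

The point of the design is that the user edits one text field and two switches, while the editor maps that onto whichever of the three underlying properties applies.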

Now while these are just the first iterations of the new editor designs and layouts, the really great news is that we can now use Glade to design the layouts of Glade’s widget editors, so you can expect (and even request ;-) ) better designs in the future. Also, we’re open to ideas: if you have any great ideas on how to make widget property editing more fun, more obvious, more usable… please share them with us in bugzilla or on our mailing list.

Extra amendment: Fitting images into blog post side by side has been a delicate exercise, it looks different in the editor, different at blogs.gnome.org, and again different on planet.gnome.org, just goes to show that I make for a terrible poster boy, not to mention I don’t post quite that often… anyway… hope the formatting of this post is endurable at least, it’s best viewed at blogs.gnome.org I think.

April 09, 2013
Tristan Van Berkom - April 09, 2013 - 11:05

Hello fellow hackers, today we bring you a new feature which I believe can very much improve the GTK+/GNOME developer story.

This is a feature I’ve been planning for a long time (I originally blogged about it 3 years ago) so I’m very excited about it having finally landed in GTK+, it’s my hope and ambition that this feature will help shape the future of user interface programming with GTK+.

Before I continue, I have to thank Juan Pablo Ugarte for keeping the dream alive and talking about this at the last GUADEC. Also recognition must be given to Openismus GmbH for sponsoring my full time work on this for the last few weeks, the time to completely focus on this task would not have been afforded me without them.

Unfortunately this post will be a little terse, the one I had planned which is a bit more relaxed and has some comic relief will not be ready on time. So, at the risk of being taken seriously, let’s continue with a brief overview of the APIs introduced to GTK+ and the actions taken.

What are Composite Widget Templates ?

Composite Widget Templates are an association of GtkWidget class data with GtkBuilder xml, which is to say that the xml which defines a composite widget is now a part of the definition of a widget class or type.

This feature automates the creation of composite widgets without the need for directly accessing the GtkBuilder APIs and comes with a few features that help to bind a GtkWidget with its GtkBuilder xml.

As of yesterday, 23 composite widget classes in GTK+, from simple classes such as GtkFontButton or GtkVolumeButton to more complex widget classes such as GtkFileChooserDefault and GtkPrintUnixDialog have all been ported to remove all manual user interface creation code, in favour of GtkBuilder xml.

So, how can I use it ?

There are three or four new APIs added to GtkWidget which play on the class data, currently they will only be available in C, but if you have a little imagination, you can see how this can be very useful in higher level languages, by extending the syntax and adding some keywords (hint: I have vala in mind as a top candidate).

Before I go into the API details here, I would like to point out a complete working example which I created today while writing this post. To give it a try, you need of course GTK+ master from today (or yesterday). For those who are interested I suggest you download that small tarball, and build it with one simple ‘make’ command.

First, let’s start with an example of how to bind your template to your widget class:

static void
my_widget_class_init (MyWidgetClass *klass)
{
  GtkWidgetClass *widget_class = GTK_WIDGET_CLASS (klass);

  /* Setup the template GtkBuilder xml for this class
   */
  gtk_widget_class_set_template_from_resource (widget_class, "/org/foo/my/mywidget.ui");
}

static void
my_widget_init (MyWidget *widget)
{
  /* Initialize the template for this instance */
  gtk_widget_init_template (GTK_WIDGET (widget));
}

So, to bind some GtkBuilder XML to a widget class, we need to call two functions:

  • gtk_widget_class_set_template_from_resource() binds some GtkBuilder XML to the class data
  • gtk_widget_init_template() initializes the template for a given instance. This is currently needed for the base C APIs, but both calls could certainly be automated in a high-level language.

Next, we have a function which creates an implicit relationship between some instance variables and some objects defined in the GtkBuilder XML:

struct _MyWidgetPrivate
{
  /* This is the entry defined in the GtkBuilder xml */
  GtkWidget *entry;
};

static void
my_widget_class_init (MyWidgetClass *klass)
{
  GtkWidgetClass *widget_class = GTK_WIDGET_CLASS (klass);
  GObjectClass *gobject_class = G_OBJECT_CLASS (klass);

  /* After having called gtk_widget_class_set_template_from_resource(), we can
   * define the relationship of the private entry and the entry defined in the xml.
   */
  gtk_widget_class_bind_child (widget_class, MyWidgetPrivate, entry);

  g_type_class_add_private (gobject_class, sizeof (MyWidgetPrivate));
}

In the above code, we’ve defined a relationship between the MyWidgetPrivate pointer named ‘entry’ and the object in the GtkBuilder XML of the same name ‘entry’. The entry will be available for access on the subclassed GtkWidget instance private data at any time after gtk_widget_init_template() was called, until the given widget is disposed (at which time the pointer will become NULL). GTK+ takes care of memory managing such automated pointers, so it is ensured to exist for the lifetime of your instances.

Again, with high-level bindings in mind, this could be implemented as some syntactic sugar in the actual declaration of the instance variable.
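To make the idea of binding by name concrete, here is a toy model in plain Python. It does not use GTK+ at all: `TemplateWidget`, `bound_children` and `init_template` are hypothetical names invented for this sketch, and instead of building real widgets it just keeps the parsed XML node for each bound id.

```python
import xml.etree.ElementTree as ET

TEMPLATE = """
<interface>
  <object class="GtkBox" id="box">
    <child>
      <object class="GtkEntry" id="entry"/>
    </child>
    <child>
      <object class="GtkButton" id="button"/>
    </child>
  </object>
</interface>
"""


class TemplateWidget:
    """Toy base class: subclasses list the template ids they want bound."""

    template = None       # GtkBuilder-style XML, set once per class
    bound_children = ()   # ids to expose as instance attributes

    def __init__(self):
        self.init_template()

    def init_template(self):
        root = ET.fromstring(self.template)
        by_id = {node.get("id"): node for node in root.iter("object")}
        for name in self.bound_children:
            if name not in by_id:
                raise LookupError("no object with id %r in template" % name)
            # A real toolkit would build the widget; here we keep the XML node.
            setattr(self, name, by_id[name])


class MyWidget(TemplateWidget):
    template = TEMPLATE
    bound_children = ("entry",)


if __name__ == "__main__":
    widget = MyWidget()
    print(widget.entry.get("class"))
```

As in the C API, the template and the list of bound names belong to the class, while the lookup happens once per instance when the template is initialized.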

Finally there is one more point of interest in the API which is Callbacks. Functions in your widget class code can be specified as Callbacks which serve as endpoints for any signal connections defined in the GtkBuilder XML.

/* A callback handling a "clicked" event from a button defined in the GtkBuilder XML */
static void
my_widget_button_clicked (MyWidget  *widget,
                          GtkButton *button)
{
  g_print ("The button was clicked with entry text: %s\n",
           gtk_entry_get_text (GTK_ENTRY (widget->priv->entry)));
}

static void
my_widget_class_init (MyWidgetClass *klass)
{
  GtkWidgetClass *widget_class = GTK_WIDGET_CLASS (klass);

  /* After having called gtk_widget_class_set_template_from_resource(), we can
   * declare callback ports that this widget class exposes, to bind with <signal>
   * connections defined in the GtkBuilder xml
   */
  gtk_widget_class_bind_callback (widget_class, my_widget_button_clicked);
}

Note that all signal connections defined in composite templates have the composite widget instance as user data by default.

In the above example code, the my_widget_button_clicked() callback was declared with the assumption that the <signal> connection defined in the template was declared as ‘swapped’. Swapped signal connections are connections where the user data is passed to the callback as the first argument, in place of the emitter. I think that this should be the default for composite widget callbacks as it blends in more naturally with normal class methods (where the class instance is always the first parameter).
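The difference in argument order can be shown with a toy emitter in plain Python. This is not GObject and not the GTK+ API; `Emitter` and `on_clicked` are hypothetical names that exist only to show which argument arrives first.

```python
class Emitter:
    """Toy signal emitter, just enough to show the argument order."""

    def __init__(self, name):
        self.name = name
        self._handlers = []

    def connect(self, handler, user_data, swapped=False):
        self._handlers.append((handler, user_data, swapped))

    def emit(self):
        results = []
        for handler, user_data, swapped in self._handlers:
            if swapped:
                # Swapped: user data first, emitter second
                results.append(handler(user_data, self))
            else:
                # Normal: emitter first, user data second
                results.append(handler(self, user_data))
        return results


def on_clicked(first, second):
    """Return the arguments in the order they were received."""
    return (first, second)


if __name__ == "__main__":
    composite = "my-widget"   # stands in for the composite widget instance
    button = Emitter("button")
    button.connect(on_clicked, composite)
    button.connect(on_clicked, composite, swapped=True)
    for first, second in button.emit():
        print(type(first).__name__, type(second).__name__)
```

With the swapped connection the composite instance arrives first, which is why the callback can be written like an ordinary method of the widget class.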

This detail might not apply directly to higher level languages, which could achieve the above by adding some syntactic sugar in the declaration of a Callback method. Perhaps the instance will be implied as the ‘self’ variable.

I have also included some additional API to allow bindings to specify the GtkBuilderConnectFunc which should be used to make signal connections for a given widget class. I hope that bindings authors will contact me if they need any additional support in the GTK+ api to implement this.

Can I now use Glade to define my Composite Widget Templates ?

Of course silly ! That’s the whole point right ?

You’ll need Glade master from today as well, however I should be rolling a development snapshot with full support for this later this week as well.

All of GTK+’s composite widget classes have been recreated using Glade. Here is a screenshot of a GTK+ composite widget being edited in Glade:

Glade editing the GtkFileChooserDefault

Glade is still in its early stages of supporting this, so there are hardly any features added here yet. I hope to work towards a brighter future where Glade can understand a multitude of composite widget templates as components of a single project, which will open the doors for some really nice and useful features.

Conclusion

All in all I have a lot to say about this work, but I’ll cut this blog post short for now, however I may be posting followups in the near future.

I’m very satisfied with this work, and I hope you will enjoy creating user interfaces as composite widget classes.

April 02, 2013
Murray Cumming - April 02, 2013 - 11:51

A few days ago I pushed 44 more commits to the Maliit Plugins repository, now that Canonical have published (in the Ubuntu Phablet project’s maliit-plugins Launchpad/Bazaar repository) the work that we (Openismus) did for them.

This brings the Maliit Keyboard into the QML/QtQuick2 world for Qt5, removing the use of QGraphicsView which is not really suitable for Qt5. This should also have some performance advantages and makes customization even easier.

Michael Hasselmann blogged a summary of the state of Maliit today. The recent work, along with the Wayland integration, has made Maliit more popular than ever. But we still need to line up customers to fund the ongoing development, generally while creating custom features or solutions for them.

Michael Hasselmann - April 02, 2013 - 11:22

We’ve been busy! The result is our latest release, Maliit 0.99, with a massive amount of changes. While the framework has mostly seen code cleanups (:~/source/maliit/fw$ git diff 0.94.0..0.99.0 --stat: 389 files changed, 3612 insertions(+), 30081 deletions(-)), we’ve been adding tons of features to the reference plugins (:~/source/maliit/plugins$ git diff 0.94.0..0.99.0 --stat: 324 files changed, 32499 insertions(+), 12799 deletions(-)).

After (finally!) getting official Debian and Ubuntu packages, the next big step is our Maliit 1.0 release, which we expect sometime later this year.

Maliit 0.99 works with applications using GTK+2/3 or Qt4/5, though Qt4 still works best. Maliit’s framework and reference plugins strictly require Qt5 as a build dependency, allowing us to get rid of Qt4 and Xlib legacy code. Maliit can run on Wayland but at this point only EFL applications support Wayland input methods. For Qt5 applications wanting to use Maliit together with Wayland, a Qt5 platform plugin for Wayland input methods would be needed.

Who is using Maliit?

The best Maliit showcase is still the Nokia N9. No other device got anywhere near to the demonstrated input method integration level. The haptic feedback of its virtual keyboard remains unchallenged. I think it’s a major success for our old Harmattan text input method team that we see Maliit being used in other projects. It was hard work to get to this point and yet it feels as if this was only the beginning of Maliit as an open-source project.

We have the friendly folks behind the OLPC project who were one of the first to go with Maliit. Few of us will probably make the new XO their primary device and therefore, won’t see the results of the collaboration. But from the very beginning, they worked with us in the open, provided official Fedora packages and helped us test-drive the then-new styling engine of Maliit Keyboard. The XO is one of the use-cases where Maliit has to run without a compositing window manager, which made them an early adopter of our windowed mode.

Plasma Active is another project that — for its size — invested significant development time improving and testing Maliit’s reference plugins. They’ve been (and hopefully remain!) a huge help whenever we faced problems with QML. I wish we could find a sponsor however to work on improving their input method integration, which is the “invisible” and challenging part that easily gets forgotten.

Ubuntu Touch is choosing Maliit as well. Canonical sponsored significant work in Maliit Keyboard, especially in terms of Qt5 support and replacing QGraphicsView with QtQuick2.

SailfishOS is a happy user of Maliit, too. They are the main consumers of our 0.80 branches that still offer full support for Qt4 and X11. We always considered 0.80 our latest “stable” release branch, so it’s good to know that at least someone is also using it.

OpenEmbedded has integrated Maliit into their meta layer. This makes it easier to build custom Linux distros for embedded with Maliit on board. My understanding of OpenEmbedded is still limited at this point so I cannot explain all the real benefits this collaboration provides. Luckily there’s the always friendly Samuel Stritzel who’s following our official mailing list, waiting to answer all your OpenEmbedded related questions.

Gnome3. Just kidding. But once Gnome3 properly integrates with Wayland input methods, we can just plug in Maliit, too. Of course to be of any interest for Gnome3, we need to demonstrate working CJK input methods too. Luckily, we got the opportunity to just do that: IBus working with Wayland input methods. If things pan out I’ll be talking about this at GNOME.Asia.

I would like to get the chance to improve input method support for Firefox and Chrome/Chromium. There is one nasty bug around password fields that’s been quite a blocker for us. Some day, we’ll hopefully be able to convince the browser guys to just “do the right thing”.

And then there’s your project of course. You just didn’t tell me about it yet ;–)

Architecture

There is an influx of new developers showing interest in Maliit which is why I feel that reiterating the basic architecture behind Maliit doesn’t hurt.

Maliit is split into three parts: framework, reference plugins and input method modules (GTK+, Qt4, Qt5).

The framework provides maliit-server, a process that can load reference plugins (such as VKBs) and allows applications to connect to it through input method modules. The input method modules are UI toolkit specific and are loaded dynamically by the applications. For each UI toolkit Maliit wishes to support, a separate input method module is required.

Input method modules and framework are designed as client-server architecture, with applications behaving as clients. The default IPC is QDBus but it can be replaced by other mechanisms, for instance the Wayland IPC.

In a follow-up blog post I will provide an outlook of where Maliit goes next and which parts of it we might have to rethink, now that the leviathan of Linux DEs is quickly moving towards Wayland.

People

Those are the people who have contributed to Maliit, but you wouldn’t know if you only checked commit messages:

  • Łukasz Zemczak, Thomas Möricke and Kevin Wright: You guys have been great. Would work together again.
  • Iain Lane, Michał Zajac, Peter Robinson: Without you, we still wouldn’t have proper Debian/(K)Ubuntu/Fedora packaging. Thanks!
  • Samuel Stritzel: Thanks for bringing Maliit to OpenEmbedded! Hopefully we’ll see more use of your work soon.
  • Simon Schampijer, Daniel Drake, Gary Martin, Jennifer Amaya and Martin Langhoff: Thanks for the opportunity to work on OLPC. I know most of us are not in the target audience which makes it hard for you guys to motivate others to help you. Your work is important, please never give up.
  • Pekka Vuorela: I am just glad to have you back in the team. Maliit just isn’t the same without you.
  • Aaron Seigo, Marco Martin: Thanks for all your enthusiasm about Maliit and Plasma Active. Your motivation is infectious.
  • Jan Arne Petersen, Krzesimir Nowak: I am impressed by your motivation and dedication that you still have for input methods, even though none of us ever planned to become an “input method expert”.

Jens Georg - April 02, 2013 - 09:51

For the second time (at least that we know of) software based on GUPnP has been successfully certified by the UPnP Forum. Cloud-dLeyna, providing UPnP-AV/DLNA client APIs to HTML5, has achieved UPnP certification recently: https://lists.01.org/pipermail/dleyna/2013-March/000115.html

March 13, 2013
Mathias Hasselmann - March 13, 2013 - 09:54

With Tristan just landing his work on direct-read-access for EDS 3.8 we at Openismus started to look for use-cases of DRA outside the customer work we did. A first project that came to mind was LibreOffice: It provides a connectivity driver for Evolution's address books. Basically this component is to support standard letters. Now mass printing letters isn't exactly dominated by the address book's reading performance, but to the user the address book appears like any other database, so it surely can be used for more interesting tasks. Also the EDS function LibreOffice uses for accessing address books got deprecated with EDS 3.8, so we gave it a try.

Downloading and building LibreOffice was a simple walk thanks to LibreOffice's great build instructions. Being somewhat interested in build systems, I was positively surprised to see LibreOffice's sophisticated build system that's entirely based on Autoconf and pure GNU make. The relevant code was easy to find and understand, so I patched it quickly. Getting environment variables and D-Bus configuration right for testing was the biggest effort. After being sufficiently confident about the code, I did a quick push to a new gerrit topic branch. Shortly afterwards the code was reviewed and merged. Nice experience.

So what do you get from this patch: Accessing Evolution address books from LibreOffice should be more efficient now, as we eliminated a few layers of complexity for reading. This also should reduce worries about GLib main loops or threads interacting badly with LibreOffice, as direct-read-access skips D-Bus and just runs the backend drivers.

Still the database wizard doesn't list all available Evolution address books: There are local, LDAP and Groupwise address books, but Google address books are missing for instance. Additionally only the first address book of each backend is available. Looking at the database wizard now to fix this, and getting a first impression of the older code's complexity.

March 04, 2013
Murray Cumming - March 04, 2013 - 10:45

On Friday I pushed 194 commits to the Maliit Plugins repository, adding new features, bugfixes, cleanups and tests to the Maliit Keyboard. This is some of the work Openismus has done for Canonical over the last few months which we are now allowed to upstream. This includes hard work by Michael Hasselmann, Krzesimir Nowak, Jan Arne Petersen and Jon Nordby.

Our work on the underlying Maliit Framework for Canonical was published upstream as we did it. We believe we’ll be able to upstream more of our Maliit Plugins work in the future.

Versions of these Maliit Plugins commits were published a few days ago in the Ubuntu Phablet project’s maliit-plugins Launchpad/Bazaar repository. It also contains commits (not by us) on maliit-plugins’ Nemo Keyboard, mostly for integration with the Ubuntu Touch platform (and its use of Android’s Surface Flinger). The recent Ubuntu Touch preview is using a version of that Nemo Keyboard, though we believe that’s meant as a temporary solution. A properly integrated Maliit Keyboard should behave significantly better.

Anyway, these commits add these features to the Maliit Keyboard:

  • Auto-capitalization.
  • Styling, such as a black underline for the current word and a red underline for a word with an error, though it’s up to the toolkit exactly how it shows this.
  • Word prediction, error correction, etc are now available when editing previously-entered words, instead of just the next word, taking into account the surrounding words.
  • Users can add words to the dictionary with a long press on the space key.
  • More settings to enable/disable auto-capitalization, auto-correction, word prediction, error correction, audio feedback and whether the word ribbon should be disabled in portrait mode.
  • Applications can specify text and icons for action keys, such as Done, Go, Login, etc.
  • Keyboard themes can now specify fonts.

The maliit-plugins NEWS file gives more details.

Many of these features were already in the old MeeGo Keyboard (used by the Nokia N9) which had to be dropped last year because of its libmeegotouch dependency and its need for proprietary plugins to achieve these features.

We hope to have all this in an official Maliit release soon.

March 01, 2013
Murray Cumming - March 01, 2013 - 10:31

I have really enjoyed Isabella Wilkerson’s The Warmth of Other Suns, which I “read” via Robin Miles’ excellent Audible narration. It’s about the Great Migration of black people in the USA from the South to cities in the North and West, from the 20s to the 70s, told mostly via three personal stories in parallel.

This is a huge part of American history that gets very little attention in popular culture, despite the wealth of supporting material due to it being such recent history. It’s full of incredible stories of personal courage and adventure. People escaped awful injustices that should not be forgotten. They were often prevented from leaving rural towns in the South, where they were given no choice but to work hard for little pay, at regular risk of violent assault and death. For many, escaping seems to have been almost as hard as for people escaping the Eastern block during the Soviet era. But, unlike defectors, their escapes were not celebrated in the US.

What’s really forgotten is how hard it was for people to settle once they escaped. They had more opportunities but these were still limited and blacks were initially prevented from living in some neighborhoods or taking many jobs. This is yet another dramatic chapter in people’s stories.

New distinctive communities were founded, and they should be celebrated by telling the stories of the people who built them. When I was in New York City over the summer, I found some time to visit Harlem at short notice with my young son. We walked around to get a feel for the place, and tried to join a Harlem Heritage Tour, but none were happening that day. It’s a small organization that I’d love to try again, but I cannot understand why no business has funded a massive tourist destination in Harlem that could be one of the big attractions on the New York City Pass along with the Empire State Building, Ellis Island, the Museum of Natural History, etc.

 

 

February 21, 2013
Murray Cumming - February 21, 2013 - 11:41

I found a little time to do the first glibmm release in a few months. Kjell Ahlstedt and José Alburquerque have pushed so many commits, fixing several awkward bugs, that I had to get it out there. It includes almost no commits from me.

Now to try rolling a gtkmm tarball.

February 15, 2013
Murray Cumming - February 15, 2013 - 14:40

Off and on over the last couple of weeks, I have added MySQL support to OnlineGlom, like I recently added to Glom itself. Now it works and is in git master. When I have the time I’ll try it out with Google’s Cloud SQL (MySQL) in Google’s App Engine.

As with regular Glom, most of the work was getting the self-hosting tests to work with MySQL – to start the MySQL instances, create the databases, fill them with data, test them and shut them down. The rest of the support was mostly covered already by JOOQ but I had to make sure it always knew what SQL dialect to use.

February 13, 2013
Murray Cumming - February 13, 2013 - 17:49

Over the last few months, I have worked on Rygel‘s documentation, along with Krzesimir Nowak and Jens Georg here at Openismus. Most of that work is now finished. It’s been a great investment of time that should be of real benefit to the project.

We’ve massively improved Rygel’s (C) API documentation, which was rather bare after Rygel’s initial split into shared libraries. We had to investigate how the current plugins use the API, and sometimes improved the API in response. (The very latest API documentation improvements will be online soon, when we do a new Rygel release.)

We’ve added both simple and real-world examples, linking to them from sections in the API documentation and describing how those examples work. Those real-world examples are standalone GStreamer-0.10-based versions of the regular Rygel media engine and of its media-export server plugins, plus a GStreamer-0.10 version of the standalone renderer example. The original code for these (now using GStreamer-1.0) was in Vala, like the rest of Rygel, so we had to convert them to C. To maintain functionality, we chose to clean up the horribly-obfuscated C code generated by Vala. That took us a few frustrating weeks to finish but we got it done.

The new Rygel Integration page provides an overview of the APIs that platforms should find interesting, linking to the various documents that we’ve created during this effort. That Integration page is part of a complete overhaul of Rygel’s wiki project pages to make them more attractive and useful.

To help with maintenance of Rygel itself, we now have a Rygel Architecture page with descriptions of Rygel’s program flow in various situations, and a Rygel architecture diagram showing how the various parts of Rygel work together.

 

 

February 12, 2013
Tristan Van Berkom - February 12, 2013 - 13:47

Here is yet another follow-up post on EDS memory consumption. For the last few days I’ve been tracking where memory is spent in EDS and our benchmarking tools, and it was a very interesting experience.

And I’m not just saying that! It was very trying, and it’s still a bit of an unsolved mystery to me (so please feel free to step in with your theories on the unsolved parts!).

It all started when Michael asked me to explain the funny spikes in the memory usage graph presented in the previous post. The first thing I did was to produce a more “bumpy” graph by disabling the slice allocator, yielding what is in some ways a more accurate account of actual memory usage:

Memory usage measured for 12,800 contacts with G_SLICE=always-malloc

I say “in some ways” above because one of the elements that we have to consider is memory fragmentation: memory management is generally more optimal and less fragmented when the slice allocator is active.
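For anyone wanting to reproduce this kind of run, here is a minimal sketch of launching a process with GLib’s slice allocator disabled. `G_SLICE=always-malloc` is the real GLib environment variable used for the graph above; the helper names are hypothetical and are not part of the actual benchmark suite.

```python
import os
import subprocess


def env_without_slice_allocator(base=None):
    """Return a copy of the environment with GLib's slice allocator disabled."""
    env = dict(os.environ if base is None else base)
    # With always-malloc, every g_slice allocation goes straight to malloc(),
    # so the measured memory follows the real allocations more closely.
    env["G_SLICE"] = "always-malloc"
    return env


def run_with_plain_malloc(argv):
    """Run a command (hypothetical benchmark binary) under that environment."""
    return subprocess.run(argv, env=env_without_slice_allocator(), check=False)
```

The same variable has to be set for both the server and the client process, since both contribute to the measured memory.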

What we are looking at above is a left to right graph of overall memory usage; measured after each and every operation that we run on the addressbook. Each “dot” can be associated to one of the various latency tests that we run for each and every build of EDS (indicated in the legend).

First of all let’s demystify the “curious humps” which occur mostly to the “Custom Light” (light blue) benchmarks but are also noticeable in other benchmarks. These “humps” occur for four dots at a time, particularly when performing suffix searches on contact fields that are not stored in the summary SQLite tables for quick searches.

This phenomenon is partly attributable to the fact that all contacts in the addressbook need to be individually examined (and the vcards individually parsed) when the given contact field is not stored in the SQLite tables individually (or what we refer to in EDS terms as “the summary”). I’m not really very concerned by these “spikes”; obviously the memory is reclaimed later on, however it is curious that this happens specifically for suffix matching and not for prefix matching (presumably lots of extra string duplications and normalizations are needed for the case-insensitive suffix matching routines).

Now that that’s out of the way, it leads us to some of the…

More interesting parts

At first I was not satisfied with only this explanation. Sure, it kind of explains the “funny humps” in the benchmark progress, but after taking a closer look at what else is actually happening, I needed a better explanation.

The portions of the presented memory usage graphs that interest me more are the memory growth observable over the course of the first four dots, as well as the curious memory growth that also occurs at the very end of the benchmarks.

So what is happening in these stages ?

First of all, it’s positive news to know that the number of automatically generated vcards used for testing are already in memory before the benchmarks start at all, in the above graph that represents 12,800 vcards all in memory before the first benchmark is measured. And then…

  1. The addressbook is initialized and created, so at the point of measuring the very first dot, we have 12,800 vcards in memory and an initialized EBookClient on the client side and an addressbook counterpart (SQLite database created and active SQLite connection) in the server side memory
  2. Next, at the second dot we’ve created 12,800 EContact objects in memory… the 12,800 EContacts and 12,800 vcard strings remain in memory throughout the benchmark progress. This second dot is about 45MB higher on the scale than the first dot, so it’s pretty safe to say that 12,800 EContact objects cost roughly 45MB of resident memory which will not be reclaimed for the duration of the benchmark progress.
  3. The third dot is measured directly after adding all the contacts to the addressbook, here we start to see some divergence in memory usage; notice that this costs roughly 25MB extra for EBookClient based benchmarks, but only about 5MB for EBook based benchmarks. Being a bit naive, I overlooked this detail at the beginning of the investigation… one of the notable differences in the EBook apis is that it was lacking in batch commands. So the major difference here is that EBook tests add contacts one by one over D-Bus, while the EBookClient tests add contacts in batches of 3200 contacts at a time.
  4. The fourth dot is measured after fetching all contacts at once from the addressbook. Here is where I became seriously alarmed. For normal clients, this shows an approximate 30MB growth in memory consumption. So where did my memory go ? A simple case of amnesia ?! Note though, that the Direct Read Access (red) benchmark hardly increases in memory for a fetch of 12,800 contacts, good show.
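To put the numbers from these steps into per-contact terms, here is a small back-of-the-envelope sketch. It assumes that the “MB” read off the graph means MiB (1024 × 1024 bytes), and the helper names are made up for illustration.

```python
CONTACTS = 12_800
MIB = 1024 * 1024  # assumption: the graph's "MB" is treated as MiB


def per_contact_bytes(growth_mib, contacts=CONTACTS):
    """Average resident memory attributed to one contact for a given growth."""
    return growth_mib * MIB / contacts


def batches(items, size):
    """Split a list into consecutive batches, as the EBookClient tests do."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Second dot: roughly 45MB for 12,800 EContact objects.
    print(round(per_contact_bytes(45)), "bytes per EContact")
    # Fourth dot: roughly 30MB after fetching all contacts.
    print(round(per_contact_bytes(30)), "bytes per fetched contact")
    # Third dot: 12,800 contacts added in batches of 3200.
    print(len(list(batches(list(range(CONTACTS)), 3200))), "batches")
```

So each EContact costs somewhere around 3.6KB of resident memory, and the EBookClient tests need only four batch calls to add everything, where the EBook tests make 12,800 individual calls over D-Bus.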

Naturally, feeling embarrassed about the consequences of the evil fourth dot… I frantically started my search for memory leaks… first I blamed the obscure nature of C++ code and its attempts to hide memory management behind smart pointers… I tried to pin it as a memory leak in the actual benchmarking code (after all, I did just lose 30MB of memory… it must have gone somewhere… right ?)… but after some tracing around, I found that those returned contacts, stored by smart pointers or not, were properly finalized and freed, leaving me with this uncomfortable mystery still on my hands.

While most of my memory leak hunt revolved around explaining the 30MB memory overhead incurred from dot 3 to dot 4, I should mention that the last memory jump was also suspicious. This last memory jump (which seems to vary between a 10MB to 25MB increase depending on the benchmark type) is incurred by deleting all contacts in the addressbook. So how about that ? I’ve just deleted all the contacts, and now I’m using MORE memory than before ?

The following day…

… I ran the benchmarks in loops (I’ll share some of the results below, because this is how I eventually solved the mystery). I also ran the benchmarks (server and client) under valgrind, along with various other test cases. But the alleged memory leak was not to be tracked down. Some testing of the benchmarks running in a loop seemed to indicate that there was some memory growth over time; not very much, but enough to make me believe there must be some leak and to be determined to find it.

Finally, today…

… I let my laptop chug along and loop the benchmarks (at least some of them) with a huge 12,800 contact count (that takes time), so let’s share those enlightening results here:

Memory usage while benchmarking 12,800 contacts in the first iteration

Memory usage benchmarking 12,800 contacts in the second iteration

Memory usage benchmarking 12,800 contacts in the third iteration

These results would be better viewed from left to right instead of one on top of the other, but you get the idea. Just consider that the last dot in the first chart happens directly before the first dot in the following chart, and so on.

So, after viewing this data… we can see that in the second and third graphs, memory we presumed to be lost is eventually returned to the system (in other words, it was indeed only a case of temporary amnesia, and not a more severe degrading case of Alzheimer’s)… This is very reassuring. Numerous runs with valgrind also show no real evidence of memory leakage, which is also reassuring evidence that our EDS is leak free.

But, that still doesn’t really explain…

Where is that memory actually going ?

At this point I can only give you my best guess, but all of the clues seem to point towards D-Bus traffic:

  • At the third “dot”, where contacts are added to the addressbook, using the EBook APIs to add only a single contact at a time seems to cost much, much less than using the EBookClient APIs to add the contacts in batches of 3200 contacts at a time.
  • At the fourth “dot”, where a brute “fetch all contacts” call is made to the addressbook, we can see a huge increase in memory consumption, except when using Direct Read Access mode. So when fetching a list of 12,800 contacts without using D-Bus, we don’t suffer from memory loss.
  • In the last suspicious “dot”, where we delete all contacts from the addressbook at once, all benchmark types seem to suffer significant memory loss. In this case the client is sending a list of 12,800 contact UIDs over D-Bus to the addressbook (in Direct Read Access as well, since deleting contacts is a write operation).

My best guess ? this is all due to zero-copy IPC transfers implemented by D-Bus.

In other words (if you’ve read up to this point you probably don’t need any explanation), instead of the sender writing chunks of data to a socket, and the receiver reading bytes from a socket; the sender is owning some shared memory which is accessed directly by the receiver.

This shared memory is probably managed by the D-Bus daemon itself, so it would make sense that the daemon not release the shared memory straight away but instead reserve some head room in the case that further transfers might reuse that memory.

So how come the fourth dot where a batch of 12,800 vcards are passed to the client, is not reused by the last dot where all contacts are deleted ? … Because, when contacts are fetched the shared memory owner would have to be the sender, which is the addressbook server. However when contacts are deleted, it is the EBookClient user process which sends a list of 12,800 UIDs, in this case the owner of the shared memory should be the other, client process.

I’ll probably need to pursue some extra verifications to be sure, but this best guess is very compelling to me at this time.

In conclusion, this was a really interesting exercise, which I don’t hope to repeat very often… but I did learn a few things and it did put some things into perspective. First and foremost; measuring memory usage, when compared to just tracking and plugging leaks, is quite another story… a lot more tricky and probably not an exact science.

If you’ve got this far, I hope you’ve enjoyed this detective story… I did enjoy it.

Amendments

It’s probably bad form, but I’ll just add this here: my theory is obviously false. I’ve been informed (already) that D-Bus does not implement any such zero-copy mechanism with shared memory, so there is still a huge memory fluctuation, definitely related to D-Bus usage, which I can’t readily explain.

 

February 07, 2013
Murray Cumming - February 07, 2013 - 13:21

At Liam’s new Kindergarten, I’m responsible for the computer stuff. So I thought I’d make things easier by using Google Apps, which is free for educational institutions.

My aims were to have:

  • Official email accounts, without the bother of managing our own email server. (Google Mail)
  • Mailing lists (or groups) so we can send mail to one parents@ email address instead of copy/pasting long CC lists.
  • An online calendar that we could use to show official events, which could show up on peoples’ iPhones and Android phones. (Google Calendar)
  • A shared space for official documents. (Google Drive)
  • Private photo galleries. They can’t be public because people don’t like pictures of their children being online. (Google’s PicasaWeb, I hoped)

I was really looking forward to playing with this stuff, hoping that it would quickly do useful things, and hoping that it would be easier for non-technical people to administer when I’m not around. But I’ve been rather disappointed.

The various problems led to me having to choose between two options, both of them inadequate:

Option 1: Domain users

In this scenario, I would add someone@ourdomain.com accounts for all of the parents.

Pros:

  1. We would be able to see all contact details for parents and teachers in the “Directory”. That would be available on peoples’ Android phones, too, and maybe on iPhones.
  2. We could use PicasaWeb to share photos and view them as galleries.
  3. System administration would generally be easier. The simplicity makes it easier to avoid sharing something publicly by accident.

Cons:

  1. We would give everyone an @ourdomain.com Google account. That would give them an @ourdomain.com email address, using GMail, which would be useless to them. Parents would each have to manually set up GMail to forward to their personal email address, though I could tell them how to do that.
    (You could send email to some.parent@ourdomain.com, but when they replied, you would see them replying from their some.parent@hotmailorsomething.com email address.)
  2. If parents already have a Google account, they would need to switch between them in the web browser. That’s fairly easy using Google’s multiple sign-in, but that does not work well right now. Sometimes you’ll find yourself using the other account suddenly.
    It’s a particular problem for Google Apps that don’t support multiple sign-in, but real people do use those apps. For instance:
  • If you click on a link to a PicasaWeb album, you’ll just be told that the album doesn’t exist, until you log out, log in as the other user, and try again. That is beyond the abilities of the typical user.
  • Google Checkout, used when buying from Google Play, will offer a choice of users to log in as, but I’ve regularly seen it ignore your choice and continue as the wrong user.
  • When going to the GMail site, if you are logged in as a user with no gmail email address, it will offer to create one for you with a “Add Gmail to your Google Account” page. If you try to switch users to the user with a GMail account, it will just take you back to the same page for the previous user.

Option 2: Non-domain group members

In this scenario, I would not add someone@ourdomain.com accounts for all of the parents.

Instead I would add their regular email addresses as members of groups such as parents@ourdomain.com. The various Google Apps seem to allow sharing to group email addresses, understanding that that means sharing to the group’s members, though that’s not documented or hinted at.

The parents would have to create Google accounts for these non-Google email addresses, which is still possible via this link to create a Google account without being forced to have a Google email account, though that link might not work in future. (Update from 2013/04: Indeed, it doesn’t work now, so you have to disable the gmail email address after creating the Google account.)

Pros:

  • People could use their existing Google accounts and existing email addresses.

Cons:

  • We could not use Google’s contacts system to share a list of contacts. We’d have to maintain a separate list on a web page. It wouldn’t be available in the normal way on an Android or iPhone phone.
    This just isn’t possible with Google Apps unless all the people have domain accounts, with domain email addresses. Google recommend the use of 3rd-party web applications that let you manually sync address books every now and then, but that’s not good enough.
  • We could not use PicasaWeb to share photos. We could use Google Drive instead, but this doesn’t offer a gallery view showing sets of photos with back/forward, etc.
    This is because PicasaWeb does not support sharing to a Google Apps group. You can only share to users in your domain, or you can share via an obfuscated, but public, URL.

Summary

For now, I find that Google’s multiple sign-in is so awful that I cannot risk asking people to deal with it and cannot risk having to support them with it. Therefore, I have to live with sharing photos via Google Drive instead of PicasaWeb, and having no shared contacts list.

I feel that we wouldn’t have this problem if Google Apps really supported domains rather than just using redirects. For instance, I’d like to use email at mail.ourdomain.com, instead of typing that into my browser’s address bar only to be taken to mail.google.com/mail/u/0/?shva=1#inbox . A real domain would have its own users and there would be no conflict with regular Google users. Obviously that would be harder for Google to do.

Things would also be simpler if PicasaWeb supported multiple sign-in, and if it supported sharing to groups. Or if Google Drive allowed image files to be viewed via PicasaWeb. It’s to be expected that educational institutions will want to share photos privately. I wanted to use that as an attractive way to get people into the system.

It should also be expected that educational institutions, like many organizations, want to deal with people who will never have an official email address for that organization.

 

February 06, 2013
Michael Hasselmann - February 06, 2013 - 08:05

I am back from FOSDEM and I am always surprised how quickly it’s all over. Would you believe that I only got to attend two talks again, one of them being my own about Wayland input methods?

Friday Feb 1st

I arrived Friday evening, this time by train. Since I bought my first BahnCard ever in August last year, I am slowly convinced now that flying is just a huge waste of time and should be the last possibility to consider, not the first. Even though I spend much more time on the train than I would on the airplane I still travel more relaxed. Others claim to be super-productive on trains. I am happy enough if I get to braindump my thoughts into my notebook (the dead-tree variant) and then read a book (more dead-tree) to forget about work for once.

Another trick for better conferencing is to arrive (or leave) 1-2 days earlier (later), as this helps with catching up on sleep and allows me to process thoughts or just do some random stuff not related to conferencing, say, going to a museum or gallery. It allows me to be reckless with myself during the conference without needing one week to recover.

But back to that Friday evening: I was very happy that Quim could make it to our dinner at the Kabash. I knew he had a busy schedule for the next days and that I probably wouldn’t be able to meet him in Berlin. I think he enjoys working for Wikimedia even though he now has to pay for phone calls and 3G. We also had people from Igalia, Digia and Intel OTC joining. We enjoyed great food and nice conversations, then headed off to the beer event. This one has become crazily crowded. It’s nigh impossible to find people you know. Meeting new people is also a challenge. The entire concept of what should be a simple meet&greet is lost these days.

Saturday Feb 2nd

Jens and I arrived around 10am at the campus on Saturday, probably a first for both of us to be so early. It was so empty you could think they’d cancelled FOSDEM but that quickly changed of course. I had zero slides done by then and no motivation to do them either. I went to Dave’s talk and it was nice to see him and Mardy working on a relevant project again. As expected, GOA is dead in the water by now and work will most likely continue with accounts-sso. I realized that I would have to recycle an older talk for some slides, something I loathe to do.

Somewhere in between I had to meet some OpenEmbedded guys to debug Maliit on their custom hardware, but we didn’t get much further than realizing that dlopen() didn’t work properly.

Funniest thing that happened that Saturday was probably rasterman, timj and thiago trash-talking about the kernel when they could have told each other how much their toolkits suck!

Biggest reveal of the day, however, was that Rob Taylor toured with the Darkness in his crazier years.

Jens was unlucky with his hands-on DLNA talk, but I blame it on the room (Lameere). I remember that my talk there last year was crap, too. None of his prepared demos worked and people started to leave the room. I felt bad for him because I was the one who pushed him into giving the talk. Mark took over the second half and surprisingly, all of his dLeyna demos worked. He got applause (and rightfully so) for each of them.

I only had two hours left by then for my talk (slides). Meeting more people all the time didn’t exactly help with my preparations. I was close to calling Luc (organizer of the Xorg DevRoom) and asking him to cancel my talk. I am glad I didn’t because it turned out that worrying about my talk for the whole day was the best way to get back into the topic. You have to understand that Wayland input methods is a project of yesteryear for me; all the hard thought work has been done already and I have been mostly working on new projects since August last year. I think the talk was well received by the audience (at least the room was packed). However, I cheated by simplifying and ignoring the actual complexity of input methods a bit.

It was already getting late and by the time we arrived in Brussels center it was about time to head to the Gnome party. Same place as last year, but by around 11pm we had occupied the whole building. I didn’t even know half the guys. Gut feeling tells me this is a side effect of the UX hackfest that happened before, but I would like to believe that the Gnome community is growing again. It was great to see Gnome legends such as Federico or Alp attending, both of whom I think I had last (and first) seen in 2008.

Sunday Feb 3rd

I thought I’d be able to attend a couple of talks on Sunday, but I got stuck talking to people interested in actually working open-source DLNA stacks. I got invited to join the Xorg dinner in the evening, with half of the people being BSD guys. Xorg? BSD? I was fearing for my sanity. Some other embarrassment might better be left unmentioned in this blog but the food and wine at least were great! Afterwards we ended up at Delirium again (where else …). At 2am we had to move to the Absinthe bar right next to Delirium. Wiser men than me (such as krh and Rob Clark, who now works for Red Hat!) left before, however. And so we ended up with more beer and several shots of Absinthe until the wee morning hours. Jon made the mistake at that point of asking me for my honest opinion (I actually don’t know why he tagged along with the Xorg hackers for so long); he might regret that by now. Wrong time, wrong place. Happens.

Monday Feb 4th

I had to checkout at 11am and get my train at 2:30pm. How I managed to get up by 11:15am I still don’t know. But luckily I was able to meet with Kat, Dave, Martin & Tobias for some waffle & chocolate breakfast. This was a much better FOSDEM ending than last year.

Tobias chose the same ICE train as me. Poor him, he had no idea what was coming for him: He had to endure my satirical reality talk from Brussels to Cologne. Both of us had to change trains then, with him heading off to Hamburg and me returning to Berlin. I wish I could have made it back a couple hours earlier now because I missed Quim’s presentation in the Wikimedia office.

Exhausted but happy, I finally arrived at home on Monday night.

February 04, 2013
Tristan Van Berkom - February 04, 2013 - 12:20

Hi again.

This is a follow up on my recent post on features and improvements to the Evolution Data Server that we’ve been working on at Openismus. Note that the previous post explains what we’ve done in greater detail, some of this post might not make sense without reading the aforementioned post.

As I was asked to write a more complete report on how each of our patch sets affects memory consumption in EDS, I went ahead and ran some further comparisons. As usual, Mathias’ benchmarks saved the day (while the original benchmark suite only generates memory consumption comparisons for a single run of contacts, I was easily able to produce charts for each individual run and compare them separately).

Actually I had postponed this post since I was hoping to update our final patch set for Direct Read Access apis before reporting my findings. It seems however that currently EDS master is in a period of transition and so I’ll postpone the new patch submissions until some temporary regressions in EDS are fixed (the code which does work with EDS master is however available on the branch).

Memory Usage Report

In order to get a grasp of the impacts on memory consumption that each patch set incurs, I’ve added two additional benchmarks to our normal set of benchmarks.

No BDB

This is a custom build of EDS gnome-3-6 branch with the removal of the BDB usage in the local file backend.

At this point there is no extra table in the SQLite to handle multi-valued vCard attributes, it’s simply a comparison of storing the vCard data in the BDB vs SQLite only.

Custom Light

This is a special run of our regular openismus-work branch, but with only the “Full Name” configured and indexed in the summary.

So this benchmark is a light-weight summary with considerably fewer columns (and one less table) used in the SQLite.

I ran this variation in the suspicion that SQLite might require significantly more memory with the additional multi-value table created to handle multi-valued attributes such as E_CONTACT_TEL.

Benchmark Results

Note that the RSS and VMS memory snapshots are taken by way of observing the /proc/$pid/status file for both the EDS server process and client benchmark process directly after stopping the clock for each benchmark in the suite. So a given value in the charts presented below is based on the “VmRSS” value of the server process added to the “VmRSS” value of the client process.
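As a rough illustration of that measurement (this is not the actual benchmark code), the sketch below reads the “VmRSS” field from `/proc/<pid>/status` for two processes and adds the values together. The function names are hypothetical, and it only works on Linux, where the kernel reports the value in kB.

```python
def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      1234 kB".
            return int(line.split()[1])
    raise ValueError("no VmRSS field found")


def read_vm_rss_kb(pid):
    """Read the resident set size of a running process, in kB."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kb(status_file.read())


def combined_rss_kb(server_pid, client_pid):
    """One data point in the charts: server VmRSS plus client VmRSS."""
    return read_vm_rss_kb(server_pid) + read_vm_rss_kb(client_pid)
```

Calling `combined_rss_kb()` directly after stopping the clock for each benchmark would give one “dot” per benchmark, in kB.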

First, let’s show the results, or at least some of them, to put our deductions into context:

50 Contacts

… Skipping a few results here in the interest of avoiding clutter … let’s jump directly to 400 contacts …

400 Contacts

800 Contacts

1600 Contacts

3200 Contacts

6400 Contacts

12800 Contacts

And now, some of the conclusions I came to while observing the results

BDB Removal

When compared to the unmodified EDS 3.6 branch, we can observe that the BDB removal reduces memory consumption for most reasonably sized address books. Up until we run the benchmark for 3,200 contacts, memory consumption is less without BDB… with 3,200 contacts and higher, memory consumption is increased by removing the BDB.

Without an in-depth understanding of SQLite internals, I think we can deduce that SQLite starts to require more memory to handle databases with >= 3,200 rows.

Custom Light

This benchmark basically disproved my suspicion.

This benchmark uses exactly the same code-base as the “EDS Custom” and “EDS Custom DRA” benchmarks, and using more indexes and tables in the SQLite does not seem to make much of a difference in terms of memory consumption.

While the output is certainly different, especially with large addressbooks, I don’t see much of a noticeable pattern here.

EDS Custom

This benchmark is basically the openismus-work branch with fully customized indexes for better performance in telephone number lookups.

When comparing this one to the unmodified EDS 3.6 benchmark, we can observe that memory consumption is slightly less using the custom EDS code than stock EDS 3.6.

When comparing this to the removal of BDB, we can notice that, especially for small addressbooks, the base memory requirement of the EDS Custom is significantly higher than with only the BDB removal.

This second point is easily explainable, since removal of BDB alone reduces the overall memory footprint of EDS. The custom EDS benchmarks, without actually leveraging the Direct Read Access mode still links against the EDataBook library. Essentially this replaces the memory footprint overhead incurred by linking to BDB with a different overhead incurred by linking directly with EDataBook.

EDS Custom DRA

This benchmark is particularly interesting.

For smaller addressbooks the Direct Read Access mode indeed costs more resident memory than any other benchmark. This can be attributed at least partly to the penalty of loading an EDataBook into memory on the client side. Consequently, loading the EDataBook also loads the backend module in the client process, meaning we also have a running EBookBackendFile in the client as well as client side linkage and usage of the SQLite library.

However, once we approach addressbooks with 1600 contacts and more, the overall resident memory consumption starts to even out. Direct Read Access mode actually costs significantly less than any other benchmark for addressbooks as large as 6400 contacts and more.

These results are a bit harder to explain. My theory is that the EDS server process essentially goes to sleep after adding the initial contacts, because all queries thereafter require no interaction with the EDS server process.

Some things to consider here are:

  • The cost in memory of constantly waking up the EDS process to handle a query
  • The cost of server side heap allocations used to deliver the results over D-Bus
  • The cost of client side heap allocations used to receive results over D-Bus

Overall Memory Consumption differences

In summary, we can conclude that, after all the measures taken to improve the performance of contact fetches in EDS, the Direct Read Access mode is the single element which makes a tradeoff in terms of memory consumption versus speed.

Without the Direct Read Access patches, memory consumption as well as time to fetch contacts has seen a net improvement. With Direct Read Access enabled we see that for smaller address books an additional memory overhead is required, while with larger addressbooks (larger than 3,200 contacts) overall resident memory usage has seen a significant improvement as well.

 

January 30, 2013
Murray Cumming - January 30, 2013 - 08:49

I managed to finish a book for the first time in 2 years. My second child is now 2 years old, in case you find it odd that I’m proud of having read an actual book. Parents of small children will understand.

It took several tries to finish The Quantum Universe, partly because I had to go back over several of the earlier explanations before moving forward. I’m a big fan of Brian Cox’s TV shows, but not so much a fan of co-author Jeff Forshaw’s articles in the Guardian, though I find most popular science journalism rather insubstantial.

In the end, I had to accept that several explanations in the book just weren’t good enough. It seems to have been rushed, probably to capitalize on Brian Cox’s current fame, without anybody taking the time to check that all the text actually makes sense. It must be hard to find editors who are brave enough to say when they don’t understand.

It also got unnecessarily hand-wavy at times. And I didn’t like how on the one hand it rightly discussed observed experimental behavior of the quantum world as being predictable via the maths, but then went on to describe that maths as being a mechanism that causes other effects rather than just being consistent with the observed effects. That feels like anthropomorphizing the maths, before we’ve really figured out if we have the most general model to think about things. I’m thinking particularly of the epilogue about how Pauli’s Exclusion Principle explains how electrons limit the mass of white dwarf stars. Maybe I’d find it more justified if there was some proper explanation of the exclusion principle, assuming there is one, but the book skips over that.

Like many people, I found the clocks analogy to be a distraction. I’d rather just see the maths used to explain probability waves and their wave interference. Plenty of people are afraid of maths, but the clocks explanation mostly just gives the false impression of understanding, while annoying people who are comfortable with a little calculus. Without the equations, I don’t feel like I’m on solid ground.

Reading it on the Kindle wasn’t particularly pleasant, because I had to flick backwards and forwards between the text and the diagrams. But as far as I can tell, the real book is not laid out much better. Again, I guess the book would have a more approachable layout if it wasn’t rushed.

As you can tell, I’m not qualified to judge this book properly. That’s because I am the target audience. I still feel a lack of understanding of quantum and particle physics that bothers me deep down. I hope to find the time to read Ramamurti Shankar’s Principles of Quantum Mechanics instead, to solve the problem properly. I’ve already enjoyed some of his Fundamentals of Physics lecture videos that Yale have made available online.

January 29, 2013
Murray Cumming - January 29, 2013 - 13:08

The OnlineGlom demo does not require a login. However, the code does let you set up a server that requires a login, and I noticed that a successful login for one person became a login for everybody else. So after the first login, it was as if no login was required for anybody. Yes, really. Of course, this would not do.

So I fixed that, I think, learning some things about Java Servlet sessions along the way. This text is mostly for my own reference, and so that people can tell me how wrong I am, because I’d like to know about that.

In the server-side code

Java servlets already set a JSESSIONID cookie in the browser, but you shouldn’t try to use that cookie to maintain a login across browser sessions. Instead, I now set and get a custom Cookie, in the server code, using javax.servlet.http.Cookie. HttpSession.getId() conveniently provides a unique-enough session ID for me to use in the Cookie. This page about Cookies with GWT seems to suggest setting the cookie in the client-side JavaScript code, using com.google.gwt.user.client.Cookies, but that sounds rather wrong.

I now store the username and password (Yes, that’s not good, so keep reading), associated with the session ID, in a structure that’s associated with the ServletContext, via the javax.servlet.ServletContext.setAttribute() method. I get the ServletContext via the ServletConfig.getServletContext() method. I believe that this single instance is available to the entire web “app”, and it seems to work across my various servlets. For instance, if I login to view a regular page, the images servlet can also then provide images to show in the page. I’d really like to know if this is not the right thing to do.
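
The idea of keeping login state per session ID, rather than one login for the whole application, can be sketched independently of the servlet API. The following is a minimal illustration in plain C, not the OnlineGlom code (which is Java). The names session_table, session_login and session_lookup are hypothetical, and the sketch deliberately stores no password.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SESSIONS 16

/* Hypothetical per-session record; OnlineGlom itself is Java and differs. */
struct session { char id[64]; char username[64]; int in_use; };

/* Plays the role of the structure attached to the ServletContext. */
struct session_table { struct session slots[MAX_SESSIONS]; };

/* Records a successful login for one session ID.
 * Returns 0 on success, -1 if the table is full or a string is too long. */
int session_login(struct session_table *t, const char *id, const char *username)
{
    assert(t != NULL && id != NULL && username != NULL);
    if (strlen(id) >= sizeof t->slots[0].id ||
        strlen(username) >= sizeof t->slots[0].username)
        return -1;
    for (size_t i = 0; i < MAX_SESSIONS; i++) {
        if (!t->slots[i].in_use) {
            strcpy(t->slots[i].id, id);            /* length checked above */
            strcpy(t->slots[i].username, username);
            t->slots[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Returns the username for this session ID, or NULL if that particular
 * session never logged in, whatever other sessions have done. */
const char *session_lookup(const struct session_table *t, const char *id)
{
    assert(t != NULL && id != NULL);
    for (size_t i = 0; i < MAX_SESSIONS; i++)
        if (t->slots[i].in_use && strcmp(t->slots[i].id, id) == 0)
            return t->slots[i].username;
    return NULL;
}
```

The point of the lookup is that a request carrying an unknown session ID gets NULL even after somebody else has logged in, which is exactly the property the original bug lacked. A real implementation would also need expiry and locking, which this sketch omits.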

However, it still stores your PostgreSQL username and password in memory, so it can use them again if you have the cookie from your last successful login. It does not store the password on disk, but that is still not good, because it could presumably still allow someone to steal all the passwords after a break-in, which would then endanger users who use the same password on other websites. I cannot easily avoid this because it’s the PostgreSQL username and password that I’m using for login. PostgreSQL does store a hash rather than the plaintext password, but still requires the plaintext password to be supplied to it. I think I’ll have to generate PostgreSQL passwords and hide them behind a separate login username/password. Those passwords will still be stored in plaintext, but we won’t be storing the password entered by the user. I’d like to make this generic enough that I can use other authentication systems, such as Google’s for App Engine.

To avoid session hijacking, I made the cookie “secure”, meaning that it may only be provided via a secure protocol, such as HTTPS. Note that the secure flag by itself does not stop client (javascript) code from reading the cookie; that is the job of the separate HttpOnly flag. I set the secure flag with the javax.servlet.http.Cookie.setSecure() method, though I had to make a build change to make that available.

The login servlet now checks that it has been called via HTTPS, by using the ServletRequest.isSecure() method, and uses HTTPS when testing via mvn gwt:run. It refuses to do any authentication if HTTPS was not used, logging an error on the server.

In the client side code

I added a check for what protocol is used. If it’s not https then I warn that login cannot work. This does not add any security, but it’s a helpful hint.

Actually, the entire site must therefore be served via HTTPS, not just the login page, or we would violate the Same Origin Policy by mixing protocols, which the browser would rightfully complain about. At this point I noticed that most serious sites with logins now use HTTPS for their entire site. For instance, Google, Amazon, Facebook. This seems like a good simple rule, though I wonder if many projects don’t enforce it just to make debugging easier.

I also converted the popup login dialog into a proper login page, making sure that it takes the user to the desired page afterwards.

January 21, 2013
Jens Georg - January 21, 2013 - 09:46

It’s been a while since I last blogged about Rygel. Many things have happened since, mainly in features and documentation.

Features

  • Exchangeable media engines: We’ve loosened the dependency on GStreamer a bit. While it is still our first-class transcoding and general media handling library, it is now possible to substitute it with other media processing libraries. A simple example is included in the source.
  • Change tracking. This is a feature introduced in the UPnP content directory specification version 3. It allows clients to, well, track the changes that happen on the server in detail for synchronization purposes. It’s implemented in the framework and as a demonstration in the MediaExport plug-in.
  • GStreamer 1.0 support. As the rest of GNOME, we transitioned to GStreamer 1.0.
  • Playlist support. Rygel now generates playlists for containers on-the-fly and the renderer framework supports automatic playback of them. The only format that’s supported currently is one of the two formats defined by DLNA, DIDL_S, which is just the same format that is used by UPnP AV to describe the media content on a server.
  • Playspeed support. A renderer now can announce that it supports different speeds and directions than normal forward playback.

API

There was another split-up into a renderer framework library and a specific implementation of a renderer using GStreamer which again may be used in your own programs. This is mainly due to the aforementioned change in media backend flexibility.

Otherwise we’re working on making the API easier to use from C and other languages through introspection.

Documentation

There’s been a lot of effort into extending our sparse documentation. It is currently concentrating on the API side of things but will be extended to a higher level as well soon.

Misc

  • There is an example now that implements a DLNA renderer which is running in full-screen
  • There are several examples for the most common init systems to run Rygel as a system service if wanted
  • A load of bugfixes

Tristan Van Berkom - January 21, 2013 - 03:43

I’m posting this here, while I would have replied to Taryn Fox’s blog but couldn’t do it without subscribing to something….

(I’m throwing away all of the text I wrote yesterday and starting over, I’ll instead try to write something shorter).

First and foremost, please remember that GNOME projects are indeed mostly volunteer driven, except for a few projects in GNOME which may be dominated at times by developers all working at a given company (and in those corner cases, the meritocracy approach may not apply as strictly).

In most cases, the maintainer is the only one that actually cares about the given project enough to weather the storm. For example, if I had not been so determined to make something out of Glade for a number of years in my spare time… believe me, the project would have died, in the same way that if Juan Pablo had not taken care of Glade these last couple of years, nobody else would have taken charge for the long term. I know this because I see the flood of contributors who come and go; the ones who stay the course and show dedication are few and far between. It’s only fair that we afford a special level of trust to those who work hard and stay the course.

Yes there are things that can be improved, hopefully we can all take criticism and try not to hurt people’s feelings etc etc, but please consider the cruel alternatives to meritocracy.

The alternatives to meritocracy, as I see it, are those “Pay to get in Boys Clubs”. What I mean by “Boys Club” is, you know… those people whose daddy was rich or knew the right people, and so were able to go to the most reputable universities and have all the opportunities that others did not. Now let me stress that not all members of these clubs have an arrogant sense of self-entitlement; sadly, however, some of them do in my experience. Also, most corporate human resource departments are unconditionally biased to hire only people who hold some kind of university degree (or even only those who hold a degree from a first world country).

Meritocracy helps us to level the playing field, it gives a chance to those of us who grew up in a cardboard box or in a third world country, to prove that they can indeed make just as worthy contributions as those of us who attended one of these rich kid clubs/universities and also get the same recognition, provided they at least did their homework (whether the walls of that home were made of brick, wood, or only cardboard).

This is something worth fighting for, worth protecting.

January 15, 2013
Murray Cumming - January 15, 2013 - 12:09

Glom uses PostgreSQL and doesn’t try to offer the user a choice of anything else. That’s because it does what Glom needs, there’s no need to confound the user with an incomprehensible choice, and I’ve no wish to maintain multiple sets of code. It’s hard enough keeping up with changes in PostgreSQL, though Glom’s regression tests help.

However, I played around with adding MySQL support as a build-time alternative via the --enable-mysql configure option. The basic stuff now works both in the UI and in the regression tests. Those tests can now run each self-hosting database test with all 3 backends.

This is mostly just so I could learn about MySQL, so I can reimplement it in Java for OnlineGlom. That would let me use Google’s Cloud SQL, which is based on MySQL. The main work has been figuring out how to initialize a MySQL database store on disk and then start and stop MySQL instances. It’s even more funky than with PostgreSQL. I did need an addition to libgda to support non-standard MySQL port numbers but, as usual, Vivien Malerba fixed that for me quickly. There’s also the rather huge problem that AppArmor on Ubuntu prevents us from starting MySQL with anything but the standard database data, and we can’t expect the user to go editing AppArmor config files. At least with MySQL 5.6, it should be possible to start a MySQL instance without it being left with no password for a few seconds, as is necessary with MySQL 5.5. I need to start and stop custom instances so I can run tests automatically.

I’ve committed it to Glom’s git master, and it’s in the 1.23.3 release, just in case anyone wants to improve it. There are some TODO_MySQL comments in the tests where we expect something to fail with MySQL. For instance, I have not added support for editing the MySQL users and groups. And there are likely to be problems with keeping data when changing field types, which doesn’t seem to be tested thoroughly for any backend. libgda is also missing some support for binary field types, needed for Glom’s image fields.

Over the years, various people have complained about Glom not using MySQL. Here is your chance to actually work on that, with tests to show if your work is enough.

Tristan Van Berkom - January 15, 2013 - 10:27

Hi all, hope you’ve spent a pleasant holiday season.

As promised, here is another post describing what new tricks we’ve been teaching EDS (Evolution Data Server) this year at Openismus.

Before I go through all the details, a little context is in order. Last year Mathias created a nifty benchmark tool for EDS allowing us to track performance improvements and regressions of the Evolution Data Server across releases and branches. Mathias, with his prior experience and knowledge of EDS was able to make some educated guesses on where we could save some milliseconds, all in the interest of providing an EDS that is stable/reliable in terms of performance and also useful in a variety of platforms and scenarios (not only as the backend of the Evolution Mail client on Desktops).

We’ve come a long way on this, so first let me describe the major changes that we’ve made and then I’ll move on to show you the results.

Removing Berkeley DB

Historically, Evolution’s local addressbook used Berkeley DB to store VCards for all contacts. Over time some optimizations were made: originally an in-memory “summary” was maintained, holding some of the contact data in order to speed up queries to the addressbook. This was eventually replaced with an SQLite implementation of the quick search “summary” data. In the end that left us with a two-step query for fetching contacts from the addressbook: an initial query to SQLite to find any UIDs which match the query terms, and then another query to the BDB fetching the actual VCard data for each matching contact.

Removing the BDB implementation and storing all contact data in SQLite instead naturally makes queries faster, not to mention that there is considerably less flash wear, as we now have only one DB persisting contact data instead of two. Additionally, we’ve observed that the old BDB code fails (crashes, even) with an out-of-memory condition in some cases, such as deleting more than 6400 contacts at once; this is all handled much more cleanly using SQLite exclusively.
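
The difference between the old two-step fetch and the new single lookup can be sketched in plain C, with in-memory arrays standing in for the two stores. This is an illustration only, not EDS code: the struct and function names are hypothetical, and for brevity it handles a single matching contact rather than a list of UIDs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two stores; real EDS uses SQLite and BDB. */
struct summary_row { const char *uid; const char *full_name; const char *vcard; };
struct bdb_row { const char *uid; const char *vcard; };

/* Old scheme: find the UID in the summary, then fetch the vCard elsewhere. */
const char *fetch_two_step(const struct summary_row *summary, size_t n_summary,
                           const struct bdb_row *bdb, size_t n_bdb,
                           const char *full_name)
{
    assert(summary != NULL && bdb != NULL && full_name != NULL);
    const char *uid = NULL;
    for (size_t i = 0; i < n_summary && uid == NULL; i++)
        if (strcmp(summary[i].full_name, full_name) == 0)
            uid = summary[i].uid;          /* first lookup: summary */
    if (uid == NULL)
        return NULL;
    for (size_t i = 0; i < n_bdb; i++)
        if (strcmp(bdb[i].uid, uid) == 0)
            return bdb[i].vcard;           /* second lookup: vCard store */
    return NULL;
}

/* New scheme: the matching row already carries the vCard. */
const char *fetch_one_step(const struct summary_row *summary, size_t n_summary,
                           const char *full_name)
{
    assert(summary != NULL && full_name != NULL);
    for (size_t i = 0; i < n_summary; i++)
        if (strcmp(summary[i].full_name, full_name) == 0)
            return summary[i].vcard;
    return NULL;
}
```

Both functions return the same vCard for the same name; the second simply never touches a second store, which is where the saving in time and flash wear comes from.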

This has landed in EDS’s git master a couple of months ago and should be available in the next release.

Configurable Summary Fields

Summary Fields in EDS refer to the VCard fields for a given addressbook which should be elected for fast results. They are stored separately in an SQLite table so that contacts can be queried without parsing the complete contact VCard data for every contact.

This list has always been hard coded and tailored to the needs of the Evolution Mail client (the list basically consisted of the contact name fields plus a handful of email fields which Evolution is accustomed to using). This would of course be appropriate for an email client but falls short for applications that have different needs such as hand phones, which require extremely fast results for queries by phone number.

So we’ve now introduced a set of APIs which allow configuration of the summary fields of a given addressbook at addressbook creation time. This allows us to choose which fields are stored in the summary and which of those fields should be indexed for extra fast retrieval.

As a side effect of this, we now also support multi valued VCard attributes to be stored in the summary (i.e. list of emails or list of addresses).

This has also landed in EDS master some time ago and will be available in the next EDS stable release.

Direct Read Access

One of the more intrusive changes is the Direct Read Access mode. Mathias foresaw that we would gain significant performance simply by delivering query results directly to the client instead of squishing them into the socket and pushing them arduously through the D-Bus byte by byte (or probably 8192 bytes at a time…). I have to admit that I was a little sceptical about this change but after benchmarking the direct read access approach I was able to notice a serious performance gain.

Our fastest queries using the previously described configurable summary fields return in roughly 4-7 milliseconds.

The same queries in Direct Read Access mode are quite consistently 0.2 or 0.3 milliseconds.

In other words, for the simplest queries where the EDS server can fetch the results very fast, we waste the grand majority of our time serializing/deserializing VCard data and tinkering on the D-Bus socket.
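
As a rough worked check of that claim, using only the timings quoted above: if we assume that the Direct Read Access time approximates the cost of the query itself (an assumption, since the two modes do not share every code path), the share of time spent on everything else is easy to compute. The function name transport_share is hypothetical.

```c
#include <assert.h>

/* Fraction of a D-Bus query's time that is not spent on the query itself,
 * taking the Direct Read Access time as the cost of the query alone. */
double transport_share(double dbus_ms, double direct_ms)
{
    assert(dbus_ms > 0.0 && direct_ms >= 0.0 && direct_ms <= dbus_ms);
    return (dbus_ms - direct_ms) / dbus_ms;
}
```

With the quoted figures, transport_share(4.0, 0.3) is 0.925 and transport_share(7.0, 0.2) is about 0.97, so under that assumption somewhere around 92% to 97% of the time goes to serialization and D-Bus transport.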

This has not yet landed in EDS master, so I’ll keep you posted ;-)

How fast can I get my contacts?

In conclusion, let’s go over the results of our benchmarks and compare.

First a few details regarding the results we’re looking at:

  • EDS 3.6 – This is stable EDS 3.6.2 without any of the above modifications. It’s important to note that this version still uses Berkeley DB as well as SQLite to store contacts. Furthermore, the stock 3.6.2 does not take advantage of SQLite indexes.
  • Custom – Built from our EDS 3.6 based work branch (called ‘openismus-work’) this build has Berkeley DB removed and configures the summary with a custom summary configuration.
  • Custom DRA – Built from our EDS 3.6 based work branch; this build additionally enables Direct Read Access, using the same configuration for summary fields as the ‘Custom’ build uses.

For both ‘Custom’ and ‘Custom DRA’, the customized summary is configured as follows:

  • Full Name: In the summary and indexed for prefix searches
  • Given Name: In the summary and indexed for prefix searches
  • Family Name: In the summary and indexed for prefix searches
  • Telephone Number: In the summary and indexed for prefix & suffix searches
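
The configuration listed above can be written down as data. The following is a minimal sketch in plain C; the enum, struct and field names are hypothetical and are not the EDS API, which this post does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical index flags; the real EDS API is not shown in this post. */
enum index_flags { INDEX_NONE = 0, INDEX_PREFIX = 1 << 0, INDEX_SUFFIX = 1 << 1 };

struct summary_field { const char *name; unsigned index; };

/* The configuration used for the 'Custom' and 'Custom DRA' builds. */
const struct summary_field custom_summary[] = {
    { "full-name",   INDEX_PREFIX },
    { "given-name",  INDEX_PREFIX },
    { "family-name", INDEX_PREFIX },
    { "telephone",   INDEX_PREFIX | INDEX_SUFFIX },
};
const size_t custom_summary_len = sizeof custom_summary / sizeof custom_summary[0];

/* Returns nonzero when `field` is in the summary and indexed for `kind`. */
int is_indexed(const struct summary_field *cfg, size_t n,
               const char *field, unsigned kind)
{
    assert(cfg != NULL && field != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(cfg[i].name, field) == 0)
            return (cfg[i].index & kind) == kind;
    return 0; /* not in the summary at all */
}
```

A query planner could consult such a table to decide whether a suffix search on the telephone field can use an index, or whether it has to fall back to parsing every VCard.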

Query for exact match of the full name attribute

In EDS 3.6 stable, the full name attribute is indeed stored in the SQLite summary. Notice that for small addressbooks (less than 200 contacts) the results are similar to EDS Custom. However with the customized summary fields we’ve also ensured that the SQLite indexes are getting used properly; this is what ensures the performance doesn’t degrade too much with larger addressbooks.

The red (DRA) line is EDS doing the same thing but avoiding the arduous tinkering with D-Bus messaging.

Phone Numbers

Query for prefix match of phone number

Since we’ve configured EDS to optimize the phone number field of a given contact for prefix & suffix searches, we can now use phone number queries at reliable speed. In other words you can again use EDS on your hand phone to implement your contacts database and kittens won’t be sacrificed.

The reason why this takes an extremely long time with EDS 3.6 is that the contacts have no quick search information ready at hand; this means we must iterate through all of the contacts in the Berkeley DB, parse the VCards for each, extract all of the phone numbers and compare them one by one.
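
This post does not say how the suffix index is built. One common technique, assumed here purely for illustration, is to store a reversed copy of each value, so that a suffix match becomes a prefix match that an ordinary ordered index can serve. A minimal sketch in plain C, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes the reverse of `in` into `out`, which must hold strlen(in) + 1 bytes. */
void reverse_copy(const char *in, char *out)
{
    size_t n = strlen(in);
    for (size_t i = 0; i < n; i++)
        out[i] = in[n - 1 - i];
    out[n] = '\0';
}

/* Prefix test: the kind of comparison an ordered index can answer quickly. */
int has_prefix(const char *value, const char *prefix)
{
    return strncmp(value, prefix, strlen(prefix)) == 0;
}

/* Suffix test expressed as a prefix test on reversed strings.
 * Both inputs are limited to 63 characters in this sketch. */
int has_suffix_via_reverse(const char *value, const char *suffix)
{
    char rv[64], rs[64];
    assert(strlen(value) < sizeof rv && strlen(suffix) < sizeof rs);
    reverse_copy(value, rv);
    reverse_copy(suffix, rs);
    return has_prefix(rv, rs);
}
```

In a database the reversed value would be computed once when the contact is stored, so only the search term needs reversing at query time.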

Since we optimized for suffix searches, we get similar results for a suffix search:

Query for suffix match of phone number

Memory Usage

We also have in place some monitoring of memory usage:

Virtual Memory Usage

 

Resident Memory Usage

These are basically just memory usage snapshots taken over the course of the benchmark runs.
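
How the snapshots are taken is not described here. On Linux, one plausible approach (an assumption, not necessarily what the benchmark tool does) is to read the VmSize and VmRSS lines from /proc/<pid>/status at intervals. A minimal sketch of the parsing step in plain C, with a hypothetical function name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Finds a line such as "VmRSS:     1234 kB" in the text of a
 * /proc/<pid>/status file and returns the value in kB, or -1 if absent. */
long status_value_kb(const char *status_text, const char *key)
{
    assert(status_text != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *line = status_text;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* step past the newline to the next line */
    }
    return -1;
}
```

Here "VmSize" corresponds to the virtual memory figure and "VmRSS" to the resident memory figure.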

The “Custom” setup takes slightly more memory to run than EDS 3.6; this is presumably because we maintain more indexes with the SQLite version. Unsurprisingly, the Direct Read Access mode takes significantly more memory; this is because the direct read access mode uses two SQLite connections (one for the server and one for the client to make direct read calls).

This concludes this Tuesday’s episode of “Getting your contacts Right Now!”

Stay Tuned ;-)

January 10, 2013
Jan Arne Petersen - January 10, 2013 - 11:29

We at Openismus combined our experiences in the Maliit input method framework and Wayland input methods and added support for Wayland input methods to Maliit. This allows using Maliit as a virtual keyboard under Weston/Wayland alongside the demo weston-keyboard. To try it out, use maliit-framework from master and Weston from our github repository, and compile it with Qt 5 and qmake CONFIG+=wayland; maliit-plugins (from master) needs to be built with qmake CONFIG+=disable-nemo-keyboard.

screen cast

There is no EFL support in Maliit (and no Maliit support in EFL), but since Wayland input method support is available for EFL, one can just use Maliit with EFL applications under Wayland. This shows one of the advantages of the Wayland input methods: input method frameworks no longer need explicit support for all kinds of UI toolkits.

The current plan is to get input method support into Wayland 1.2. Adding support for Wayland input method to Maliit allowed us to discover some missing features which we will work on next together with some more improvement suggestions to get the input methods really ready for Wayland 1.2.

January 04, 2013
Murray Cumming - January 04, 2013 - 09:01

Our Ubuntu packages for the Maliit framework and keyboard had not been updated for a while, so just before the holidays I uploaded 0.93.1 versions for Ubuntu Quantal to the Maliit PPA. I fixed various lintian warnings along the way.

We’d really like some help getting this into official Debian and Ubuntu.

December 21, 2012
Mathias Hasselmann - December 21, 2012 - 07:24

This year at Openismus had a strong focus on media files and related metadata. So it isn't surprising that we also got in touch with Grilo. Grilo is a framework for discovering media files and decorating them with relevant metadata. One missing piece was background information about movies, like story summaries, artwork, information about artists. With the support of our customers we decided to change that.

Now an obvious choice for getting information on movies is Amazon's incredibly comprehensive Internet Movie Database. Sadly there doesn't seem to be any official web API that could be used by open source software. If I missed something, please tell me in the comments.

The Movie Database plugin for Grilo

Anyway, with IMDb out of reach we had to look for an alternative - and we've found a great one: The Movie Database. This database is a community driven project, built with the needs of media players in mind. It comes with a great and easy to use API. Like IMDb it seems to know just each and every relevant and irrelevant movie ever made. Some entertaining stuff like movie trivia, goofs and quotes are missing, but for most movies the database knows the IMDb id - so all the entertainment is just one click away.

Now that we've found a good movie database Jens started implementing a TMDb plugin for Grilo, and a first version of the plugin was released with grilo-plugins 0.2.2. I've later finished it while integrating it with our customer's project. Murray took care of examples and documentation.

The plugin implements a metadata resolver: It takes a media object, checks for a title or the unique movie id and then fills the media object with information it found on TMDb. The biggest obstacle when using the plugin is the need for an API key. It should be easy to get. The database operators hope these keys will help them to deal with misbehaving clients.

Another issue is related to Grilo's architecture. Grilo doesn't permit its sources to improve any metadata. So if you interpolated a movie title from the filename and pass a media object with that title to the TMDb plugin, the plugin is not permitted to replace your usually ugly title with the pretty and official title it found in the database. To work around that issue you would not resolve the TMDb data in a single step. Instead you'd do a first resolve operation to just retrieve the TMDb's movie id. In a second step you'd delete your own title, but keep the resolved movie id and let the plugin work with that data:

/* Create dummy media from filename */
media = grl_media_new ();
grl_media_set_title (media, "the_last_unicorn_720p");

/* Ask grilo to resolve the TMDb id */
metadata_key_tmdb_id = grl_registry_lookup_metadata_key (registry, "tmdb-id");
keys = grl_metadata_key_list_new (metadata_key_tmdb_id, GRL_METADATA_KEY_INVALID);
grl_source_resolve_sync (source, media, keys, options, &error);
/* Exercise: Release memory and handle possible errors */

/* Use the resolved id to get full metadata */
movie_id = grl_data_get_string (GRL_DATA (media), metadata_key_tmdb_id);

if (movie_id) {
    grl_data_remove (GRL_DATA (media), GRL_METADATA_KEY_TITLE);
    grl_source_resolve_sync (source, media, keys, options, &error);
    /* ... */
}

Behind your back the same number of network requests is needed for both approaches.

Testing the Plugin

At Openismus we have a mantra we strongly believe in:

No code is done before it has proper tests.

Or more radically:

It doesn't work if it isn't tested.

In the case of the TMDb plugin this commitment caused us a bit of work. Grilo didn't have any infrastructure for testing network based plugins yet. Also if we'd do those tests we'd like to run them offline:

  • it works without API keys
  • it saves resources and is much quicker
  • it permits testing of obscure errors
  • it is more reliable by removing countless points of failure

So we took the challenge and also implemented a mock testing framework for Grilo. It came in handy that Grilo already routes all its network access through a dedicated abstraction, so that it can implement request throttling. So we just hooked into that layer and introduced a few environment variables that influence the behavior of that layer: First you'd set GRL_NET_CAPTURE_DIR to some folder name and then let your plugin perform a few interesting operations. In a next step you'd edit the recorded files and rename them nicely to suit your needs. Essential in that work is a generated key-value file of the name grl-net-mock-data-$PID.ini which maps URLs to captured responses. Once done you can set GRL_NET_MOCKED to point to that file. Grilo will then stop doing real network requests and answer all requests with the information this file provides.
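
A test program can set these variables itself instead of relying on the shell. The following minimal sketch uses only the two variable names given above and the POSIX setenv() call; the helper names are hypothetical, and it assumes the variables have to be in place before Grilo starts making requests, so calling the helpers before initialising Grilo is the cautious choice.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <assert.h>
#include <stdlib.h>

/* Points Grilo's network layer at previously recorded responses.
 * `ini_path` is the key-value file you edited after capturing. */
int use_mocked_network(const char *ini_path)
{
    assert(ini_path != NULL);
    return setenv("GRL_NET_MOCKED", ini_path, 1); /* 1: overwrite any old value */
}

/* Asks Grilo to record real traffic into `dir` for later editing. */
int capture_network_to(const char *dir)
{
    assert(dir != NULL);
    return setenv("GRL_NET_CAPTURE_DIR", dir, 1);
}
```

Both helpers return 0 on success and -1 on failure, as setenv() does. The paths are whatever your test suite uses; none are prescribed here.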

Details about testing with the Grilo mock framework are in the reference manual. A few examples for using the framework can be found in Grilo's test folder.

December 20, 2012
Mathias Hasselmann - December 20, 2012 - 09:10

Earlier this year we at Openismus proposed a Qt based project that would utilize GStreamer for handling media files. We were especially interested in using the GstDiscoverer class which provides a really nice and easy to use API for discovering properties of media files, such as the container format and the audio and video formats, but also more interesting things like EXIF information, when used with photos.

Now combining code from different worlds with their different paradigms isn't exactly fun. The resulting code often is a disgusting Frankenstein monster not fitting at any place, unless you wrap one of the libraries to match the project's preferred code style. Luckily, in the case of Qt and GStreamer, Collabora's George Kiagiadakis created QtGStreamer and therefore did most of the hard work already. Still, that library didn't support our beloved GstDiscoverer class yet. So we had the choice: Use something different, or wrap that thing. Now we love doing free software, we already use GstDiscoverer with great success in the Rygel UPnP AV/DLNA Media Server, and in the end the media files shall get played via GStreamer anyway. So we decided to just wrap that class for QtGStreamer.

Doing that work actually was surprisingly easy: A few loose ends here (#680235), a bit of nitpicking there (#680233, #GB680237). The biggest effort was doing the regression tests. These tests also demonstrate how easy the wrapped GstDiscoverer is to use. Synchronous media discovery is done like this:

QGst::DiscovererPtr discoverer = QGst::Discoverer::create(QGst::ClockTime::fromSeconds(1));
QGst::DiscovererInfoPtr info;

try {
    info = discoverer->discoverUri("file:///home/mathias/blockbuster.ogv");
} catch(const QGlib::Error &error) {
    qWarning("Discovery failed: %s", qPrintable(error.message()));
    // ...maybe also check error.domain() and .code()
}

You also can try asynchronous discovery if you have a Qt build that integrates GMainLoop:

QGst::DiscovererPtr discoverer = QGst::Discoverer::create(QGst::ClockTime::fromSeconds(1));

// Connect C++ member methods to the signals
QGlib::connect(discoverer, "starting", this, &DiscovererTest::onStartingDiscovery);
QGlib::connect(discoverer, "discovered", this, &DiscovererTest::onUriDiscovered);
QGlib::connect(discoverer, "finished", this, &DiscovererTest::onDiscoveryFinished, QGlib::PassSender);

discoverer->start();

QEventLoop loop;
loop.exec();

Usually only X11 builds match that requirement, but it should be possible to just hook QEventDispatcherGlib into your own application if needed.

The discovered data is accessible by the various attributes and methods of QGst::DiscovererInfo:

QGst::DiscovererInfoPtr info = ...;

qDebug() << info->uri();
qDebug() << info->tags();
qDebug() << info->duration();
// ...

Q_FOREACH(const QGst::DiscovererVideoInfoPtr &videoInfo, info->videoStreams()) {
    ...
}

Sadly our customer wasn't as much of a fan of Qt as we thought, so we haven't had much use of our own for this work yet. This situation also delayed finishing the last few bits of those patches. Luckily Murray recently took the time to do those last bits of work, and to get the patches merged. The code is in the git repository now and should get released with QtGStreamer 0.10.3. So whenever your Qt application needs to discover media file properties, you can now use QtGStreamer too.

Tristan Van Berkom - December 20, 2012 - 08:35

Hi all, long time no blog…

At Openismus these past months we’ve been making a series of improvements for the Evolution Data Server, again.

As it goes with patch reviews, better to take on one at a time, so this blog post will focus on isolated unit testing of your D-Bus services.

If you work on the implementation of D-Bus services you’ll be happy to know about the new GTestDBus object introduced in GIO since 2.34. Thanks to this nifty new object we can now perform unit tests on D-Bus services in a completely isolated fashion.

Using GTestDBus to implement a new test fixture for Evolution Data Server (EDS) now makes it possible to run ‘make check’ for EDS without running the actual EDS servers manually, and without even installing the EDS software into the target prefix beforehand. With a little more work, we can even get regular distchecks passing for EDS again.

Here’s how:

  • First of all, you need to have an in-tree directory holding your D-Bus .service description files. Typically you already have the .service.in files stored somewhere in tree to be installed somewhere into the install prefix, but you need a separate one (I think it’s good practice to add a subdirectory to your ‘tests/’ directory, so it becomes ‘$(top_builddir)/tests/services’)
  • The contents of your separate .service.in files for testing should point to the in-tree location of your D-Bus service, typically it would look like this:
    [D-BUS Service]
    Name=org.gnome.MyProject.MyServiceName
    Exec=@abs_top_builddir@/src/my-server-name

    This will define a service which explicitly activates a server which you’ve built in your source tree.

  • Then, you just need to provide that in-tree service directory to g_test_dbus_add_service_dir(), typically you would put the following into your tests/Makefile.am:
    -DTEST_SERVICE_DIRECTORY=\""$(abs_top_builddir)/tests/services"\"
  • Finally you just need a basic test fixture, with this test fixture you can ensure that your temporary D-Bus session along with your test service is recreated for every test in the suite:
    typedef struct {
      GTestDBus *dbus;
      MyProxyType *proxy;
    } TestFixture;
    
    static void
    fixture_setup (TestFixture *fixture, gconstpointer unused)
    {
      /* Create a private dbus-daemon for this test */
      fixture->dbus = g_test_dbus_new (G_TEST_DBUS_NONE);
    
      /* Add the private directory with our in-tree service files */
      g_test_dbus_add_service_dir (fixture->dbus, TEST_SERVICE_DIRECTORY);
    
      /* Start the private D-Bus daemon */
      g_test_dbus_up (fixture->dbus);
    
      /* Get a proxy which we will be using in test cases, typically
       * this is some API generated by gdbus-codegen
       */
      fixture->proxy = my_generated_code_new_for_bus_sync (...);
    }
    
    static void
    fixture_teardown (TestFixture *fixture, gconstpointer unused)
    {
      /* Destroy the proxy */
      g_object_unref (fixture->proxy);
    
      /* Stop the private D-Bus daemon */
      g_test_dbus_down (fixture->dbus);
      g_object_unref (fixture->dbus);
    }

With this basic type of fixture, you can produce a series of tests using g_test_add() in the normal way, all testing various behaviours of fixture->proxy.

Of course, perfect isolation of your service’s test cases may need more effort than just the above. For instance, in the EDS test cases we create and delete a data directory between each test and set up the XDG_CONFIG_HOME, XDG_DATA_HOME and XDG_CACHE_HOME environment variables to point to that directory (avoiding any interaction with the user’s addressbook & calendar). We also compile a gsettings schema in the local data directory and set up GSETTINGS_SCHEMA_DIR to point to the in-tree directory (avoiding any need to have the EDS settings schemas already installed).
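
As a rough illustration of the environment part of that isolation, here is a minimal shell sketch. The EDS tests do this from C inside the test fixture; the shell form, the function names setup_test_env and teardown_test_env, and the TEST_HOME variable are assumptions made for this example only.

```shell
#!/bin/sh
# Hypothetical helpers (not part of EDS): give each test run its own XDG directories.

setup_test_env() {
    # Create a throwaway directory for this test run
    TEST_HOME=$(mktemp -d) || return 1

    # Point the XDG base directories at it, so nothing touches the user's real data
    export XDG_CONFIG_HOME="$TEST_HOME/config"
    export XDG_DATA_HOME="$TEST_HOME/data"
    export XDG_CACHE_HOME="$TEST_HOME/cache"
    mkdir -p "$XDG_CONFIG_HOME" "$XDG_DATA_HOME" "$XDG_CACHE_HOME"
}

teardown_test_env() {
    # Delete the directory again and forget the variables
    if [ -n "${TEST_HOME:-}" ]; then
        rm -rf "$TEST_HOME"
    fi
    unset XDG_CONFIG_HOME XDG_DATA_HOME XDG_CACHE_HOME TEST_HOME
}
```

A wrapper script would call setup_test_env before starting each test binary and teardown_test_env afterwards, so that no state leaks from one test to the next.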

This morning before writing this post I also provided a simple patch to GIO adding an example of this testing paradigm.

December 10, 2012
Murray Cumming - December 10, 2012 - 11:38

Over the last couple of weeks, I’ve been playing with a Jenkins installation at jenkins.openismus.com, building some of the Openismus projects. Here are some notes about my experience.

Installation

This runs on an Amazon EC2 instance. Initial installation was surprisingly simple and well documented, though it took me a while to figure out how to use Jenkins properly. I initially used the official Ubuntu 12.10 packages for Jenkins, but they are a little old so I had to switch to using the Debian/Ubuntu packages from jenkins.org to fix a bug with the copyArtifacts plugin. The two packages seem to be structured very differently, so I had to remove all the Ubuntu Jenkins packages before installing the jenkins.org packages, to avoid a conflict.

See also the Jenkins standard security setup instructions, though I had to use the “Jenkins own user database -> Allow users to sign up” feature first, to create a user which I could then enter into the matrix grid. I then disabled the “Allow users to sign up” checkbox.

Although Jenkins can use slave servers, and probably should, I’m doing everything on one server for now, because I’m afraid of the Amazon EC2 costs getting out of control. Luckily we don’t need to run each build more than once or twice per day to get some benefit. Later I will probably try running EC2 spot instances for the builds. Maybe that won’t be too expensive.

Git-based projects with Jenkins

You’ll need to use the pluginManager page to install the Git plugin, so that there is something other than “None” listed under “Source Code Control” when creating a job. Of course, we have to “apt-get install git” too. We must also specify a git username and email address for the “git plugin” on the configure page, to avoid “Please tell me who you are” errors in the job when Jenkins tries to locally tag the checked out git repository. Neither the configure page nor the pluginManager admin page seems to be linked from anywhere, so I had to discover them via Google searches.

Be careful to specify the master branch rather than leaving that blank, or I think Jenkins will try building arbitrary branches, and maybe all of them.

You can specify a “git clean -dfx” via the “Clean after checkout” option under the “advanced” section, and you probably should so you get a truly clean build each time.

You can use the “Poll SCM” Build trigger, with the cronjob syntax, to regularly check the git repository for changes. This is not ideal, but to do it properly you’d need to add a git hook to the git repository to request a build from your jenkins server whenever there is a git commit.
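
For the git hook approach, a post-receive hook along these lines might work. It is only a sketch: it assumes the Jenkins Git plugin's notifyCommit URL, which asks Jenkins to poll the matching jobs right away (so “Poll SCM” still has to be enabled on the job). The server and repository URLs are placeholders, and the variable and function names are made up for this example.

```shell
#!/bin/sh
# Sketch of hooks/post-receive in the bare repository on the git server.
# JENKINS_URL and REPO_URL are placeholders; REPO_URL must match the URL used in the Jenkins job.
JENKINS_URL="${JENKINS_URL:-http://jenkins.example.com}"
REPO_URL="${REPO_URL:-git://git.example.com/myproject.git}"

# Build the notification URL for the Git plugin (helper name is made up for this sketch)
notify_url() {
    printf '%s/git/notifyCommit?url=%s\n' "$1" "$2"
}

# Only talk to the server when explicitly enabled, so the script can be tried out safely
if [ "${NOTIFY_JENKINS:-0}" = "1" ]; then
    curl --silent --fail "$(notify_url "$JENKINS_URL" "$REPO_URL")" >/dev/null
else
    echo "Would request: $(notify_url "$JENKINS_URL" "$REPO_URL")"
fi
```

With NOTIFY_JENKINS unset the hook only prints the URL it would request, which makes it easy to check the configuration before letting it trigger real builds.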

Multiple branches

You can specify more than one git branch, to make Jenkins try building more than just one, but it’s hard to see which branch was built when looking at the results.

Simple maven builds

For maven-based Java projects such as OnlineGlom, Jenkins is very straightforward, because maven typically downloads all the dependencies without expecting anything to be installed already, and “mvn package” typically does the whole build.

Autotools builds, or similar

For autotools-based projects, you’ll need to make sure that you’ve “apt-get install”ed the project’s dependencies.

Then you must specify the configure and make (or qmake) steps in a build step.

Of course, many real-world projects will need newer versions of their dependencies. For instance, we build maliit-plugins, which depends on maliit-framework, which we develop in sync. For this, I:

  • Tell the maliit-framework job’s build to “make install” into a local directory, via the “--prefix=” configure option.
  • Use an “archive the artifacts” post-build action to store everything in that directory.
  • Use a “copy artifacts from another project” build step in the maliit-plugins job.
  • Export several variables so the build system has access to the dependency in the local prefix. For instance (maliit uses hateful awful qmake instead of autotools, but you’d need something similar for autotools):
    export MALIIT_FW_PREFIX=$WORKSPACE/build_install
    export C_INCLUDE_PATH=$MALIIT_FW_PREFIX/include:$C_INCLUDE_PATH
    export CPLUS_INCLUDE_PATH=$MALIIT_FW_PREFIX/include:$CPLUS_INCLUDE_PATH
    export PKG_CONFIG_PATH=$MALIIT_FW_PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH
    export LD_LIBRARY_PATH=$MALIIT_FW_PREFIX/lib:$LD_LIBRARY_PATH
    export XDG_DATA_DIRS=$MALIIT_FW_PREFIX/share:$XDG_DATA_DIRS
    export XDG_CONFIG_DIRS=$MALIIT_FW_PREFIX/etc/xdg:$XDG_CONFIG_DIRS
    export QMAKEFEATURES=$MALIIT_FW_PREFIX/share/qt4/mkspecs/features/:$QMAKEFEATURES

You can use the “” build trigger to make Jenkins try a build whenever a dependency is built.

I have not tried this with multiple built dependencies. I imagine it could get awkward. It feels like Jenkins needs a plugin for autotools to make this simpler.

Multiple configurations

You can create “multiple configuration” jobs to try multiple ways of building your project. For instance, you might provide different sets of options to your configure script. But I couldn’t use this feature due to the spaces that it puts in the build paths. So I created separate top-level jobs for each configuration. Other people seem to do the same, maybe for the same reason.

Email notification

I’ve tried using Amazon’s Simple Email Service to send notification emails about build failures, but I don’t have that working yet. I’ll update this if I do.

November 29, 2012
Murray Cumming - November 29, 2012 - 11:53

I tried a couple of applications with my four-year-old son: Pictomir and MIT’s Scratch.

Neither seems particularly well maintained and neither is suitable for young children without lots of supervision. Pictomir is easier to get started with, but not easy enough. Scratch is probably more interesting to older children, though they’ll partly be learning about how software is still so often arbitrarily annoying, and I’d prefer that they were introduced via a better example. I’m very tempted to write something better.

Pictomir

pictomir

Pictomir at first start up.

Pictomir leads the child through a series of levels, telling an R2D2 robot (Don’t tell George Lucas) to move around some isometric squares to paint some tiles. At the beginning the program already exists and the child just needs to click the play button. In subsequent stages the child has to build part of the program himself, and then all of the program.

The available commands, at least at the start, are icons for left-turn, forwards, right-turn and paint at the top of the screen. These may be dragged into the available boxes in the program at the right of the screen, though you can instead put them there by clicking to select and then clicking on the empty box in the program. However, the available commands are far away from the program where you must place them, so the child has to spend lots of time moving the mouse pointer back and forth across a wasteland of empty space.

I found an official Ubuntu package in the developer’s PPA. It was last updated to Ubuntu 10.10 (Maverick), but you can install it manually on later versions. All of its UI is in Russian, though you can ignore most of it because the UI is icon-based. When I built the newer code from source a few months ago, I think the UI was translated.

I guess that Pictomir was developed for a specific screen size and generally older PCs. It does not scale the playing area up, so the child has to deal with a tiny R2D2 on a tiny grid in the middle of wide empty space.

Pictomir has a bare svn repository, though I can’t see how clean its commits are, and I don’t know if it’s still used. There are no commits since last year. My svn says that the svn format is too old to check it out, though I’ve checked it out on previous Ubuntu installations.

Its website is quite awful (and only in Russian). I only discovered Pictomir thanks to commenters on Google+.

Scratch

The Scratch UI has several problems that make life hard for the child:

scratch

Scratch, after creating and running a small program.

  • Scratch requires the child to cope with tiny targets (see Fitts’s Law).
  • Scratch demands the use of drag and drop. This is frustrating for normal users, let alone children, and really hard with some laptop touchpads or trackpoints.
  • Update: Scratch requires the child to understand the difference between left-click and right-click. Right-click brings up a help context menu, which just causes confusion.
  • Update: The tiny commands have tiny text edit boxes. Children have a high chance of clicking it instead of clicking the command block itself. This problem could be partly solved by accepting drags on the edit box.
  • Scratch requires the child to double-click (see below).
  • The commands require the children to read, but a subset of the commands could instead be represented by icons. The words are particularly hard for beginner readers because they are written in such a tiny font.
  • To get started with Scratch, you need to figure out that you need to double-click on the first program command to actually start your program. There is no simple run button. You can cause your program to be run by clicking the green Flag button, but that’s something you need to put in your program.
  • When you try to move a command around in the program, it moves the command and all subsequent commands, as a group. So moving just one command means moving stuff around and then moving some of the stuff back to where you started.
  • Your program moves the Scratch cat sprite around the canvas, but that canvas is tiny, and the sprite can move off screen. The only way I found to get it back was to double-click the set-x-to and set-y-to program commands. I can figure that out, but beginners will not.
  • When you’ve drawn all over your canvas, the only way to wipe it is to go to the Pen set of commands and double-click the Clear command.
  • The sprite moves far too fast for the child to understand the relationship between the commands and the motion, without someone explaining it. If the command says move-10-steps then I’d expect the sprite to visibly move 10 steps. Advanced users might prefer it to be faster and smoother, but the default should make things obvious.

There are some other strange things in the UI which are probably a side-effect of its eccentric (Smalltalk) implementation.

  • Scratch has a menu (well, an icon-menu) for changing its language, instead of just using the user’s regular language.
  • None of the UI elements (menus, tabs, buttons, file chooser dialog, etc) match other standard applications on the system.

I think Scratch could be really popular with younger kids if it ran on Android with a touch interface and with less text, maybe as a simpler version. But the developers don’t seem to have any interest in that – they seem to be working on a Flash-based version of Scratch for the web, which is unlikely to work well with touchscreens, even if they can get Flash to work on Android and iPhone (they won’t). And I have no interest in hacking on Smalltalk. There seem to be various programming-language-based forks of at least the underlying engine, but none seem to be successful.

November 22, 2012
Murray Cumming - November 22, 2012 - 12:15

I found some work in one of my old branches and cleaned it up, so now OnlineGlom supports image fields too.

Online Glom with an image field

As usual, it was far more work than seemed necessary. GWT’s Image widget is not much more than a wrapper around the HTML <img> tag, so I had to create a separate service, with the same authentication system, to serve image data and invent a URL syntax to refer to the images from the database. It is certainly easier with GTK+ code on the desktop, even when delivering the image data asynchronously. This feels like something that a web programming system should take care of, even if this is what happens behind the scenes. I wonder if any do.

Next, I want to make sure that OnlineGlom can handle tables whose primary keys are not numeric, because we’ve been hard-coding that in a few places. Then I hope I can start the big job of supporting data editing.

November 08, 2012
Murray Cumming - November 08, 2012 - 11:14

I have not had enough time recently for my maintainership of gtkmm, glibmm, libxml++ and libsigc++. And I’m unlikely to find time soon. Family and (income earning) work have taken priority. That’s why glibmm and gtkmm didn’t have .0 releases until a couple of weeks after the GNOME .0 release, and why there was no real attempt to follow the GNOME release schedule this time. I’ve hardly touched cairomm for years.

It’s not a lot of work but I’ve had to find a few days every few months to keep glibmm and gtkmm up to date with API additions in glib and GTK+, to keep roughly in sync with them. There has been rather a lot of new API recently. See the list of new API in gtkmm 3.6, glibmm 2.34, gtkmm 3.4, and glibmm 2.32. Krzesimir’s ongoing attempt to use the introspection information (rather than .defs files) would help with this, but not hugely, because we’d still need to take care that the API is right for C++. Some initial .hg/ccg creator script would probably be most helpful.

I have no time for the more difficult bugs that turn up occasionally. Luckily Kjell Ahlstedt, José Alburquerque and others arrive regularly in bugzilla with great fixes for hard problems. I trust Kjell and José to commit directly, but they deserve some decent patch review. Unfortunately I often keep them waiting far too long.

I need to make some change. As a start, I might stop receiving bug email for glibmm and gtkmm, and stop worrying about adding new API. I guess it will take at least one missed GNOME release cycle for some other people to take over my responsibilities.

October 25, 2012
Jan Arne Petersen - October 25, 2012 - 18:15

3 days ago Wayland and Weston 1.0 were released. We at Openismus are working on input method support for Wayland (see here and here).

To show the input methods working we have an editor and keyboard example client in Weston. But we also want to demo input methods in real world applications. As a start we implemented an input method module for the EFL (Enlightenment Foundation Libraries) toolkit.

The EFL input method module is available at github.

screen cast

October 20, 2012
Jens Georg - October 20, 2012 - 12:11

Installing this version will REBOOT your device!

Also, if you have installed the previous version from OVI store, you need to uninstall it first before installing this version.

I have released a new version of PushUp. The most notable changes, aside from those from syncing with upstream korva, are:

  • Compatibility with devices that need file extensions on the HTTP URLs
  • The possibility to share files which are unknown to Tracker
  • Drop the libffi dependency which caused install issues for some people
  • Next try on using a nice icon in the share dialog
  • Default icons for remote devices
  • An option to cancel transfers using Transfer UI

Please note that due to the slight issues with the N9’s sharing UI, the device has to reboot after installing PushUp. This will happen automatically.

As usual, it’s available here, at apps.formeego.com or in OVI store.

October 08, 2012
Jens Georg - October 08, 2012 - 15:04

Update: As Jeremy points out, the update has been synched. All should be well now.

Update: This post only applies to the package on 12.10 and will be fixed when Ubuntu resyncs Rygel from Debian.

It seems that the Rygel package is missing some important files which renders the transcoding non-functional.

A work-around is to download the files from http://git.gnome.org/browse/rygel/tree/data/presets
and drop them into /usr/share/rygel/presets.

September 28, 2012
Murray Cumming - September 28, 2012 - 09:38

I just booked my travel and hotel to visit the Ubuntu Developer Summit in Copenhagen at the end of October, along with Michael Hasselmann and Mathias Hasselmann, all of us representing Openismus.

I think this will be my first UDS since UDS, Mountain View in 2006 and half a day at UDS, Prague in 2008. Lots has changed since then.

September 23, 2012
Jens Georg - September 23, 2012 - 20:18

This XML snippet is from a LG TV:

ackDuration val="àý]Üž"/><CurrentMediaDuration val="àý

Duration is supposed to be a string representation of an integer. Well done, guys m(

September 20, 2012
Murray Cumming - September 20, 2012 - 16:01

Here’s an update since my last status post in June.

Things have improved for Openismus even more though we are not complacent. Several proposals, including the ones I mentioned in that last blog post, have resulted in customer contracts. So we are now busy working on the Maliit input method system (virtual keyboard), Wayland, Rygel UPnP/DLNA and Evolution Data Server (EDS). We are even thinking of hiring another developer if we can find someone who is just right.

We are now established in the habit of creating proposals for customers, revising them, and shifting into implementation. We take our customer proposals seriously, making sure that the developers are the main authors and making sure that they don’t leave questions unanswered. If there’s something we might help your company with then we’d like the chance to convince you too.

Michael Hasselmann is now the main person negotiating new work for us and keeping some of that work on schedule. He has been very successful – a surprise to himself but not to us. We are calling him a Sales Engineer but that doesn’t really do his dedication justice.

Michael can travel much more than I could for the last few years. Right now he’s at the Automotive Linux Summit in the UK and tomorrow he will be at the X Developers Conference in Nürnberg with Jan Arne Petersen (where our interest is mostly in Wayland and input methods). On Monday and Tuesday he’ll visit me in the Munich office.

We have achieved this thanks, of course, to the hard work of our whole team at Openismus. They have fought hard so we can all keep doing worthwhile work that we enjoy. I am proud of them and glad to be part of this.

September 18, 2012
Jens Georg - September 18, 2012 - 16:17

Helium 0.6.0 is available. It mostly contains back-end changes, but several user-visible changes as well:

Improved seeking


The handling of seeking in the player view has been improved. The area that reacts to seeking has been enlarged and flicking doesn’t steal the events anymore. A tool-tip will show the current seek position.

Filtering

It is now possible to filter the list of media files. To activate the filter input, just drag down and pull the list at the top as in every other program on the device. By default the filtering will only work on the titles. This can be changed in settings to match against most of the other meta-data as well.

Debugging

It’s now possible to log the UPnP traffic to help debugging of interoperability issues. By default, the log file is written to MyDocs to enable easy transfer of the log files via mass storage mode.

Jan Arne Petersen - September 18, 2012 - 13:47

In my last post about Text Input Method Support in Wayland I wrote about an input method technical demo which I created for Weston (the Wayland reference compositor). That work got merged into Weston and has improved a lot since: The text_model and input_method interfaces got all the basic requests and events which are required to get a simple virtual keyboard working. Focus handling to track focus via wl_seat was implemented. The example keyboard and editor clients were also improved to better showcase the available features. Here is a screencast of the demo:

screen cast

Michael Hasselmann - September 18, 2012 - 08:56

Planning my travels for September made me yearn for a travel assistant — it’s impressive how much time one can waste with crappy booking websites and clueless hotel staff. On the other hand, doing it myself allowed me to shuffle things around until I was happy with the plan. Not sure my imagined travel assistant would have had that much patience ;–)

Anyway, tomorrow I’ll be attending the 2nd Automotive Linux Summit (Sep 19-20) and the schedule is packed with interesting talks. Sadly, there won’t be any time for me to visit Warwick Castle again, which is just 15 minutes away from the venue. They say it’s for kids, but the Warwick Castle daily shows are well worth a visit, even for grown-up kids.

The Automotive Linux Summit clashes with the X Developer Conference (Sep 19-21) in Nuremberg, which means I can only attend XDC on Friday (and perhaps the beer hike on Saturday). Now, I haven’t suddenly become an X hacker overnight, but I want to know how the progress in Wayland is perceived and whether there’s some good advice for our Wayland input method work. It would have been a great opportunity for me to learn more about EGL and new trends in OpenGL, but I’ll have to leave that to my colleague Jan Arne Petersen, who will attend the whole conference.

After staying in Nuremberg for the weekend, I will spend a few days in Munich (not for the Wiesn) before I head home to Berlin.

Thanks to Openismus for sending me to ALS and XDC!


August 28, 2012
Jens Georg - August 28, 2012 - 06:55

We get a lot of reports of UPnP AV clients not working properly with Rygel. It either isn’t seen by the client at all or does not show any content.

The reason is usually the same. People are not following the specs. Rygel seems to be one of the few UPnP-AV/DLNA servers out there that implement a higher version of the UPnP-AV specification than :1. A lot of clients are explicitly testing for this version, ignoring higher versions although the UPnP standard states that higher version services need to be backward-compatible. (cf. UDA 1.1, section 1.2.2, last paragraph on page 10). Of course we can work around that – and we do, but the list of exceptions is getting longer and longer and to be honest I’m starting to get really annoyed by those fixes.
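
To make the expected behaviour concrete, here is a small shell sketch of a backward-compatible version check on UPnP type strings such as urn:schemas-upnp-org:service:ContentDirectory:2. The function name is made up for this example, and real clients would of course do this in whatever language they are written in.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the announced type can stand in for the required one.

upnp_type_matches() {
    announced=$1
    required=$2

    # Everything before the last colon (domain, kind and name) must be identical
    [ "${announced%:*}" = "${required%:*}" ] || return 1

    # The announced version only has to be at least the required one.
    # A non-numeric version makes the comparison fail, which also counts as no match.
    [ "${announced##*:}" -ge "${required##*:}" ] 2>/dev/null
}
```

A client that needs ContentDirectory:1 would accept a server announcing ContentDirectory:2 with this check, whereas a plain string comparison against ContentDirectory:1 rejects it.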

I expect that there will be more and more devices with higher versions now that DLNA has added features that require higher versions of the specification than :1. So pretty please get your clients fixed. And if you don’t want to, then don’t make it extra complicated to work around your bug. But really, fix it.

And please have a working support email address so I can complain directly. About every client author I tried to contact has bounced – and the rest ignored me.

July 23, 2012
Tristan Van Berkom - July 23, 2012 - 21:01

Hi everyone,

Long time no blog. I’ve been meaning to blog and build hype around this but as I’ve been busy with so many things it just hasn’t come out.

Well the first thing to say is, please click this link and visit Juan Pablo’s blog. He is speaking first thing on the first day of GUADEC on the topic of embedding GtkBuilder script natively into GtkContainer derived widgets. Some may remember some of my ancient blog posts on the same topic. I never found time to complete the patches in the composite-containers branch, but Juan Pablo has picked up the work with a fury and is going to explain in more detail in his talk.

In a last minute decision, as the dates are right, I also decided to drop in too. With all the work Juan has already done, a little consensus and participation hopefully we can finally pull off this great feature.

See you there ;-)

Krzesimir Nowak - July 23, 2012 - 06:30
I have recently pushed over 30 commits to the glibmm gmmproc-refactor branch. This concludes gmmproc's first "milestone" - to generate glibmm sources, build a library, build the tests and examples and finally successfully run the tests. But by saying "glibmm" I mean only GLib wrappers - Gio wrappers are next on the list.

First milestone?


Yeah, right, as if there are any. That "first milestone" may look modest, but I actually think that it is like 75% of work done. GIR parsing, type conversions system, parsing templates, replacing m4 code with perl and then tying them and squeezing them with foot into some semblance of a program - all of those had to be written to have something working.

I haven't yet made any benchmarks of the gmmproc rewrite, but right now it seems to be faster than the old one. Still, the old one can be faster if we run a parallel build (make -jX) - this is one of the main differences between the currently used gmmproc and its rewrite. The former processes and writes one set of files (like from widget.hg and widget.ccg to widget.h, widget.cc and widget_p.h) on every run, while the latter processes all files in a single run. That forces me to actually do parallel processing inside my program instead of reusing make's powers. Fortunately, I tried to write most of my code in a way that parallelizing it would be easy if done in OpenMP style.

Talking about speed - there is one area I would like to speed up in general - GIR parsing. It looks like the slowest part of the whole gmmproc, but I haven't yet tried to profile it. I was thinking about using typelib instead, but from what I heard, there is no detailed C type information there (the "c:type" attributes from gir). This is still to be confirmed as it is not on top of my nowhere written list. Otherwise, using typelib would be, I imagine, faster. Provided that a C-to-perl interface to GIRepository is written first - whooo, yak shaving!

Some plans


Still, I feel no hurry in rewriting as I would rather see it used in the next version of glibmm/gtkmm (I assume that those will be glibmm-4 and gtkmm-4). Thus I am not writing a strict drop-in replacement - I have already made (or I plan to make) some changes in the generated API (like exception-to-GError conversion in vfuncs wrappers) and in the template API (I would like to get rid of _CUSTOM_CTOR_CAST and similar in favour of _CLASS_FOO options).

There is still much to be done beside threading the application:
1. generating Doxygen documentation based on docs in GIR,
2. code documentation,
3. gmmproc macros documentation,
4. tests (there are some, but still not enough),
5. reports about unwrapped classes and functions,
6. and probably more.

I am happy to get this far and to actually finally see something more or less working.

Small disclaimer


The code is an ugly mess. It needs to be reorganized and reshuffled in several places. WrapParser class is huge, even with moving some functions into shared modules. 'use' clauses have to be reviewed. 'constant' module for section traits was probably a crappy idea. Sometimes class/function naming is bogus (identity conversions, imbued types, tokens store?). Code style is very inconsistent (I started reading Modern Perl somewhere in the middle).
Oh, and commit messages are useless - they are probably going to be squashed at some point into single enormous commit for merging into master (and I am going to get only +1 commit of contribution on glibmm on Ohloh, damn!).

July 04, 2012
Jens Georg - July 04, 2012 - 10:33

Following up on Murray’s “Rygel for a DLNA Player” proposal, I’ve made some of the suggested changes listed there, which are now in master. These two new libraries have been added:

  • librygel-core: This has been a long-standing TODO item. It was necessary to allow in-process use of the DLNA and UPnP knowledge coded in Rygel, allowing the creation of librygel-renderer (see below). On top of this there are other benefits:
    • It will allow a Rygel version running on Windows without ugly libtool hacks for the plug-ins.
    • It simplifies the reuse of other parts of Rygel’s code, such as the transcoding HTTP server, in programs like Korva.
  • the new librygel-renderer: In essence this is the playbin plugin with a bit of API on top that simplifies the code necessary to create a renderer. It offers either a playbin element you can use in your code or wraps around an existing playbin.

In future we can extend this family of libraries by “librygel-browse” for remote content access and “librygel-control” for remote control.

To demonstrate librygel-renderer’s capabilities of converting an arbitrary media player based on GStreamer’s GstPlaybin2 into a proper UPnP/DLNA renderer, I have added librygel-renderer support to Totem. You can see the result in the following demo:

The first part of the video (using Sintel) shows how changes in local file playback are reflected on the UPnP side. In the second part, I set a remote video (Big Buck Bunny) and control Totem solely via UPnP, where play, pause, stop, volume control, seeking and getting the current playback position all work quite well.

This simple conversion is not a complete DLNA Player. It would need UPnP/DLNA server browsing capabilities for this, as stated in the proposal (in general, Totem can access these servers via its Grilo plug-in).

Of course, with Totem being a complex, mature piece of software, some things don’t work yet:

  • Volume changes in Totem aren’t reflected via UPnP
  • It is not possible to initiate a remote play-back until Totem has an item in its playlist
  • The announced media format support is nowhere near what Totem actually supports

Still, getting a UPnP/DLNA-compatible (and actually close to certifiable) renderer in three lines of code is impressive, don’t you think?

There’s one drawback that I realized while implementing this. My initial idea to sit on top of a GStreamer playbin2 might be flawed for already existing and mature software such as Totem. There might be many more code paths dealing with control that happen outside of a playbin. We have an alternative for that as well: the consuming party could implement one of Rygel’s interfaces. The UPnP and DLNA compatibility that is already in the current code would need to be transferred to this, which is, of course, more work than just attaching to a playbin.

Why should I prefer this above the already available MPRIS Rygel plug-in? There might be several reasons.

  • You don’t want a whole separate Rygel process running just for adding UPnP renderer capabilities to your media player.
  • You aim for some kind of UPnP or DLNA certification – The additional layer of indirection can make that really hard while the presented approach is nearly ready.

So are we going to abandon Rygel’s MPRIS renderer plugin now? No, because we can’t expect every media player in the world to a) use GStreamer and b) want to link against Rygel. MPRIS gives us quick, ready-made access to a vast number of players out there, and the compatibility it offers is (most of the time) just enough for casual home use.

Jens Georg - July 04, 2012 - 10:32

You might have noticed that there’s a new device update out; the changes related to Rygel are rather small:

  • Fixes 100% CPU usage when encountered by DLNA clients that issue a massive amount of seek requests (GNOME Bug#662125)
  • Fixes the issue with XBox 360 that showed songs up to 5 times (GNOME Bug#664184)
  • Fixes a small memory leak when streaming transcoded audio
June 24, 2012
Jens Georg - June 24, 2012 - 09:43

Following up from ~ three years ago, here’s what it looks like today

June 20, 2012
Jan Arne Petersen - June 20, 2012 - 15:31

Over the last years, we at Openismus have worked on input methods for mobile devices (see for example the Maliit input method framework). During that work we noticed that there are several problems caused by the toolkit-level interface between the input method system and the applications (besides having to implement interfaces for every toolkit, like GtkIMContext and QInputContext). One of these problems is synchronizing events happening to the focus window and the virtual keyboard window. Therefore good platform integration of input methods such as virtual keyboards requires additional window management policies, as the window manager has to gain knowledge about input methods. The compositing window manager on Nokia’s N9 serves as an example for the amount of required integration. It also taught us about what can go wrong and that Xorg-induced latency is a hard problem to solve (see Compositing in Maliit). There is also the XIM X extension, which integrates input methods on the X level, but its limitations and complexity make it a second choice compared to the toolkit interfaces mentioned above (especially since it does not manage to solve these problems).

The move from Xorg to Wayland offers the possibility to move the input method system from the toolkit level down to the display server, as discussed in our Wayland Input Method System Proposal. Last week, Openismus let me work on a small prototype technical demo for Weston (the Wayland reference compositor) which implements some of the ideas in that proposal. This should be useful for testing the idea of having the input method system integrated in Wayland.

New Wayland interfaces

Standalone input method system with text and input method protocols

The prototype defines three interfaces. The first is the text_model, which is used to communicate from the application/toolkit side with the input method system. It provides the input method system with all required information about the active text, such as selection, surrounding text and cursor index. The application receives events over this interface, such as preedit and commit, from the input method system. For the prototype, I implemented an example editor client, which uses the text model to send the surrounding text to the input method system and receives commit events.

The second interface is the input_method interface which is used to communicate between the input method and Wayland. For the prototype we use a really simple keyboard client which just sends commit events when pressing one of the keys.

The third interface is used to register the keyboard surface as an input panel on the shell. I added the input_panel interface to the desktop shell protocol. This allows the shell to stack the keyboard surface into the right layer (for the prototype I used the panel layer). Additionally, the prototype adds support to the shell to show the keyboard only when a text model is active.
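To make the division of labour between the three interfaces more concrete, here is a toy model in Python. It is neither the Wayland protocol nor the prototype's code: the class and method names (TextModel, InputMethod, Shell, activate, key_pressed and so on) are invented for illustration, and only the flow described above is modelled.

```python
class TextModel:
    """Application side: holds the text state shared with the input method."""

    def __init__(self):
        self.text = ""
        self.cursor = 0

    def surrounding_text(self):
        # What the application would send to the input method system.
        return self.text, self.cursor

    def on_commit(self, string):
        # Commit event received from the input method: insert at the cursor.
        self.text = self.text[:self.cursor] + string + self.text[self.cursor:]
        self.cursor += len(string)


class Shell:
    """Compositor side: shows the input panel only while a text model is active."""

    def __init__(self):
        self.active_model = None

    @property
    def panel_visible(self):
        return self.active_model is not None

    def activate(self, model):
        self.active_model = model

    def deactivate(self):
        self.active_model = None


class InputMethod:
    """Keyboard client: every key press becomes a commit event."""

    def __init__(self, shell):
        self.shell = shell

    def key_pressed(self, key):
        model = self.shell.active_model
        if model is not None:
            model.on_commit(key)


shell = Shell()
editor = TextModel()
keyboard = InputMethod(shell)

keyboard.key_pressed("x")   # ignored, no text model is active yet
shell.activate(editor)
for key in "hi":
    keyboard.key_pressed(key)
print(editor.surrounding_text())  # ('hi', 2)
```

In the real system these would be separate processes talking through the compositor; the direct method calls here only stand in for that message passing.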

Demos

screen cast
The screen cast shows the prototype in action, but for those with a working Wayland installation, it should also be possible to just compile and run the prototype.

Outlook

The prototype is the outcome of less than a week of coding, but it already uses some of the concepts we learned while working on Maliit. It should serve as a baseline for further Wayland input method system integration, even if many basic items are missing. For instance, there is no integration with hardware keyboard events, nor keyboard focus (which is handled by wl_seat in Wayland). Nevertheless I will polish my patches a bit and send them to the Wayland mailing list for integration into Weston, so that they can be used for more work on input methods in Wayland/Weston. My colleague Michael will present this work at aKademy in two weeks, which will be a good opportunity to discuss the possibilities of such an input method system in Wayland.

Mathias Hasselmann - June 20, 2012 - 00:26

Openismus asked me to perform some benchmarks on Evolution Data Server. We wanted to track the progress of recent performance improvements and identify possible improvements. Therefore, I tested these versions of EDS:

The code is in a phonebooks-benchmarks repository on Gitorious with a full autotools build, and with a script to build and test all these versions of EDS. So you can try this too, and you can help to correct or improve the benchmarks. See below for details.

EDS offers various APIs to access the address book. The traditional interface was EBook, which has served us well in GNOME 2 and Maemo 5. However, some APIs such as batch saving are missing in the upstream version. Also its asynchronous API doesn't follow the conventions established later by GIO. To overcome these EBook shortcomings, and to make efficient use of GDBus, the EBookClient API was added in EDS 3.2. We can even use the backend EDataBook API, and that lets us measure the overhead imposed by using D-Bus.

I tested the back-ends with different numbers of contacts. For each benchmark, and for each contact count, we create an entirely new private database. D-Bus communication was moved to a private D-Bus session. To avoid swapping, the maximum virtual memory size was limited to 2 GiB with the ulimit command. This limit probably caused the crash of Maemo 5's EDS in the test with 12,800 contacts, but I have not checked that yet.
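For readers who want to reproduce such a limit from a script rather than from the shell, the following Python sketch caps the address space of a child process, which is what `ulimit -v` does. It is not taken from the benchmark scripts; the function names are made up for this example, and it only works on Unix-like systems.

```python
import resource
import subprocess

TWO_GIB = 2 * 1024 ** 3   # `ulimit -v` takes KiB, so this is `ulimit -v 2097152`


def clamp_soft_limit(wanted, hard):
    """Return the soft limit to request: never above the existing hard limit."""
    if hard == resource.RLIM_INFINITY:
        return wanted
    return min(wanted, hard)


def run_limited(argv, limit=TWO_GIB):
    """Run argv with its virtual memory (address space) capped at `limit` bytes."""

    def apply_limit():
        # Runs in the child process just before the program is executed.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (clamp_soft_limit(limit, hard), hard))

    return subprocess.run(argv, preexec_fn=apply_limit)
```

Calling `run_limited(["src/phonebook-benchmarks"])` would then start the benchmark binary under the cap; allocations beyond the limit fail, which is one plausible way for a process to crash as described above.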

Contact Saving

These benchmarks simply store a list of parsed contacts in the address book. This simulates use cases such as the initial import of contacts upon migration or synchronization.

To avoid possible side effects from lazy parsing, we retrieve the attribute list for each contact before starting the benchmark. With EBook from Maemo 5 and EBookClient since EDS 3.4, contacts are saved in batches of 3,200 contacts. This partitioning was needed to deal with resource limitations in the file backend. All other EDS variants must save their contacts one by one.
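The partitioning into batches can be sketched in a few lines of Python. This is only an illustration of the chunking itself, not the benchmark's C++ code; the `batches` helper and the placeholder contact strings are assumptions.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


contacts = [f"contact-{i}" for i in range(12800)]
sizes = [len(batch) for batch in batches(contacts, 3200)]
print(sizes)  # [3200, 3200, 3200, 3200]
```

Each slice would be handed to one batch-saving call, so the largest test with 12,800 contacts needs four calls instead of 12,800.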

Contact saving, without batching

EBook, EBookClient, EDataBook implementation

As expected, the effort for contact saving grows quickly when not using a batch API. This is because a new database transaction must be created and committed for each contact. Things look much better when using the batch saving API which was available in Maemo 5 already, and was recently added to EBookClient:

Contact saving in batches

Batch saving performance of EDS 3.4+ is just excellent: Although slowly growing with the number of contacts, it remains below a threshold of 3 ms per contact even for 12,800 contacts. That growing effort can be attributed to growing attribute indices. The initial peak (until 50 contacts for Maemo 5, and until 400 contacts for EDS 3.4+) can be attributed to database setup cost.

In terms of performance there is no difference between using EBookClient or EDataBook (which avoids D-Bus).

Contact Fetching

A very basic, but essential, benchmark is fetching all contacts. To get a first impression I just fetched all contacts without any constraints.

Fetch all contacts

EBook, EBookClient, EDataBook implementation

Contact fetching performance decreased significantly during the EDS 3 series and then got better again: Fetching all contacts with 3.4 takes about 133% of the time that EDS 2.32 needs and even 225% of Maemo 5's time. With EDS 3.5 contact loading time is improving again, making the EBook version of EDS 3.5 comparable in performance to EDS 2.32. Git archeology and quick testing identify Christophe Dumez's bugfix as the relevant change. Apparently the file backend didn't make optimal use of Berkeley DB transactions.

Still there is significant room for improvement, because:

  1. simple contact fetching with EBook 3.5 still takes 175% of the time Maemo 5 needs.
  2. EBookClient 3.5 is still 20% slower than EBook 3.5, and 64% slower than EDataBook.

This basic test shows already that the chosen client-server architecture of EDS causes a non-ignorable overhead.

It would be absolutely worth investigating how Maemo 5 reaches its far superior performance: After all it even beats EDataBook. I remember having spent significant time on avoiding vCard parsing and string copying. I also remember having replaced the rather inefficient GList with GPtrArray at several places. Some of the ideas have been ported upstream during Openismus' work for Intel. Apparently there are more gems to recover.

Fetching by UID

Fetching contacts by UID simulates lazy loading of contact lists: Initially, we only fetch contact IDs. We only fetch the full contacts when the device becomes idle, or when contacts become visible on screen. This approach is needed because even the fastest implementation (Maemo 5) needs several seconds to fetch any contact list of realistic size on mobile hardware. Another useful optimization we implemented on the Nokia N9 is fetching of partial contacts that only contain relevant information, for instance the person's name. EDS doesn't support this optimization.
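The lazy loading pattern described here can be sketched as follows. This is a generic illustration, not EDS or N9 code: `LazyContactList`, `fetch_ids` and `fetch_contact` are hypothetical names, and a plain dict stands in for the address book.

```python
class LazyContactList:
    """Keeps only contact IDs at first and loads full contacts on demand."""

    def __init__(self, fetch_ids, fetch_contact):
        self._fetch_contact = fetch_contact
        self.ids = list(fetch_ids())   # cheap initial query
        self._cache = {}

    def contact(self, uid):
        # Load the full contact the first time it is needed.
        if uid not in self._cache:
            self._cache[uid] = self._fetch_contact(uid)
        return self._cache[uid]

    def visible(self, first, count):
        """Contacts that just scrolled into view."""
        return [self.contact(uid) for uid in self.ids[first:first + count]]

    def load_some_when_idle(self, budget):
        """Load up to `budget` not yet cached contacts; return how many were loaded."""
        pending = [uid for uid in self.ids if uid not in self._cache][:budget]
        for uid in pending:
            self.contact(uid)
        return len(pending)


# Stand-in backend: a dict instead of a real address book.
store = {f"uid{i}": {"id": f"uid{i}", "name": f"Person {i}"} for i in range(1000)}
fetched = []


def fetch_contact(uid):
    fetched.append(uid)
    return store[uid]


contacts = LazyContactList(store.keys, fetch_contact)
page = contacts.visible(0, 10)
print(len(contacts.ids), len(fetched))  # 1000 10
```

The cost of each `fetch_contact` call is exactly what the by-UID benchmarks below measure.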

As a first test we fetch contacts one-by-one, without using any kind of batch API:

Fetch by UID without batches

EBook, EBookClient, EDataBook implementation

The good news is that this chart shows quite constant performance for each client.

The bad news is that contact fetching is pretty slow: 3.9 ms per contact, as seen with EDS 3.5, translates roughly to 390 ms to fetch only 100 contacts on this hardware. Considering that typical mobile devices are roughly 10 times slower than my notebook, these numbers are disappointing, especially if you consider that EDS 2.32 was about 4 times faster, and Maemo 5 even about 13 times faster. These are entirely different worlds. It should be investigated what causes this significant performance regression from EDS 2.32 to EDS 3.2+. One also should try to port the performance fixes of Maemo 5.
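The arithmetic behind these figures is simple enough to spell out. The sketch below only uses the numbers quoted in the text (3.9 ms per contact, a factor of 10 for mobile hardware, and the speed ratios of 4 and 13); the function name is made up.

```python
PER_CONTACT_MS = 3.9     # EDS 3.5, one contact fetched by UID
MOBILE_SLOWDOWN = 10     # rough factor between the notebook and a mobile device


def fetch_time_ms(contacts, per_contact_ms=PER_CONTACT_MS, slowdown=1):
    """Estimated time to fetch `contacts` contacts one by one."""
    return contacts * per_contact_ms * slowdown


notebook = fetch_time_ms(100)
mobile = fetch_time_ms(100, slowdown=MOBILE_SLOWDOWN)
print(f"notebook: {notebook:.0f} ms, mobile: {mobile / 1000:.1f} s")

# Per-contact times implied by the quoted speed ratios.
eds_2_32 = PER_CONTACT_MS / 4
maemo_5 = PER_CONTACT_MS / 13
print(f"EDS 2.32: about {eds_2_32:.1f} ms, Maemo 5: about {maemo_5:.1f} ms")
```

So a screenful of 100 contacts would take close to four seconds on a device ten times slower than the notebook, while the older versions would stay at roughly one second or less under the same assumption.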

The performance reachable under ideal circumstances is shown by the EDataBook client. This only needs about 50 µs (0.05 ms) to fetch one contact by its id. Directly accessing the address book via EDataBook is about two orders of magnitude faster than the current EBookClient. That's the goal that EDS can, and should, aim for. Apparently a significant amount of time is spent on performing D-Bus communication, whereas the requested task can be performed by the backend within almost no time.

However, this data was acquired by fetching the contacts one by one. We can surely do better by using a batch API. That should drastically reduce the overhead caused, for instance, by D-Bus. But neither EBook nor EBookClient provides an API to fetch contacts by lists of contact IDs. The thing that comes closest is filtering the contacts by a disjunction of UID field tests:

(or (is "id" "uid1") (is "id" "uid2") ...)
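A query of this shape can be assembled mechanically from a list of UIDs. The Python sketch below shows one way to build the string; the helper name `uid_query` is made up, and the escaping of quotes and backslashes is an assumption about the query syntax rather than something checked against EDS.

```python
def uid_query(uids):
    """Build an (or (is "id" ...) ...) query string for a batch of contact UIDs."""

    def quote(value):
        # Escape backslashes and double quotes so the UID stays one string literal.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    tests = " ".join(f'(is "id" {quote(uid)})' for uid in uids)
    return f"(or {tests})"


print(uid_query(["uid1", "uid2", "uid3"]))
# (or (is "id" "uid1") (is "id" "uid2") (is "id" "uid3"))
```

The query grows linearly with the batch size, so the backend has to parse and evaluate one field test per requested contact.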

So I tried that. The results for such queries, using batches of UIDs, look like this:

Fetch by UID in huge batches

EBook, EBookClient, EDataBook implementation

This chart speaks for itself. To remain responsive and appear fluid while scrolling, applications should render at 60 frames per second. To reach that framerate newly visible contacts must be fetched and rendered within less than 16 ms. EDS apparently cannot meet that performance goal even on desktop computers. Considering the huge performance differences between client-server access and direct access, as seen when fetching contacts one by one, it seems very worthwhile to add dedicated support for fetching multiple contacts by UID. The most obvious approach would be adding a batch API in the spirit of e_book_client_add_contacts(). Another solution would be adding more fast paths to the query processing code.
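The frame budget mentioned above follows directly from the frame rate. The sketch below combines it with the per-contact times from the one-by-one benchmark earlier in this post (3.9 ms for EBookClient in EDS 3.5, 0.05 ms for EDataBook); it ignores rendering time, so the real numbers would be lower.

```python
FRAME_RATE = 60
frame_budget_ms = 1000 / FRAME_RATE


def contacts_per_frame(per_contact_ms, budget_ms=frame_budget_ms):
    """How many sequential fetches fit into one frame, ignoring rendering time."""
    return int(budget_ms // per_contact_ms)


print(f"{frame_budget_ms:.1f} ms per frame")   # 16.7 ms per frame
print(contacts_per_frame(3.9))                 # EBookClient, EDS 3.5: 4
print(contacts_per_frame(0.05))                # EDataBook, direct access: 333
```

On hardware ten times slower, the client-server figure drops to zero contacts per frame, which is why a faster path for fetching by UID matters.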

Filtering

EBook, EBookClient, EDataBook implementation, the queries used

Contact filtering is relatively efficient when using fields such as last-name, for which indices are hit. Still, the D-Bus overhead is noticeable: EDataBook needs less than 60% of EBook's or EBookClient's time.

The times to match long prefixes and suffixes look quite similar when hitting indices.

The behavior of EBook for short name prefixes is a bit strange. The EBook API is now deprecated, but it could still be worthwhile to identify the issue causing this strange behavior, so that it can be avoided in the future:

Interestingly, there seem to be no functional database indices for email addresses or phone numbers in more recent versions of EDS:

The behavior of Maemo 5's EDS is a bit surprising, as I know that Rob spent significant amounts of time on adding Berkeley DB based indices to that EDS version.

It might be worth optimizing index usage in EDS, because prefix and suffix searches are commonly used in mobile applications. Prefix searches need to be fast, for quick feedback during auto completion. Suffix searches need to be fast, for instance to provide instant caller identification.
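As an illustration of why both kinds of search can be served by an index, here is a small Python sketch using sorted lists and binary search. It shows one common technique (a suffix of a string is a prefix of the reversed string) and makes no claim about how EDS or its database backends implement their indices; the class name and phone numbers are invented.

```python
import bisect


class AffixIndex:
    """Sorted-list index answering prefix and suffix queries by binary search."""

    def __init__(self, values):
        self._forward = sorted(values)
        # A suffix of a string is a prefix of the reversed string.
        self._backward = sorted(value[::-1] for value in values)

    @staticmethod
    def _prefix_range(sorted_values, prefix):
        # All matches are contiguous and start at the insertion point of the prefix.
        lo = bisect.bisect_left(sorted_values, prefix)
        hi = lo
        while hi < len(sorted_values) and sorted_values[hi].startswith(prefix):
            hi += 1
        return sorted_values[lo:hi]

    def with_prefix(self, prefix):
        return self._prefix_range(self._forward, prefix)

    def with_suffix(self, suffix):
        matches = self._prefix_range(self._backward, suffix[::-1])
        return [value[::-1] for value in matches]


phones = AffixIndex(["+4930123456", "030123456", "+15550100", "0401234"])
print(phones.with_suffix("30123456"))  # ['030123456', '+4930123456']
print(phones.with_prefix("+49"))       # ['+4930123456']
```

The suffix lookup is the caller identification case: the incoming number may or may not carry a country code, so matching on the trailing digits finds both stored spellings.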

Memory Usage

Memory is cheap these days. Still, especially on embedded devices, you should keep a close eye on the memory consumption of your software. Obviously, memory consumption grows with the number of contacts used:

Resident Set Size (RSS)

It's nice to see how memory consumption has reduced from release to release. It's also good to see that EBookClient seems to use slightly less memory than EBook.

You might miss the graphs for Maemo 5 and EDS 2.32. I had to drop them for this chart as they show serious memory leaks, preventing any useful examination. Apparently the leak really is in those versions of EDS: The EBook benchmarks for EDS 3.2+ are using exactly the same code but don't show this bad behavior.

Notice that I've accumulated the client's and the backend's memory consumption. This allows us to estimate the cost of EDS's client-server architecture. Apparently this architecture increases memory consumption by about 40% in these benchmarks.

While the RSS usage gives us information about the efficiency and correctness of the code, it's also interesting to check the virtual memory size needed by the benchmarks. Please take the following numbers with a reasonable grain of salt: I got these numbers by simply adding together the virtual memory size of the client and of the backend process, as reported by the processes' status file. A proper measurement would process the maps file to properly account for shared memory regions.
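The naive measurement described here can be sketched in Python as follows. The field names VmSize and VmRSS are those of the Linux `/proc/<pid>/status` file; the function names and the sample values are made up, and the sketch deliberately shares the weakness mentioned above, since it does not look at the maps file.

```python
def parse_status(text):
    """Extract VmSize and VmRSS (in KiB) from the contents of /proc/<pid>/status."""
    sizes = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Lines look like "VmSize:    204800 kB".
            sizes[key] = int(value.split()[0])
    return sizes


def read_status(pid):
    with open(f"/proc/{pid}/status") as status_file:
        return parse_status(status_file.read())


def combined(statuses, key):
    """Naive sum over processes; shared mappings are counted once per process."""
    return sum(status[key] for status in statuses)


# Invented sample data standing in for a client and a backend process.
client = parse_status("Name:\tclient\nVmSize:\t  204800 kB\nVmRSS:\t   51200 kB\n")
backend = parse_status("Name:\tbackend\nVmSize:\t  307200 kB\nVmRSS:\t   40960 kB\n")
print(combined([client, backend], "VmSize"), "KiB")  # 512000 KiB
```

Libraries mapped by both processes are counted twice in such a sum, which is the reason for the grain of salt.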

Virtual Memory Size (VMS)

The first issue we notice is the massively increased memory usage of EBookClient 3.2. It's almost 40% more than the others. Fortunately, the issue seems to have been fixed already in EDS 3.4.

At first glance, the very low virtual memory usage of the EDataBook benchmark is impressive. It seems to consume only 40% of the client-server based benchmarks. Still, there is a fairly high chance that this huge delta must be attributed to my poor measurement here: Assuming perfect code segment sharing there only remains a delta of about 20 MiB, which would be nothing but the RSS delta of EDataBook and EBookClient. It would be nice to investigate this in more detail.

RSS usage per contact

This chart shows the memory per contact after the contact saving benchmark. The overall memory usage per contact has grown dramatically by almost 40% in EDS 3+. The most efficient approach is apparently to directly access EDataBook, which consumes only 55% of the RSS per contact, compared to the client-server approaches.

RSS usage per contact, model only

This high memory usage per contact is a bit surprising since, after subtracting effects from library and database initialization, the memory usage per contact remained constant between EDS 2.32 and EDS 3.5. The parallel usage of both Berkeley DB and SQLite in the file backend might be to blame, but this is currently pure speculation from me.

The temporary regression in EDS 3.2 was apparently fixed. The increased memory usage of EBookClient and EDataBook over EBook is because the EBookClient and EDataBook benchmarks, in a slightly unrealistic bias for performance, store both the EContact and the VCard string for each contact.

Conclusions

The developers of Evolution Data Server have paid good attention to performance and have successfully implemented significant improvements. However, EDS releases regularly suffer performance regressions, and the performance of EDS still isn't good enough for usage in mobile devices. Fortunately the performance problems are solvable. Some fixes will be straightforward, such as adding more batch API (or fast paths) for query processing. Others will need careful performance monitoring: For instance when activating more database indices, to speed up queries, we must be careful not to slow down contact saving.

A not so trivial improvement would be adding a direct access API for reading the local database. The speed and memory usage measurements show the value of such an API: Direct access is significantly faster than via D-Bus in most use cases, and it seems to significantly reduce memory usage.

Another significant improvement should be finishing the file backend's transition to SQLite: Using two database backends in parallel significantly increases code complexity and has a measurably bad impact on memory consumption.

Usage Instructions

The full source code of this project is in our phonebooks-benchmarks repository on Gitorious. You'll need a fairly recent C++ compiler because I also used this project to get more familiar with the new features of C++11. I've successfully tested g++ 4.6.3 and g++ 4.7.0. Clang 3.0 definitely won't work because of incomplete lambda support.

Other than that, you'll need Boost 1.48 to compile the benchmarks. The optional chart-drawing module uses ImageMagick++ and MathGL 2.0.2.

There is a simple script for building and installing the tested EDS versions. The configure script will give instructions.

To finally run the benchmarks just call src/phonebook-benchmarks, and to draw the charts run src/phonebook-benchmarks-mkcharts.

If your own tests need a non-trivial vCard generator, take a look at src/phonebook-benchmarks-mkvcards.

Outlook

It would be interesting to take a more detailed look at the virtual memory usage.

Also it would be educational to compare these results with other address book implementations. The first candidates I have in mind are QtContacts on Tracker and Tizen's native contacts API.

We didn't cover EBookView and EBookClientView yet. These views take a query and notify the application when the contact set matching the query has changed. Typically, every user interface needs to use them.

We also didn't talk about the calendar API yet.

Well, and most importantly we at Openismus would enjoy fixing the identified performance problems.

June 19, 2012
Michael Hasselmann - June 19, 2012 - 06:21

Maliit-gnome-vkb-comparison
For the Maliit Keyboard, we need the ability to style it for different form factors. As our focus is mostly on mobile devices (where the form factor is known), we got away with pixel-based styling that, while being resolution dependent, makes it easy to place and size the buttons in exactly the way needed. It’s what designers call pixel perfect, and the difference can be very noticeable. Let’s take a look at the comparison screenshot, with the Gnome3 virtual keyboard on top (taken from the 3.2 release notes) and the Gnome3-styled Maliit Keyboard at the bottom (video). The former uses a generic layout engine and generic styling, whereas the latter allows styling pixel-perfect layouts.

For the official Gnome3 virtual keyboard, one can notice how the space bar is strangely sized, how the regular keys are unpleasantly square and how the enter and backspace keys are way too small. They all use the same graphical asset. For the Gnome3-styled Maliit Keyboard, the letters d-j, x-n, together with the space bar and its neighboring keys, form a neatly aligned center area, with a staircase pattern to each side of the keyboard. If you look at the last row, you will notice that the space bar is just a few pixels too large. This happens because the only non-pixel-perfect elements in the Maliit Keyboard are stretched keys and spacer elements, and I somehow still compute their width incorrectly. From a designer perspective, that “pixel bug” in the layout destroys the immersion of an otherwise good design. Pixel perfectness, whilst perhaps having a poor reputation among us developers, is actually quite a challenge to get right!

Maliit Keyboard is a reimplementation of the MeeGo Keyboard that was shipped on the Nokia N9. It doesn’t share any code with it, but it does reuse the ideas and the XML layout files. The format for the layout files evolved from programmer-friendly to designer-friendly over time, which I think was a good change. Instead of using the complex and pretty generic styling system of libmeegotouch (which was intended to fit the needs of any mobile application), I use a concrete style class that reads a styling profile folder, looks into the two INI files (one for the main keyboard area, one for the extended keys) and fills a model with style attributes. In the given context of virtual keyboard styling, the concrete solution offers the same flexibility as the generic solution but comes with a fraction of the implementation complexity and remains simple to extend. This is because the concrete solution fits the intended use-case perfectly, whereas a generic solution will always accrue overhead.

The trade-off, of course, is that for a new use-case, I’d probably have to write a new style class and create new style attributes. It is very unlikely, however, that the styling requirements of a fairly specific “application” would fit the styling requirements of a regular application, so why burden myself with a generic enough abstraction that I will never use? Inversely, why try to make a generic solution fit a specialised problem? Jokingly, that’s precisely what I call Java programming: Instead of solving the problem at hand, write a framework that can solve all similar problems. I have started to grow a strong aversion against that, not because I like reinventing the wheel (I don’t), but because I think that understanding the concepts behind an idea is far more valuable than clinging to an implementation of it.
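The idea of a concrete style class is small enough to sketch. The following Python example reads a profile folder with two INI files into a plain attribute model. It is not the Maliit Keyboard code: the file names (main.ini, extended-keys.ini), the section names and the attribute keys are all assumptions made for the example.

```python
import configparser
import os
import tempfile


def load_style(profile_dir):
    """Read the two INI files of a styling profile into one attribute model."""
    model = {}
    for area in ("main", "extended-keys"):
        parser = configparser.ConfigParser()
        parser.read(os.path.join(profile_dir, area + ".ini"))
        model[area] = {
            section: dict(parser.items(section)) for section in parser.sections()
        }
    return model


# Build a throwaway profile folder so the sketch runs on its own.
profile = tempfile.mkdtemp()
with open(os.path.join(profile, "main.ini"), "w") as ini_file:
    ini_file.write("[key]\nwidth = 48\nheight = 52\n[spacer]\nwidth = 12\n")
with open(os.path.join(profile, "extended-keys.ini"), "w") as ini_file:
    ini_file.write("[key]\nwidth = 40\n")

style = load_style(profile)
print(style["main"]["key"]["width"])  # 48
```

Supporting another form factor then means adding another profile folder, with no changes to the loader.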

For the Maliit Keyboard, I took the concepts of the libmeegotouch styling system and mixed my understanding of CSS with the XML layout files. The result feels pretty clean and easy to use, but that’s because I based the implementation on well-understood concepts, not because of the implementation itself. Who knows, if I keep myself busy with implementing virtual keyboards for just a bit longer, I should eventually be able to write a pretty decent book about VKB design patterns ;–)


June 06, 2012
Krzesimir Nowak - June 06, 2012 - 13:36
This is slightly dated news (well, only two weeks or so), but the fork/exec rewrite of the DBus server was merged into the master branch. The main aim of the rewrite was to allow several synchronisations to run concurrently (provided that the sync sessions don't conflict with each other). This wasn't possible before, because libsynthesis (the main library used for sync) does not have an asynchronous API. There were two solutions: either use threads or spawn child processes doing the syncs. The latter solution was chosen, because Patrick Ohly wanted to avoid complications with possibly thread-unaware libraries. (Of course, waiting for libsynthesis to get an asynchronous API wasn't an option.)

The rewrite involved splitting the code in one file of syncevo-dbus-server into several files and then detangling all tightly coupled classes - I guess that the result is quite nice compared to what it used to be. In the meantime a new C++ DBus layer based on GLib GDBus was added. Most of this work was done by Chris Kühl - I was mostly helping to fix issues in the new DBus layer and then working full-time on it and the code using it. As a final step I ported the command line test from C++ to Python, so that our work could be proved to work.

This was a really long task; very often I felt that I was losing my grasp of the code I was working on, and probably sometimes I had actually lost it. Now I feel happy to see it finally merged into master.
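The process-per-session approach can be illustrated with a short Python sketch. This is not the syncevo-dbus-server code: the session dictionaries, the rule that two sessions conflict when they share a data source, and the stand-in child command are all assumptions made for the example.

```python
import subprocess
import sys


def conflicts(session_a, session_b):
    """Assumed rule: two sync sessions conflict when they touch a common data source."""
    return bool(set(session_a["sources"]) & set(session_b["sources"]))


def runnable(pending, running):
    """Pick pending sessions that conflict neither with running nor with picked ones."""
    picked = []
    for session in pending:
        if not any(conflicts(session, other) for other in running + picked):
            picked.append(session)
    return picked


def spawn(session):
    # One child process per session; a stand-in command replaces the real sync helper.
    code = "import sys; print('syncing', sys.argv[1])"
    return subprocess.Popen([sys.executable, "-c", code, session["name"]])


def run_all(sessions):
    """Start every session in its own process and wait for all of them."""
    children = [spawn(session) for session in sessions]
    return [child.wait() for child in children]


pending = [
    {"name": "a", "sources": ["addressbook"]},
    {"name": "b", "sources": ["calendar"]},
    {"name": "c", "sources": ["addressbook", "memo"]},
]
batch = runnable(pending, running=[])
print([session["name"] for session in batch])  # ['a', 'b']
```

Calling `run_all(batch)` would run sessions a and b side by side in separate processes, while c has to wait until a has finished.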
June 04, 2012
Jens Georg - June 04, 2012 - 23:31

Helium 0.5.0 is available. It contains a lot of improvements and some new features such as

  • Volume control on the player
  • Possibility to play a media file on a different renderer than the currently selected
  • Option to start the on-device media sharing
  • Some more settings

Get it on this site or via apps.formeego.com. I’ve also updated the documentation to reflect the changes.

May 17, 2012
Michael Hasselmann - May 17, 2012 - 09:02

With only a little over five weeks to go, QtCS 2012 is approaching fast on the event horizon (pun absolutely intended). During the TiZen Developer Conference, Quim Gil and I, with help from Robin Burchell, found time to work on a draft, the result being visible in the QtDev wiki. Don’t be scared by the number of pre-scheduled sessions; we will try to keep it as unconference-y as possible. However, this allows us to properly prepare selected sessions.

Please keep in mind that QtCS 2012 is invite-only.

Session presenters, prepare!

If you are a maintainer of a Qt Essentials module, then you will have to drive one of the pre-scheduled sessions.

It might be a good idea to start writing a report about what happened in your area over the last couple of months, and where you plan to take your module for the Qt 5.1 release. Attendees of your session will probably expect you to talk about Qt 5 migration and how it will affect code using your module. On top of that, the most urgent bugs might get discussed, along with new feature requests, so please be prepared.

If you find a topic that isn’t showing up in the program yet, consider getting in touch with other interested folks (the Qt development mailing list or #qt IRC channel are good places to ask). With sufficient feedback, you should have everything you need to drive the session yourself!

Plenary sessions

The first day will start with the State of the Union session about Qt 5 and Qt 5.1, summarizing where we are and where we want to go with the Qt project.

The second day is about betting on Qt Quick, covering topics such as theming, platform integration and cross-platform. We want to invite the maintainers and a few vendors/users/contributors to expose their thoughts and needs with the hope of bringing a common understanding.

For the last day, we have a special surprise, as the topic will be HTML5 & the web. I am aware that this will raise a lot of controversy among the community, but it is an area where we need to find answers. With all the latest interest in web technologies, the answer can no longer be “We don’t do web”. That’s what killed the dinosaurs, remember? Then again, Qt already has better answers than that, so it’s about time for a fierce but honest panel discussion, including the Qt WebKit maintainers and other users/contributors driving this area.

Hack'n'Tell

Of course we also want you to feel entertained during the summit, which is why we want to try out Hack'n'Tell sessions.

The rules are simple: Show us a cool hack involving Qt. You got 5 minutes. No slides.

I realize that preparing cool hacks takes some time, which is the original intent for this blog post: to announce the Call for Cute Hacks. For the first day at least, it would be great to have a couple of presenters before the summit starts. For the other days, I kind of hope that the summit itself will spark great ideas and demos that need to be shown immediately.

I hope I got you excited a bit, so let’s make QtCS 2012 an event to remember!


May 16, 2012
Michael Hasselmann - May 16, 2012 - 11:51

In anticipation of two busy conferences in the San Francisco Bay Area, the TiZen Developer Conference 2012 and the Ubuntu Developer Summit for the Quetzal release, I prepared myself by arriving one week earlier. This avoids the stress of a long distance flight immediately before conferences start, but also helps to adjust yourself to the environment (e.g., get used to public transport, find your ways around the area, get used to the food, etc.).

My flight to California was somewhat jinxed. Instead of getting a direct flight to SFO from a European airport, I had to change planes in Miami. So instead of 12 flight hours, I was already up to 17 hours. Then the flight got delayed, and of course I missed the connecting flight. Being rescheduled to the next flight meant another 5 hours of extra waiting time and arriving at SFO shortly before midnight. Which means no public transport for you. Luckily, I had informed Quim Gil about my additional delays (thanks to free WiFi at Miami airport), and being the good friend that he is, he decided to pick me up from SFO.

On the next day (a warm Sunday), I had time to visit the Computer History Museum in Mountain View. Later that evening, Quim, Henri and I met for a few drinks, the outcome being this Maemo thread. Sadly, the thread itself has seemingly no positive outcome yet.

On Monday, it was time to go to San Francisco, where I spent the next days (mostly) working and preparing my TiZen session for next week. It was great to experience San Francisco from a non-tourist point of view (though I still had to visit Johnny Rockets together with Alberto, Łukasz, Kat and Dave, the milkshakes really are that good). My highlight of the week was perhaps the lunch with the ever friendly Yorba guys on Friday. I saw their awesome new T-shirts, but sadly they’re for employees-only.

On the weekend, I met with Lokesh and we drove down to Santa Cruz (great beach, lots of surfers, too) and from there to Monterey, on the famous 17-Mile Drive. I even dared to (very briefly!) swim in the Pacific Ocean, another first for me. The Pacific made me pay for my courage in blood though (nothing too bad, just an annoying cut on my heel).

Then, well rested and all that, it was time for the conferences. Since there was only one keynote for Monday evening at the TiZen Developer Conference, I spent most of the day at UDS. The one thing to notice was probably that UDS had way more developers than TiZen, and to my surprise, even some high profile GNOME developers attended UDS. So much for all the silly fights about Ubuntu not being GNOME etc. I also met Daniel d'Andrader again, which was nice.

For the other two days, it was always a bit of back-and-forth between UDS and TiZen. UDS had the better food (breakfast, hm!), but TiZen probably had the better parties (California Academy of Sciences comes to mind). On Thursday I started to feel exhausted and on Friday, I was glad UDS was to be over soon. The beach party (without a beach) totally killed it.

The flight back was uneventful. After two weeks in California, with the second one being extremely stressful, I was glad to finally land in Berlin again.

Michael Hasselmann - May 16, 2012 - 07:43

The first day started slowly, with only one keynote in the evening. I wasn’t very happy with it: a proper keynote would have focused more on the visions and goals of TiZen, not so much on Linux, Android or Microsoft.

On the second day, my mind was mostly occupied with my own session, “Challenges of mobile text input”, in the afternoon. It was good that speakers were informed early enough about the whole process, and the presentation templates were available long enough before the event. Even with the cold color scheme, I kind of like the clean TiZen design in the templates (though I still can’t get used to the name “Ti-Zen”).

Two hours or so before my session, I had one technician help me upload my slides to the room’s PC. It’s great to not have to use your own laptop. Instead of worrying until the last minute whether everything will work, you know it just will, and if it doesn’t, it is not your task to fix it. Not having this typical presenter’s crisis helps to focus on the content of your session and also helps to stay calm.

I was quite happy with my presentation, even though I was hoping for more input method developers in the audience. You can find the slides on the Maliit wiki; video (or at least audio) should be available at some later point, too.

On the last day, every attendee could get a TiZen developer device. I was surprised it didn’t have “Intel Inside”, but the ARM Cortex-A9 with the Mali GPU is interesting hardware nevertheless. Perhaps not too surprising for an HTML5 platform, it also comes with a speedy and actually useful mobile browser from the start. Even if it is uncertain at this point whether TiZen will end up being a success, I am looking forward to seeing actual TiZen consumer devices.

Thanks to the Linux Foundation for sponsoring me, and thanks to Brian Warner again for helping with the organising bits.

May 03, 2012
Michael Hasselmann - May 03, 2012 - 22:58

Motivated by the new Blackberry 10 virtual keyboard, I decided to spend a couple of hours on a proof of concept, this video being the result.

I had to hack the Presage engine a bit to provide word prediction in a similar fashion to what you see in the 2-3 seconds of the Blackberry video. Then I added some space between the rows of the keyboard, so that I could place additional word ribbons there. The word candidates appear next to their starting letters, though it’s only one candidate per letter. I need to find a better solution here, but then again the Blackberry guys haven’t solved it either ;–) Tapping on the word candidates inserts them into the text editor (no gestures, for now).

The code is very hackish, certainly nothing I would publish. I am going to put it onto a tablet so that I can show it around to you guys at the TiZen Developer Conference or the Ubuntu Developer Summit.

Jon is going to bring a camcorder on Sunday, so perhaps we can actually record a real, youtube-worthy video then.

Michael Hasselmann - May 03, 2012 - 18:01

I am going to speak about mobile text input at the TiZen Developer Conference (7-9 May, San Francisco, CA). My session is scheduled for Tuesday afternoon (PDF).

I had planned to only focus on the technical aspects, but I am also going to talk about other aspects, such as the media coverage a cleverly designed text input method can achieve, and which features are apparently important to consumers.

The most important feature is, unsurprisingly perhaps, the overall performance: How fast can consumers insert or manipulate text on their mobile devices — while still being accurate? I will explain techniques that can help to improve responsiveness, accuracy and speed of a virtual keyboard.

I’ll be in the Bay Area until the end of next week and I am generally interested in discussing the finer details of (text) input methods, accessibility or display servers and how one could improve the situation for accessibility on Linux. Contact me on Twitter in case you want to meet up and go somewhere for dinner (or drinks) in the beautiful city of San Francisco!

April 26, 2012
Jens Georg - April 26, 2012 - 19:24

If you just upgraded to Ubuntu 12.04 and your PS3 or Sony TV stopped showing videos, the work-around is to uninstall gstreamer0.10-plugins-bad-multiverse. See bug 672439 for the details.

The bright side is that for the first time ever, we have a recent Ubuntu shipping the latest version of Rygel.

April 25, 2012
Mathias Hasselmann - April 25, 2012 - 07:48

It's a common design to use full text search engines only for free text searches, but to store the actual structured data in a separate database. Such designs come at a cost. Therefore Openismus asked me to build upon my previous post, where I analyzed several FTS engines. This time I'll research whether we could use the full text search index itself as our primary data store.

Relations

A first obvious limitation is the lack of joins. So to use the FTS index as data store, you must denormalize your data. That is, instead of storing your movie database in distinct entity tables like Movie and Artist, linked by relationship tables like isLeadActor or isDirector, you must find a way to put everything into one single flat table. This isn't entirely nice in terms of redundancy and consistency. On the other hand joining tables is what makes relational databases slow and hinders distributing them across servers. Is there someone whispering "NoSQL"? Well. Yes, while I absolutely dislike their striking marketing: They are on to something, and with our journey today we enter their land.

Seems I've lost myself in chatting, so back on topic. To store data in an FTS index we must denormalize our data. Luckily FTS engines make this easier than it sounds. In contrast to the relational model, there is no need to create complex relationships just to assign more than one actor or director to a movie: When adding artists to your movie you just tag each name with the proper field prefix before adding it to the index, and you are done. FTS engines natively support multi-value fields!
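To make the prefix idea concrete, here is a minimal sketch in Python. It is a toy inverted index, not the API of Lucene, Xapian or any other real engine; the class `TinyIndex` and its methods are made up for illustration. It only shows that adding a second actor or director is just one more prefixed term on the same document.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index (hypothetical): maps prefixed terms to document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, field, value):
        # Tag every token with its field prefix, e.g. "actor:keanu".
        for token in value.lower().split():
            self.postings[f"{field}:{token}"].add(doc_id)

    def search(self, field, text):
        # A document matches if it contains every token of the query in that field.
        sets = [self.postings.get(f"{field}:{t}", set()) for t in text.lower().split()]
        return set.intersection(*sets) if sets else set()


index = TinyIndex()
index.add(1, "title", "The Matrix")
index.add(1, "actor", "Keanu Reeves")
index.add(1, "actor", "Carrie-Anne Moss")   # a second value for the same field
index.add(1, "director", "Lana Wachowski")
index.add(1, "director", "Lilly Wachowski")

print(index.search("actor", "Keanu Reeves"))
print(index.search("actor", "Moss"))
print(index.search("actor", "Wachowski"))   # empty: the name is only indexed as director
```

No relationship table is involved: the field prefix alone keeps actors and directors apart.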

With some additional effort it also should be possible to store more structured data in those multi-value fields, things like (release-date, country), or (actor, role): You'd add more prefixes and use the positional information stored for phrase searches to reliably identify those fields. Sadly my time is too limited to research this in more detail, but the Internet surely has documents about this. Well, or for additional fun you can try to figure it out yourself.

Exact Matches

Now a match reported by an FTS engine only tells us that the document or the chosen field contains the phrase we were looking for. When searching for `title:"The Matrix"` any FTS engine will not only return the first movie of the Wachowskis' trilogy, it also will give matches for the other two movies, and works like *"Making 'The Matrix'"*. So for doing exact lookups we'll have to filter the initial result, and drop any document that doesn't exactly match our requirements. Sadly we really must check the field value instead of just checking the computed score: For instance with Lucene both *"The Matrix"* and *"Making 'The Matrix'"* will get a score of 100%, since both documents fully satisfy all terms of the query. Also we cannot use the score as an indicator to only check fields for documents that got at least 100%: When searching for `director:"Quentin Tarantino"` the movie *"Inglourious Basterds"* will get less than 100%, since Tarantino was working with Eli Roth for this movie. So this additional filtering sounds expensive at first, but remember that our index lookup dramatically reduced the data set already. When looking for `title:"The Matrix"` in the *imdb-50* data set, we talk about checking 9 documents instead of 121,587 documents for example. For useful data sets we won't notice the overhead, as the test results below show.
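The post-filtering step can be sketched in a few lines of Python. This is only an illustration under assumptions: the candidate documents are hand-written stand-ins for what a phrase query might return, and `exact_matches` is a hypothetical helper, not part of any FTS library.

```python
def exact_matches(candidates, field, wanted):
    """Keep only candidates where one stored value of `field` equals `wanted` exactly."""
    return [doc for doc in candidates if wanted in doc.get(field, [])]


# Hypothetical candidate set, as a phrase query for title:"The Matrix" might return it.
title_candidates = [
    {"title": ["The Matrix"]},
    {"title": ["The Matrix Reloaded"]},
    {"title": ["The Matrix Revolutions"]},
    {"title": ["Making 'The Matrix'"]},
]
title_hits = exact_matches(title_candidates, "title", "The Matrix")
print([doc["title"][0] for doc in title_hits])

# Multi-value fields: a co-directed movie still matches, whatever score it got.
director_candidates = [
    {"title": ["Inglourious Basterds"], "director": ["Quentin Tarantino", "Eli Roth"]},
]
director_hits = exact_matches(director_candidates, "director", "Quentin Tarantino")
print([doc["title"][0] for doc in director_hits])
```

The filter compares whole field values rather than scores, which is why it handles both the partial title matches and the co-directed movie.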

You can just add unanalyzed fields and use term queries on them, as kamstrup pointed out.

Data Types

So we've learned that lack of relations isn't much of a problem for many useful datasets, but structured data is not only about relationships, it also is about data types. Full Text Search engines only support lexicographical order, so they surely fail for dates and numbers. You surely cannot use them to find documents within a given range!

I am sorry to disappoint you. The people researching FTS are smarter than that. Actually, properly sorting and ranging dates while only using lexicographic order is trivial. Most probably you have done it yourself already. Simply store your dates in ISO format, that is YYYY-MM-DDThh:mm:ss.SSSNNN or any prefix of this, and you are done. Omit the separators if you prefer. ISO-8601 is explicitly designed for lexicographic sorting.
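A quick way to convince yourself is the following Python sketch. It assumes four-digit years and a single time zone; the sample dates are arbitrary. Plain string comparison then gives the same order as comparing the dates themselves, and a range query becomes two string comparisons.

```python
from datetime import datetime

dates = [
    datetime(1999, 3, 31),
    datetime(2003, 5, 15),
    datetime(1999, 1, 1, 12, 30),
    datetime(1888, 10, 14),
]
keys = [d.isoformat() for d in dates]   # e.g. "1999-03-31T00:00:00"

# Sorting the strings yields the same order as sorting the dates.
sorted_keys = sorted(keys)
chronological = [d.isoformat() for d in sorted(dates)]
print(sorted_keys == chronological)

# A range query only needs string comparison; a prefix works as lower bound.
lower, upper = "1999-01-01", "1999-09-30T23:59:59"
in_range = [k for k in keys if lower <= k <= upper]
print(in_range)
```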

So how do you do this with numbers? You could prefix them, for instance with zeros, to get a fixed width. This works reasonably if you know your number ranges, and in most cases you do. Sometimes you know the range from your application's context, e.g. the first known celluloid film was recorded in 1888. More easily you just use your technical limits, like [-2^63..2^63-1] for long integers. While first experiments really followed that approach, padding numbers with up to 18 zeros isn't exactly efficient or pretty. Also we didn't talk about floating point numbers yet. Therefore FTS engines like Lucene or Xapian provide more efficient mechanisms for turning numbers into sortable strings. First they write a prefix indicating number precision (64 bit, 32 bit, 10 bit, ...). Then they convert the numbers to some unsigned format, and apply some kind of base-128 encoding to the resulting bytes. The most significant bit gets stored first. For floating point numbers they shuffle some bits of the number's IEEE-754 representation. The resulting, sortable 64 bit integer then is encoded like any other number. You can consult Lucene's documentation, and the source code of Lucene::NumericUtils, or Xapian::sortable_serialise for details.
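The general idea can be sketched in Python as follows. This is not the actual encoding used by Lucene or Xapian (there is no precision prefix and no base-128 step); it is the common textbook variant, with the hypothetical helpers `sortable_int64` and `sortable_double`, that maps numbers to fixed-width big-endian bytes whose byte order equals numeric order. NaN is deliberately left out.

```python
import struct


def sortable_int64(n):
    """Map a signed 64-bit integer to 8 bytes that sort like the number."""
    # Adding 2**63 shifts the range to unsigned; big-endian stores the
    # most significant byte first, so byte order equals numeric order.
    return struct.pack(">Q", n + 2**63)


def sortable_double(x):
    """Map an IEEE-754 double (not NaN) to 8 bytes that sort like the number."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF   # negative: flip all bits to reverse the order
    else:
        bits |= 1 << 63              # non-negative: set the sign bit to sort above negatives
    return struct.pack(">Q", bits)


ints = [1888, -1, 2**63 - 1, 0, -2**63, 1]
floats = [9.9, -2.5, 1e300, 0.0, -1e300, 1e-300]

ints_by_key = sorted(ints, key=sortable_int64)
floats_by_key = sorted(floats, key=sortable_double)
print(ints_by_key == sorted(ints))
print(floats_by_key == sorted(floats))

# For comparison, zero padding also works, but only for a known non-negative range:
print("7" < "10", "07" < "10")
```

Real engines additionally compress these bytes and tag them with the precision, as described above.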

Benchmarks

Hope I didn't lose you with all this theory, now it is benchmark time!

To test how useful FTS engines are for storing arbitrary data I've extended my previous benchmark to better support range searches, and to support exact matching of fields. I've also added Michal Hruby's patch for supporting prefix searches. Since the prefix search gives countless hits, the query results consistently are limited to 10,000 rows now. I've dropped QtCLucene for now since it doesn't seem to support numeric range searches and such. It was forked from Java Lucene a long time ago. For SQLite I ran two sets of tests: bm_sqlite doesn't create indices for fields like movie title or artist names. Since such a setup is unfair when comparing with FTS engines, the second set bm_sqlite_index creates indices for all fields we perform lookups for. For Tracker we again test the Nepomuk media ontology (bm_tracker) and an optimized ontology (bm_tracker_flat) that attaches all properties to the same RDF class. I had to disable prefix searches for bm_tracker: The query ran for more than 2 hours on the dataset with 17k movies. I seriously wish I'd get sponsored to improve Tracker's data model!

Source code still is in the fts-benchmark repository, tagged as release/0.3.

Results and Discussion

Each query got run 7 times on 5 different data sets. This time I didn't take the mean of the query execution times. The individual results of each dataset are grouped together and labeled with qxx_t1 to qxx_t7. Data and result sets grow with each group.
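For readers who want to collect per-run timings in the same spirit, a minimal sketch in Python could look like this. It is not the harness from the fts-benchmark repository; `time_query` is a hypothetical helper and the workload is a stand-in for a real query.

```python
import time


def time_query(run_query, repetitions=7):
    """Run `run_query` several times; return each wall-clock duration in milliseconds."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


# Stand-in workload; a real benchmark would execute the search query here.
timings = time_query(lambda: sum(range(100_000)))
for i, ms in enumerate(timings, start=1):
    print(f"t{i}: {ms:.3f} ms")
```

Keeping the individual values instead of a mean makes warm-up effects and outliers visible.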

Also be careful when reading the charts as time is scaled logarithmically. You might want to consult the raw data tables below for details. Please keep in mind that the basic goal of these benchmarks is to test scalability, not raw performance. Therefore I don't mind much if an engine is 10 times slower than another for small data sets. Constant performance is the ideal result.

You'll also notice that some charts have gaps for bm_tracker. As explained above I had to skip bm_tracker for a few data sets, as Tracker took way too long to perform those benchmarks.

rating:[90 TO 99]

Lucene++ appears significantly slower than its competition for small data sets, but then gives comparable results for data sets with more than 3,000 movies. Still I would not overrate this finding: We are talking about lookup times in the range of 10 ms. That's still pretty fast and close to measurement limits, as the spikes in the other engines' results show.

release:[1999/01/01 TO 1999/09/30]

These results are similar to the rating:[90 TO 99] query.

release=1999/03/31

For this query you see the importance of having an index for your lookup keys: Performance of bm_lucene++ and bm_sqlite_index remains almost constant, while effort of the other engines grows dramatically as the data size grows.
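The effect of an index on such a lookup can be observed directly in SQLite. The sketch below uses Python's built-in sqlite3 module with a made-up table and index name (`movie`, `idx_movie_release`); it is not the schema of the benchmark. Without the index SQLite has to scan the whole table, with it the lookup becomes an index search.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE movie (title TEXT, release_date TEXT)")
con.executemany(
    "INSERT INTO movie VALUES (?, ?)",
    [("The Matrix", "1999-03-31"), ("Movie B", "1999-10-15"), ("Movie C", "2000-09-05")],
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT title FROM movie WHERE release_date = '1999-03-31'"
plan_without_index = plan(query)

con.execute("CREATE INDEX idx_movie_release ON movie (release_date)")
plan_with_index = plan(query)

print(plan_without_index)   # a full table scan
print(plan_with_index)      # a search using idx_movie_release
```

The exact wording of the plan differs between SQLite versions, so the output is best read rather than parsed.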

Xapian's bad performance comes as a surprise, but actually I am to blame here: For stupid reasons I've implemented this very search as a range search in Lucene++ and Xapian (release:[1999/03/31 TO 1999/03/31]). As the results indicate, Lucene++ seems to be putting more effort into optimizing range searches, and compensates for my mistake.

title=The Matrix

Similar results as for release=1999/03/31, only that Xapian behaves as expected now. When given a proper query it also shows constant lookup time for exact phrase searches.

director=Quentin Tarantino

With this query you see the advantage you get from using denormalized tables: Lucene++ and Xapian are just as efficient as in the previous tests, but as a not so big surprise Tracker with the flat ontology now beats all remaining engines, including bm_sqlite_index.

T*

Performance of the different engines is similar to each other when performing prefix searches.

Raw Result Data

rating:[90 TO 99] - 9 movies, 3 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 12.333 ms | 10.409 ms | 9.885 ms | 9.821 ms | 10.221 ms | 9.840 ms | 9.986 ms |
| bm_sqlite | 0.196 ms | 0.169 ms | 0.169 ms | 0.173 ms | 0.166 ms | 0.167 ms | 0.167 ms |
| bm_sqlite_index | 0.207 ms | 0.183 ms | 0.172 ms | 0.192 ms | 0.193 ms | 0.173 ms | 0.172 ms |
| bm_tracker | 0.992 ms | 0.655 ms | 0.582 ms | 0.589 ms | 0.554 ms | 0.549 ms | 0.525 ms |
| bm_tracker_flat | 0.693 ms | 0.463 ms | 0.437 ms | 0.461 ms | 0.450 ms | 0.443 ms | 0.436 ms |
| bm_xapian | 0.242 ms | 0.201 ms | 0.200 ms | 0.198 ms | 0.200 ms | 0.199 ms | 0.197 ms |

rating:[90 TO 99] - 1,099 movies, 17 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 12.949 ms | 13.057 ms | 12.981 ms | 13.018 ms | 13.150 ms | 12.840 ms | 12.644 ms |
| bm_sqlite | 0.696 ms | 0.546 ms | 0.516 ms | 0.530 ms | 0.515 ms | 0.518 ms | 0.522 ms |
| bm_sqlite_index | 0.448 ms | 0.234 ms | 0.231 ms | 0.237 ms | 0.236 ms | 0.231 ms | 0.231 ms |
| bm_tracker | 5.051 ms | 4.485 ms | 4.441 ms | 4.486 ms | 4.425 ms | 4.831 ms | 4.828 ms |
| bm_tracker_flat | 1.465 ms | 1.133 ms | 1.110 ms | 1.104 ms | 1.108 ms | 1.108 ms | 1.108 ms |
| bm_xapian | 1.445 ms | 1.285 ms | 1.159 ms | 7.824 ms | 1.878 ms | 1.669 ms | 1.393 ms |

rating:[90 TO 99] - 3,216 movies, 35 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 14.287 ms | 13.596 ms | 13.453 ms | 13.912 ms | 13.875 ms | 14.559 ms | 13.981 ms |
| bm_sqlite | 3.524 ms | 4.110 ms | 4.129 ms | 1.916 ms | 1.732 ms | 2.300 ms | 9.584 ms |
| bm_sqlite_index | 0.423 ms | 2.036 ms | 4.617 ms | 4.577 ms | 0.388 ms | 1.957 ms | 7.981 ms |
| bm_tracker | 12.776 ms | 11.816 ms | 12.449 ms | 11.755 ms | 11.762 ms | 11.983 ms | 11.764 ms |
| bm_tracker_flat | 2.935 ms | 2.517 ms | 2.374 ms | 2.264 ms | 2.250 ms | 2.261 ms | 2.258 ms |
| bm_xapian | 9.292 ms | 2.702 ms | 10.573 ms | 6.773 ms | 3.098 ms | 11.438 ms | 3.035 ms |

rating:[90 TO 99] - 17,251 movies, 260 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 58.996 ms | 56.894 ms | 62.172 ms | 57.028 ms | 57.255 ms | 57.540 ms | 57.259 ms |
| bm_sqlite | 36.682 ms | 28.260 ms | 34.116 ms | 34.786 ms | 35.195 ms | 35.813 ms | 35.221 ms |
| bm_sqlite_index | 45.802 ms | 62.460 ms | 31.603 ms | 32.982 ms | 33.302 ms | 31.904 ms | 31.656 ms |
| bm_tracker | 67.022 ms | 64.609 ms | 64.649 ms | 65.243 ms | 64.183 ms | 64.887 ms | 64.283 ms |
| bm_tracker_flat | 14.730 ms | 14.179 ms | 14.132 ms | 14.221 ms | 14.248 ms | 20.225 ms | 35.888 ms |
| bm_xapian | 94.872 ms | 47.067 ms | 85.202 ms | 28.575 ms | 142.854 ms | 48.562 ms | 52.567 ms |

rating:[90 TO 99] - 121,587 movies, 1,510 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 283.122 ms | 392.801 ms | 382.164 ms | 403.929 ms | 384.512 ms | 408.292 ms | 361.548 ms |
| bm_sqlite | 293.488 ms | 236.636 ms | 249.677 ms | 232.674 ms | 270.198 ms | 282.806 ms | 218.726 ms |
| bm_sqlite_index | 231.638 ms | 311.523 ms | 198.781 ms | 279.063 ms | 219.294 ms | 192.589 ms | 276.822 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 181.478 ms | 272.453 ms | 251.730 ms | 256.744 ms | 293.067 ms | 230.615 ms | 245.113 ms |
| bm_xapian | 376.176 ms | 417.637 ms | 411.263 ms | 366.596 ms | 393.168 ms | 372.888 ms | 412.411 ms |

release:[1999/01/01 TO 1999/09/30] - 9 movies, 2 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 18.768 ms | 10.167 ms | 10.799 ms | 10.215 ms | 10.443 ms | 10.917 ms | 10.210 ms |
| bm_sqlite | 0.165 ms | 0.166 ms | 0.164 ms | 0.164 ms | 0.168 ms | 0.164 ms | 0.164 ms |
| bm_sqlite_index | 0.175 ms | 0.175 ms | 0.170 ms | 0.169 ms | 0.169 ms | 0.169 ms | 0.170 ms |
| bm_tracker | 1.074 ms | 0.569 ms | 0.546 ms | 0.561 ms | 0.544 ms | 0.549 ms | 0.546 ms |
| bm_tracker_flat | 0.877 ms | 0.480 ms | 0.460 ms | 0.458 ms | 0.461 ms | 0.458 ms | 0.456 ms |
| bm_xapian | 0.183 ms | 0.175 ms | 0.175 ms | 0.178 ms | 0.178 ms | 0.180 ms | 0.175 ms |

release:[1999/01/01 TO 1999/09/30] - 1,099 movies, 34 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 19.154 ms | 19.449 ms | 18.811 ms | 19.419 ms | 19.692 ms | 19.315 ms | 18.862 ms |
| bm_sqlite | 0.691 ms | 0.686 ms | 0.684 ms | 0.687 ms | 0.690 ms | 0.702 ms | 0.698 ms |
| bm_sqlite_index | 0.365 ms | 0.311 ms | 0.317 ms | 0.312 ms | 0.311 ms | 0.312 ms | 0.313 ms |
| bm_tracker | 6.231 ms | 5.543 ms | 5.734 ms | 5.522 ms | 5.663 ms | 5.538 ms | 5.465 ms |
| bm_tracker_flat | 1.998 ms | 1.494 ms | 1.466 ms | 1.469 ms | 1.470 ms | 1.454 ms | 1.469 ms |
| bm_xapian | 5.336 ms | 1.590 ms | 7.241 ms | 1.977 ms | 2.651 ms | 4.013 ms | 2.544 ms |

release:[1999/01/01 TO 1999/09/30] - 3,216 movies, 84 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 32.202 ms | 31.513 ms | 31.362 ms | 30.894 ms | 31.345 ms | 31.741 ms | 31.518 ms |
| bm_sqlite | 6.169 ms | 2.645 ms | 7.560 ms | 20.764 ms | 10.385 ms | 13.278 ms | 10.206 ms |
| bm_sqlite_index | 19.176 ms | 4.358 ms | 12.576 ms | 15.448 ms | 15.745 ms | 5.572 ms | 5.770 ms |
| bm_tracker | 15.507 ms | 14.803 ms | 13.629 ms | 15.465 ms | 13.930 ms | 14.515 ms | 13.652 ms |
| bm_tracker_flat | 3.956 ms | 3.488 ms | 3.183 ms | 3.176 ms | 3.213 ms | 3.193 ms | 3.157 ms |
| bm_xapian | 18.414 ms | 5.874 ms | 11.902 ms | 12.932 ms | 19.995 ms | 21.098 ms | 13.009 ms |

release:[1999/01/01 TO 1999/09/30] - 17,251 movies, 374 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 93.892 ms | 93.900 ms | 93.549 ms | 93.555 ms | 93.924 ms | 94.396 ms | 93.795 ms |
| bm_sqlite | 37.831 ms | 44.905 ms | 47.617 ms | 45.894 ms | 43.796 ms | 45.752 ms | 47.048 ms |
| bm_sqlite_index | 48.475 ms | 47.805 ms | 43.046 ms | 47.393 ms | 44.689 ms | 47.842 ms | 54.208 ms |
| bm_tracker | 72.507 ms | 72.667 ms | 72.233 ms | 73.570 ms | 72.997 ms | 72.991 ms | 72.527 ms |
| bm_tracker_flat | 29.351 ms | 48.892 ms | 55.351 ms | 49.793 ms | 88.375 ms | 55.393 ms | 45.917 ms |
| bm_xapian | 59.522 ms | 168.591 ms | 55.750 ms | 83.424 ms | 113.679 ms | 62.803 ms | 127.895 ms |

release:[1999/01/01 TO 1999/09/30] - 121,587 movies, 2,265 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 543.495 ms | 564.582 ms | 609.045 ms | 519.248 ms | 561.844 ms | 663.549 ms | 590.518 ms |
| bm_sqlite | 165.617 ms | 387.256 ms | 293.285 ms | 335.219 ms | 324.528 ms | 324.022 ms | 371.839 ms |
| bm_sqlite_index | 375.504 ms | 315.671 ms | 321.115 ms | 371.228 ms | 300.951 ms | 344.073 ms | 356.366 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 241.569 ms | 316.626 ms | 398.308 ms | 349.426 ms | 398.289 ms | 318.078 ms | 363.809 ms |
| bm_xapian | 529.377 ms | 556.989 ms | 577.643 ms | 576.194 ms | 626.388 ms | 545.251 ms | 570.695 ms |

release=1999/03/31 - 9 movies, 1 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 10.065 ms | 10.068 ms | 9.702 ms | 9.974 ms | 9.837 ms | 9.751 ms | 10.356 ms |
| bm_sqlite | 0.164 ms | 0.165 ms | 0.171 ms | 0.168 ms | 0.167 ms | 0.164 ms | 0.162 ms |
| bm_sqlite_index | 0.171 ms | 0.169 ms | 0.171 ms | 0.172 ms | 0.175 ms | 0.165 ms | 0.164 ms |
| bm_tracker | 0.659 ms | 0.476 ms | 0.473 ms | 0.469 ms | 0.464 ms | 0.468 ms | 0.468 ms |
| bm_tracker_flat | 0.510 ms | 0.395 ms | 0.385 ms | 0.384 ms | 0.389 ms | 0.383 ms | 0.389 ms |
| bm_xapian | 0.154 ms | 0.152 ms | 0.151 ms | 0.153 ms | 0.152 ms | 0.156 ms | 0.152 ms |

release=1999/03/31 - 1,099 movies, 2 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 10.853 ms | 10.545 ms | 10.718 ms | 10.390 ms | 10.521 ms | 10.754 ms | 10.661 ms |
| bm_sqlite | 0.515 ms | 0.528 ms | 0.505 ms | 0.512 ms | 0.502 ms | 0.507 ms | 0.505 ms |
| bm_sqlite_index | 3.139 ms | 0.184 ms | 0.175 ms | 3.440 ms | 0.183 ms | 0.212 ms | 0.205 ms |
| bm_tracker | 4.559 ms | 4.229 ms | 4.177 ms | 4.220 ms | 4.383 ms | 4.532 ms | 4.464 ms |
| bm_tracker_flat | 0.977 ms | 0.830 ms | 0.800 ms | 0.808 ms | 0.802 ms | 0.811 ms | 0.802 ms |
| bm_xapian | 0.672 ms | 0.685 ms | 0.774 ms | 0.752 ms | 0.916 ms | 1.285 ms | 0.663 ms |

release=1999/03/31 - 3,216 movies, 2 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 10.799 ms | 10.762 ms | 11.399 ms | 10.676 ms | 10.704 ms | 10.169 ms | 10.325 ms |
| bm_sqlite | 1.912 ms | 1.462 ms | 1.453 ms | 1.163 ms | 1.151 ms | 1.157 ms | 4.858 ms |
| bm_sqlite_index | 0.366 ms | 0.350 ms | 0.355 ms | 1.883 ms | 0.364 ms | 0.345 ms | 0.371 ms |
| bm_tracker | 11.707 ms | 11.548 ms | 11.433 ms | 11.425 ms | 11.465 ms | 11.450 ms | 11.912 ms |
| bm_tracker_flat | 1.661 ms | 1.511 ms | 1.513 ms | 1.714 ms | 1.507 ms | 1.612 ms | 1.510 ms |
| bm_xapian | 1.278 ms | 1.364 ms | 1.359 ms | 1.821 ms | 1.994 ms | 1.429 ms | 3.192 ms |

release=1999/03/31 - 17,251 movies, 3 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 12.485 ms | 12.281 ms | 12.323 ms | 11.981 ms | 12.137 ms | 11.808 ms | 12.552 ms |
| bm_sqlite | 8.247 ms | 6.259 ms | 6.007 ms | 6.300 ms | 6.125 ms | 5.958 ms | 5.921 ms |
| bm_sqlite_index | 0.379 ms | 0.297 ms | 0.285 ms | 0.284 ms | 0.252 ms | 0.254 ms | 0.251 ms |
| bm_tracker | 61.537 ms | 60.815 ms | 61.014 ms | 60.821 ms | 61.013 ms | 60.850 ms | 60.820 ms |
| bm_tracker_flat | 11.063 ms | 8.021 ms | 8.414 ms | 8.690 ms | 7.798 ms | 7.811 ms | 8.313 ms |
| bm_xapian | 5.545 ms | 4.561 ms | 4.956 ms | 4.388 ms | 4.321 ms | 4.687 ms | 4.396 ms |

release=1999/03/31 - 121,587 movies, 12 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 14.005 ms | 14.031 ms | 12.792 ms | 14.354 ms | 12.736 ms | 13.862 ms | 13.374 ms |
| bm_sqlite | 64.517 ms | 61.783 ms | 61.669 ms | 62.418 ms | 61.377 ms | 61.326 ms | 62.036 ms |
| bm_sqlite_index | 9.994 ms | 0.403 ms | 0.358 ms | 0.351 ms | 0.368 ms | 0.363 ms | 3.368 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 62.160 ms | 62.760 ms | 56.630 ms | 60.929 ms | 54.310 ms | 53.189 ms | 58.016 ms |
| bm_xapian | 29.180 ms | 28.239 ms | 28.080 ms | 28.054 ms | 27.777 ms | 27.615 ms | 27.505 ms |

title=The Matrix - 9 movies, 1 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 9.248 ms | 8.929 ms | 9.139 ms | 9.455 ms | 9.609 ms | 9.128 ms | 9.110 ms |
| bm_sqlite | 0.163 ms | 0.163 ms | 0.163 ms | 0.161 ms | 0.160 ms | 0.163 ms | 0.164 ms |
| bm_sqlite_index | 0.167 ms | 0.165 ms | 0.178 ms | 0.164 ms | 0.164 ms | 0.163 ms | 0.165 ms |
| bm_tracker | 0.733 ms | 0.484 ms | 0.475 ms | 0.478 ms | 0.481 ms | 0.475 ms | 0.476 ms |
| bm_tracker_flat | 0.575 ms | 0.400 ms | 0.380 ms | 0.382 ms | 0.379 ms | 0.387 ms | 0.379 ms |
| bm_xapian | 0.226 ms | 0.197 ms | 0.194 ms | 0.191 ms | 0.191 ms | 0.194 ms | 0.190 ms |

title=The Matrix - 1,099 movies, 1 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 10.758 ms | 10.578 ms | 10.083 ms | 10.230 ms | 10.555 ms | 10.630 ms | 10.831 ms |
| bm_sqlite | 0.728 ms | 0.524 ms | 0.504 ms | 0.501 ms | 0.506 ms | 0.500 ms | 0.501 ms |
| bm_sqlite_index | 0.218 ms | 0.203 ms | 0.201 ms | 0.198 ms | 0.199 ms | 0.277 ms | 0.233 ms |
| bm_tracker | 5.906 ms | 5.409 ms | 5.426 ms | 5.453 ms | 5.420 ms | 5.410 ms | 5.344 ms |
| bm_tracker_flat | 1.685 ms | 1.471 ms | 1.455 ms | 1.455 ms | 1.440 ms | 1.448 ms | 1.439 ms |
| bm_xapian | 0.445 ms | 0.385 ms | 0.398 ms | 0.373 ms | 0.836 ms | 0.451 ms | 0.374 ms |

title=The Matrix - 3,216 movies, 1 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 10.138 ms | 10.144 ms | 10.652 ms | 10.124 ms | 10.169 ms | 10.070 ms | 10.547 ms |
| bm_sqlite | 2.587 ms | 1.180 ms | 1.198 ms | 2.202 ms | 1.411 ms | 1.422 ms | 1.288 ms |
| bm_sqlite_index | 0.323 ms | 0.300 ms | 0.306 ms | 0.298 ms | 0.493 ms | 0.304 ms | 0.304 ms |
| bm_tracker | 15.097 ms | 14.727 ms | 14.692 ms | 14.759 ms | 14.840 ms | 14.888 ms | 14.791 ms |
| bm_tracker_flat | 3.727 ms | 3.529 ms | 3.558 ms | 3.545 ms | 3.504 ms | 3.504 ms | 3.520 ms |
| bm_xapian | 0.432 ms | 0.353 ms | 0.345 ms | 0.349 ms | 0.348 ms | 0.342 ms | 0.692 ms |

title=The Matrix - 17,251 movies, 1 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 12.462 ms | 11.871 ms | 12.020 ms | 11.603 ms | 12.469 ms | 11.850 ms | 11.823 ms |
| bm_sqlite | 6.093 ms | 6.096 ms | 6.130 ms | 5.941 ms | 5.882 ms | 5.959 ms | 6.789 ms |
| bm_sqlite_index | 1.431 ms | 0.304 ms | 0.201 ms | 0.200 ms | 0.201 ms | 0.199 ms | 0.199 ms |
| bm_tracker | 79.019 ms | 78.831 ms | 78.514 ms | 78.491 ms | 79.423 ms | 78.506 ms | 78.759 ms |
| bm_tracker_flat | 19.173 ms | 20.160 ms | 19.373 ms | 19.043 ms | 18.992 ms | 18.961 ms | 19.207 ms |
| bm_xapian | 0.422 ms | 0.344 ms | 0.339 ms | 0.335 ms | 0.336 ms | 0.339 ms | 0.345 ms |

title=The Matrix - 121,587 movies, 1 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 13.367 ms | 13.395 ms | 12.906 ms | 13.164 ms | 12.856 ms | 13.348 ms | 12.862 ms |
| bm_sqlite | 62.625 ms | 61.341 ms | 61.296 ms | 61.361 ms | 61.248 ms | 61.195 ms | 61.607 ms |
| bm_sqlite_index | 0.328 ms | 0.312 ms | 0.300 ms | 0.303 ms | 0.301 ms | 7.473 ms | 0.330 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 138.148 ms | 131.762 ms | 130.937 ms | 131.431 ms | 131.471 ms | 130.975 ms | 130.770 ms |
| bm_xapian | 0.833 ms | 0.681 ms | 0.674 ms | 0.687 ms | 0.665 ms | 0.667 ms | 0.665 ms |

director=Quentin Tarantino - 9 movies, 1 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 9.112 ms | 9.540 ms | 9.671 ms | 9.258 ms | 9.510 ms | 9.597 ms | 9.126 ms |
| bm_sqlite | 0.273 ms | 0.243 ms | 0.243 ms | 0.241 ms | 0.239 ms | 0.239 ms | 0.239 ms |
| bm_sqlite_index | 0.282 ms | 0.243 ms | 0.257 ms | 0.244 ms | 0.245 ms | 0.243 ms | 0.337 ms |
| bm_tracker | 0.810 ms | 0.547 ms | 0.542 ms | 0.544 ms | 0.541 ms | 0.554 ms | 0.567 ms |
| bm_tracker_flat | 0.606 ms | 0.410 ms | 0.398 ms | 0.403 ms | 0.383 ms | 0.459 ms | 0.392 ms |
| bm_xapian | 0.215 ms | 0.204 ms | 0.195 ms | 0.197 ms | 0.195 ms | 0.208 ms | 0.194 ms |

director=Quentin Tarantino - 1,099 movies, 9 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 11.574 ms | 12.063 ms | 11.780 ms | 12.169 ms | 12.253 ms | 11.801 ms | 11.939 ms |
| bm_sqlite | 13.775 ms | 8.831 ms | 9.583 ms | 9.506 ms | 9.193 ms | 9.154 ms | 9.452 ms |
| bm_sqlite_index | 13.332 ms | 8.963 ms | 10.201 ms | 9.064 ms | 8.925 ms | 10.095 ms | 8.756 ms |
| bm_tracker | 5.173 ms | 4.644 ms | 4.546 ms | 4.473 ms | 4.552 ms | 4.472 ms | 4.455 ms |
| bm_tracker_flat | 1.137 ms | 0.857 ms | 0.851 ms | 0.855 ms | 0.844 ms | 0.842 ms | 0.844 ms |
| bm_xapian | 0.898 ms | 0.878 ms | 0.893 ms | 0.873 ms | 1.000 ms | 0.882 ms | 0.843 ms |

director=Quentin Tarantino - 3,216 movies, 10 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 12.343 ms | 12.175 ms | 12.307 ms | 12.004 ms | 12.235 ms | 12.947 ms | 12.194 ms |
| bm_sqlite | 40.967 ms | 37.867 ms | 38.607 ms | 37.618 ms | 37.487 ms | 37.124 ms | 38.147 ms |
| bm_sqlite_index | 43.470 ms | 36.820 ms | 37.027 ms | 36.779 ms | 36.957 ms | 36.585 ms | 36.782 ms |
| bm_tracker | 13.707 ms | 13.074 ms | 12.763 ms | 12.740 ms | 12.848 ms | 12.779 ms | 12.855 ms |
| bm_tracker_flat | 2.015 ms | 1.559 ms | 1.531 ms | 1.525 ms | 1.530 ms | 1.545 ms | 1.511 ms |
| bm_xapian | 0.933 ms | 0.886 ms | 0.908 ms | 2.944 ms | 1.023 ms | 1.030 ms | 0.799 ms |

director=Quentin Tarantino - 17,251 movies, 13 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 13.704 ms | 14.413 ms | 14.331 ms | 15.096 ms | 14.026 ms | 14.492 ms | 14.205 ms |
| bm_sqlite | 307.961 ms | 308.146 ms | 308.565 ms | 307.942 ms | 308.342 ms | 308.387 ms | 308.991 ms |
| bm_sqlite_index | 308.011 ms | 305.433 ms | 305.347 ms | 304.567 ms | 304.920 ms | 305.567 ms | 304.404 ms |
| bm_tracker | 72.690 ms | 72.075 ms | 72.005 ms | 71.999 ms | 71.938 ms | 71.946 ms | 72.108 ms |
| bm_tracker_flat | 7.489 ms | 6.996 ms | 6.877 ms | 6.987 ms | 7.148 ms | 7.088 ms | 7.021 ms |
| bm_xapian | 1.087 ms | 0.963 ms | 1.010 ms | 1.151 ms | 1.088 ms | 0.965 ms | 0.959 ms |

director=Quentin Tarantino - 121,587 movies, 14 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 13.546 ms | 13.955 ms | 13.981 ms | 13.854 ms | 13.740 ms | 14.114 ms | 15.816 ms |
| bm_sqlite | 4,752.853 ms | 2,793.690 ms | 2,800.197 ms | 2,795.611 ms | 2,800.578 ms | 2,794.765 ms | 2,801.000 ms |
| bm_sqlite_index | 2,806.890 ms | 2,789.648 ms | 2,788.729 ms | 2,791.168 ms | 2,788.102 ms | 2,790.845 ms | 2,789.475 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 47.801 ms | 46.303 ms | 46.701 ms | 46.640 ms | 46.467 ms | 46.862 ms | 46.448 ms |
| bm_xapian | 20.098 ms | 1.260 ms | 1.176 ms | 1.162 ms | 1.156 ms | 1.149 ms | 1.148 ms |

T* - 9 movies, 9 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 17.303 ms | 17.072 ms | 16.927 ms | 16.539 ms | 16.816 ms | 16.758 ms | 16.797 ms |
| bm_sqlite | 0.547 ms | 0.544 ms | 0.547 ms | 0.541 ms | 0.541 ms | 0.546 ms | 0.544 ms |
| bm_sqlite_index | 0.553 ms | 0.549 ms | 0.554 ms | 0.553 ms | 0.658 ms | 0.547 ms | 0.544 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 2.525 ms | 2.302 ms | 2.423 ms | 2.415 ms | 2.372 ms | 2.356 ms | 2.305 ms |
| bm_xapian | 3.086 ms | 2.871 ms | 2.947 ms | 2.893 ms | 3.104 ms | 3.022 ms | 3.126 ms |

T* - 1,099 movies, 1,098 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 358.775 ms | 355.830 ms | 350.287 ms | 349.816 ms | 347.998 ms | 356.585 ms | 347.143 ms |
| bm_sqlite | 64.679 ms | 142.927 ms | 143.776 ms | 142.847 ms | 145.319 ms | 147.244 ms | 135.600 ms |
| bm_sqlite_index | 62.383 ms | 151.941 ms | 144.456 ms | 144.108 ms | 141.330 ms | 173.728 ms | 169.799 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 199.108 ms | 213.355 ms | 202.793 ms | 196.659 ms | 194.937 ms | 194.708 ms | 195.267 ms |
| bm_xapian | 419.323 ms | 516.929 ms | 677.357 ms | 591.280 ms | 599.091 ms | 643.124 ms | 497.649 ms |

T* - 3,216 movies, 3,204 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 842.413 ms | 968.828 ms | 958.367 ms | 1,002.383 ms | 932.222 ms | 946.388 ms | 1,004.821 ms |
| bm_sqlite | 327.669 ms | 415.921 ms | 440.198 ms | 408.543 ms | 432.575 ms | 537.572 ms | 412.061 ms |
| bm_sqlite_index | 310.218 ms | 432.201 ms | 413.221 ms | 404.165 ms | 479.691 ms | 431.758 ms | 436.533 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 727.867 ms | 711.970 ms | 722.046 ms | 717.685 ms | 719.927 ms | 713.077 ms | 713.843 ms |
| bm_xapian | 1,442.238 ms | 1,470.821 ms | 1,415.183 ms | 1,392.164 ms | 1,437.493 ms | 1,464.149 ms | 1,520.747 ms |

T* - 17,251 movies, ≥ 10,000 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 3,006.139 ms | 3,127.174 ms | 3,136.617 ms | 3,151.197 ms | 3,131.469 ms | 3,141.155 ms | 3,056.497 ms |
| bm_sqlite | 1,481.321 ms | 1,388.573 ms | 1,468.062 ms | 1,533.263 ms | 1,422.012 ms | 1,442.638 ms | 1,456.166 ms |
| bm_sqlite_index | 1,346.717 ms | 1,451.410 ms | 1,508.228 ms | 1,411.643 ms | 1,460.563 ms | 1,514.390 ms | 1,391.342 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 2,945.536 ms | 2,938.230 ms | 2,957.149 ms | 2,959.569 ms | 2,972.291 ms | 2,933.668 ms | 2,936.655 ms |
| bm_xapian | 3,391.825 ms | 3,490.307 ms | 3,474.203 ms | 3,483.310 ms | 3,560.886 ms | 3,505.060 ms | 3,398.937 ms |

T* - 121,587 movies, ≥ 10,000 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 3,627.408 ms | 3,625.588 ms | 3,546.610 ms | 3,508.233 ms | 3,599.160 ms | 4,597.857 ms | 4,101.686 ms |
| bm_sqlite | 2,182.548 ms | 2,109.730 ms | 2,109.812 ms | 2,121.573 ms | 2,104.320 ms | 2,117.912 ms | 2,145.342 ms |
| bm_sqlite_index | 2,108.863 ms | 2,103.648 ms | 2,131.009 ms | 2,132.823 ms | 2,109.655 ms | 2,137.286 ms | 2,106.779 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 8,757.130 ms | 9,316.640 ms | 8,708.298 ms | 8,781.584 ms | 8,788.042 ms | 8,699.770 ms | 8,721.099 ms |
| bm_xapian | 4,805.474 ms | 4,528.004 ms | 4,692.763 ms | 4,640.065 ms | 4,618.215 ms | 4,647.170 ms | 4,674.588 ms |

April 18, 2012
Mathias Hasselmann - April 18, 2012 - 16:51

Openismus asked me to research how best to index media files and provide full text searching. For the last two years, I have used Tracker for this kind of thing. I like Tracker, but I want to avoid being biased. Therefore, I decided to evaluate alternatives.

Performance is an obvious requirement. We also want to provide a library to permit other applications to access the data we collected. Therefore, SQLite and Lucene (in its C++ incarnations) are obvious contenders. Lucene++ is an emerging project that got suggested by Mikkel Kamstrup Erlandsen at Canonical. QtCLucene is a bit special: So far Qt doesn't provide official support for this library and doesn't install its header files. Still it is used by Qt's help system, which makes QtCLucene a widely deployed and well tested C++ implementation of Lucene.

Sadly, the big names like MySQL or PostgreSQL do not fit: MySQL's embedded server library is licensed under GPL (instead of LGPL, for instance), which greatly limits legal use cases. PostgreSQL doesn't provide any embedding at all. Because I enjoy RDF and SPARQL I also wondered about testing the Redland RDF libraries, but I found that they don't provide any full text search at all.

Contenders

Test Platform

  • Ubuntu 12.04
  • Intel Core 2 Duo P8400 (2.26GHz), 4 GiB RAM
  • HDD: WDC WD2500BEVT-2, encrypted (aes-cbc-essiv:sha256)

Test Scenario

To get somewhat realistic data I fetched a copy of the Internet Movie Database from ftp.fu-berlin.de. Since it is quite a large database (about 1 GiB when compressed with gzip), I extracted a few subsets of it: all movies with at least 500,000, 50,000, 15,000, 1,000 and 50 user votes. This data was then imported into fresh instances of Tracker, SQLite, Lucene++ and QtCLucene. After that I ran a few trivial full text searches:

"The Matrix"
Fast Furious
"Star Trek" OR "Star Wars"
Lord Rings King
Keanu Reeves
"Brad Pitt" OR "Bruce Willis"
Jackson Samuel
Quentin Tarantino
Wachowski
Thomas Neo Anderson
Neo
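
To make the query forms above concrete, here is a minimal sketch of how such queries might be evaluated against a toy in-memory index. It assumes that quoted strings are phrases, bare words are combined with an implicit AND, and OR joins alternatives. Each engine in the benchmark has its own query syntax and implementation, so the `ToyIndex` class and the sample titles are purely illustrative and not taken from the benchmark code.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class ToyIndex:
    """Hypothetical positional inverted index: term -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for pos, term in enumerate(tokenize(text)):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def _term(self, term):
        # Set of documents containing the term (empty if the term is unknown).
        return set(self.postings.get(term, {}))

    def _phrase(self, words):
        # Documents in which the words occur at consecutive positions.
        if not words:
            return set()
        candidates = set.intersection(*(self._term(w) for w in words))
        hits = set()
        for doc in candidates:
            for start in self.postings[words[0]][doc]:
                if all(start + i in self.postings[w][doc]
                       for i, w in enumerate(words)):
                    hits.add(doc)
                    break
        return hits

    def search(self, query):
        # OR has the lowest precedence; everything else is an implicit AND.
        result = set()
        for alternative in re.split(r"\s+OR\s+", query):
            sets = []
            for phrase, word in re.findall(r'"([^"]+)"|(\S+)', alternative):
                if phrase:
                    sets.append(self._phrase(tokenize(phrase)))
                else:
                    sets.extend(self._term(t) for t in tokenize(word))
            if sets:
                result |= set.intersection(*sets)
        return result


# Illustrative sample data, not the IMDb subsets used in the benchmark.
index = ToyIndex()
index.add(1, "The Matrix")
index.add(2, "Star Trek")
index.add(3, "Star Wars")
index.add(4, "The Lord of the Rings: The Return of the King")

print(index.search('"Star Trek" OR "Star Wars"'))
print(index.search("Lord Rings King"))
```

A real engine would add ranking, stemming and on-disk storage on top of this; the sketch only shows the difference between phrase, AND and OR matching.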

Each scenario was repeated five times. To avoid cache effects, each engine was tested only after the others had run for a given set of parameters. Tracker was tested in two configurations: first with the Nepomuk-based multimedia ontology shipped with Tracker (nmm), then with a flattened ontology (fmm), which is a much better fit for the data model of pure full text search indices like Lucene. All engines were used with default parameters; no magic configuration options or pragmas were applied. Feel free to repeat the tests with your own optimized settings, and report the results when doing so.
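
The measurement procedure could be sketched roughly as follows. This is not the code from the benchmark repository: `run_benchmarks`, the engine callables and the parameter sets are hypothetical names chosen for illustration, and the exact interleaving order is an assumption.

```python
import time
from collections import defaultdict


def run_benchmarks(engines, parameter_sets, repetitions=5):
    """Time every engine on every parameter set, interleaving the engines.

    engines: dict mapping an engine name to a callable taking one parameter set.
    Returns {(engine_name, params): [elapsed_ms, ...]}.
    """
    timings = defaultdict(list)
    for params in parameter_sets:
        for _ in range(repetitions):
            # Run the engines one after another, so that (with more than one
            # engine) none is measured twice in a row on the same parameters.
            for name, run in engines.items():
                start = time.perf_counter()
                run(params)
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                timings[(name, params)].append(elapsed_ms)
    return dict(timings)
```

Each entry in the result holds one timing per repetition, which is the shape needed for averaging or for plotting the spread.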

Source Code and Data

The source code of these benchmarks can be found at Gitorious and can be built using autotools or qmake, whichever you prefer.

Run src/benchmark.sh to reproduce the tests. The log files can be turned into a CSV file by running src/report.sh.

The charts have been created with LibreOffice: It should be sufficient to copy the CSV data into the data sheet of logs/report.ods. Select "English (USA)" as language in the import dialog, to ensure that numbers are recognized properly. After that you still might have to sort the rows by the columns suite, num_movies and experiment. The data sorting dialog provides an option for marking the first row as column header.
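
If you prefer to sort the CSV data before importing it, a small script can do the same job as the sorting dialog. The sketch below assumes the report is a comma-separated file with a header row containing the columns suite, num_movies and experiment, and that num_movies is a plain integer without thousands separators; the `sort_report` helper is a hypothetical name, and the actual layout produced by src/report.sh may differ.

```python
import csv
import io


def sort_report(csv_text):
    """Sort report rows by suite, num_movies (numerically) and experiment."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    rows.sort(key=lambda r: (r["suite"], int(r["num_movies"]), r["experiment"]))

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Sorting num_movies as an integer matters: a plain text sort would place 1099 before 9.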

Update: I've pushed some more changes. To reproduce exactly the results discussed in this post, check out the tag releases/0.1 for the initial results, and releases/0.2 to also include the Xapian tests.

Results

Populating the Index

| Movies | Lucene++ | QtCLucene | SQLite | Tracker (Nepomuk) | Tracker (Flat) | Xapian |
|---|---|---|---|---|---|---|
| 9 | 6.84 ms | 3.46 ms | 43.2 ms¹⁾ | 36.2 ms | 7.13 ms | 52.561 ms¹⁾ |
| 1,099 | 2.93 ms | 5.72 ms | 3.63 ms | 26.4 ms | 3.32 ms | 5.94 ms |
| 3,216 | 2.32 ms | 5.37 ms | 2.87 ms | 21.2 ms | 2.89 ms | 4.97 ms |
| 17,251 | 1.98 ms | 5.10 ms | 2.50 ms | 14.2 ms | 2.19 ms | 3.58 ms |
| 121,587 | 1.21 ms | 5.21 ms | 3.96 ms²⁾ | 10.4 ms | 1.80 ms | 2.30 ms |

  1. The dataset is tiny. I suspect that some startup overhead is invalidating this result.
  2. We might see first signs of a memory barrier here.

Query Execution Time

| Movies | Lucene++ | QtCLucene | SQLite | Tracker (Nepomuk) | Tracker (Flat) | Xapian |
|---|---|---|---|---|---|---|
| 9 | 2.23 ms | 0.572 ms | 0.159 ms | 1.33 ms | 0.494 ms | 0.271 ms |
| 1,099 | 6.06 ms | 2.18 ms | 1.17 ms | 90.3 ms | 1.67 ms | 0.955 ms |
| 3,216 | 8.72 ms | 3.41 ms | 1.55 ms | 335 ms | 3.57 ms | 1.50 ms |
| 17,251 | 13.1 ms | 5.33 ms | 1.92 ms | 2,380 ms | 7.52 ms | 2.35 ms |
| 121,587 | 17.0 ms | 44.2 ms | 17.4 ms | 86,800 ms | 19.885 ms | 18.1 ms |
| Complexity | O(log(n)²) | O(log(n)²) | O(log(n)²) | O(n log(n)) | O(sqrt(n)) | O(log(n)²) |

QtCLucene, SQLite, Tracker (Nepomuk) and Xapian seem to hit a memory barrier at 121,587 movies.
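
The complexity row is an estimate based on only five data points per engine. One way such an estimate could be obtained (not necessarily how the figures in this post were produced) is to fit each candidate growth function to the measured times and keep the one with the smallest error. The candidate list and the `best_model` helper below are assumptions made for this sketch.

```python
import math

# Candidate growth functions; this selection is an assumption of the sketch.
MODELS = {
    "O(log(n)²)": lambda n: math.log(n) ** 2,
    "O(sqrt(n))": lambda n: math.sqrt(n),
    "O(n)": lambda n: float(n),
    "O(n log(n))": lambda n: n * math.log(n),
}


def fit_error(sizes, times, model):
    """Fit times ≈ a * model(n) in log space and return the squared residual.

    Requires n > 1 and positive times, so that all logarithms are defined.
    """
    diffs = [math.log(t) - math.log(model(n)) for n, t in zip(sizes, times)]
    log_a = sum(diffs) / len(diffs)  # least-squares estimate of log(a)
    return sum((d - log_a) ** 2 for d in diffs)


def best_model(sizes, times):
    """Return the name of the candidate with the smallest fit error."""
    return min(MODELS, key=lambda name: fit_error(sizes, times, MODELS[name]))


# Query times in ms for Tracker (Flat), copied from the table above.
sizes = [9, 1099, 3216, 17251, 121587]
tracker_flat_ms = [0.494, 1.67, 3.57, 7.52, 19.885]
print(best_model(sizes, tracker_flat_ms))
```

Fitting in log space weighs relative rather than absolute deviations, so the largest dataset does not dominate the result. With so few points, and with the 9-movie row distorted by startup overhead, any such fit is only a rough indication.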

Consumed Disk Space

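| Movies | Lucene++ | QtCLucene | SQLite | Tracker (Nepomuk) | Tracker (Flat) | Xapian | Raw Data |
|---|---|---|---|---|---|---|---|
| 9 | 80 KiB | 76 KiB | 368 KiB | 4.4 MiB | 2.3 MiB | 424 KiB | 104 KiB |
| 1,099 | 4.9 MiB | 4.8 MiB | 32 MiB | 59 MiB | 29 MiB | 21 MiB | 7.8 MiB |
| 3,216 | 12 MiB | 12 MiB | 75 MiB | 114 MiB | 53 MiB | 47 MiB | 18 MiB |
| 17,251 | 39 MiB | 39 MiB | 257 MiB | 305 MiB | 155 MiB | 170 MiB | 57 MiB |
| 121,587 | 154 MiB | 154 MiB | 1.0 GiB | 906 MiB | 521 MiB | 683 MiB | 198 MiB |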
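
To put these sizes in relation to the raw data, the ratios for the largest dataset can be computed directly from the last table row. A small sketch follows; the dictionary simply restates that row, with 1.0 GiB taken as 1024 MiB, and `size_ratios` is a hypothetical helper name.

```python
# Index sizes in MiB for 121,587 movies, copied from the last table row.
RAW_MIB = 198
index_mib = {
    "Lucene++": 154,
    "QtCLucene": 154,
    "SQLite": 1024,  # 1.0 GiB, taken as 1024 MiB
    "Tracker (Nepomuk)": 906,
    "Tracker (Flat)": 521,
    "Xapian": 683,
}


def size_ratios(sizes, raw):
    """Return each index size as a multiple of the raw data size."""
    return {name: size / raw for name, size in sizes.items()}


ratios = size_ratios(index_mib, RAW_MIB)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:18s} {ratio:5.2f} x raw data")
```

On these figures the two Lucene indexes come to roughly 0.8 times the raw data, while the SQLite database is a little over 5 times its size.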

Discussion

Tracker's performance is devastating: not at all the result you want to see for a project you actually like and enjoy using. You can clearly see the impact of the many joins it must perform to map ontologies and queries to SQL. This is surprising, since in my opinion Nepomuk's multimedia ontology is a quite typical ontology. Also, the datasets themselves are not that large for something that started out as a file indexer. The (sadly quite unrealistic) flat ontology might give a few hints on how to improve Tracker: the execution times with this ontology are comparable to those of the other engines. Still, the observed (and only estimated) complexity class for executing queries is worrying.

Lucene++ shines at writing data; it is incredibly fast when building its index. In contrast to the other engines, it even spends less time per movie the bigger its index grows. It is noticeably slower than QtCLucene or SQLite when looking up terms. Still, I'd call an average time of 17 ms for finding matches within 122k documents quite a good achievement. Additionally, Lucene++ seems to be implemented efficiently enough not to hit any memory barrier at this scale yet.

QtCLucene is about two times slower than Lucene++ or SQLite when building its index, although the index size doesn't seem to affect insertion time per movie. It makes up for this with good lookup performance: it is about 2 to 3 times faster than Lucene++. It seems to hit a memory barrier at 122k documents.

SQLite's performance is just in the middle between Lucene++ and QtCLucene when building the index. When searching terms it even beats QtCLucene, again by a factor of 2 to 3.

Lucene++ and QtCLucene consume less disk space than the original files, most probably because the raw data stores movies and artists in separate files. The records in these files must be linked with each other; Lucene just does this more efficiently. SQLite and Tracker consume significantly more disk space than Lucene or the original data. This can partly be explained by fields being stored twice: once in their table and again in the full text search index. Column indexes also play a role. Still, this doesn't fully explain why disk consumption is so much higher.

Xapian's characteristics are quite similar to those of SQLite. It doesn't yet hit the memory barrier that affects SQLite's insert performance at 122k documents, maybe because it consumes only 2/3 of the disk space. I enjoyed its API for being much closer to modern C++ than that of any other engine. It gives more low-level access to all the FTS mechanics: for instance, you have to attach values and feed the indexer yourself, and you also have to deal with token prefixes. These are details that Lucene just hides behind a Field class and its attributes. I'm not sure yet which approach I prefer.

Conclusion

Tracker is out. Lucene++, QtCLucene and SQLite are quite comparable in terms of performance, with Lucene++ being the fastest engine when building the index, and with SQLite being the fastest when performing full text searches. There are some first signs that Lucene++ is more memory efficient than its competitors. This needs further investigation. Also we should investigate capabilities for doing point and range searches, instead of full text searches.