Implementation of modality

The protocol could be simplified by removing the XEMBED_MODALITY_ON and XEMBED_MODALITY_OFF messages and instead requiring the embedder to map an input-only window over its child while the child is being shadowed by a modal grab.

One possible reason for the current protocol is that a toolkit might want elements such as scrollbars to remain active even when grab-shadowed. (I know of no toolkit that actually implements this.)

Clarify function of timestamps

The function of the timestamp arguments needs to be clarified, as do the requirements for what should be passed in the field. The original draft of the specification contained the following text about determining the timestamp:

The X time is to be updated whenever the toolkit receives an event from the server that carries a timestamp. XEmbed client messages qualify for that.

Hint to implementors: Check that the XEmbed timestamp is actually later than your current X time. While this cannot happen with ordinary XEvents, delayed client messages may have this effect. Be prepared for evil implementations that may even pass CurrentTime sometimes.

But I [OWT] wouldn't agree with this advice. The point of a timestamp is to make sure that when events are processed out of order, the event generated last by the user wins for shared resources such as input focus, selections, and grabs. An example of where this can matter is if you have

     Toplevel Window
	   Text Entry 1
	   Text Entry 2

If the entries are set to select their text on focus in, and the user hits TAB repeatedly in quick succession, then the timestamps on the FOCUS_IN events are what makes sure that Entry 2 actually ends up owning the PRIMARY selection, instead of it being a race between the two clients. But in situations like this, having the correct timestamp only matters if a user action triggers the behavior.

Hence the advice: the timestamp should be the time of the event currently being processed.

If no explicit user action is involved, the best thing to do is to use CurrentTime. Using the timestamp from the last X event received can cause problems if the ultimate trigger of the behavior is a timeout or network activity and the last X event happened some time in the distant past.
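The advice above boils down to a small decision, plus the wraparound-aware comparison that gives real timestamps their meaning. A minimal sketch in C, assuming a stand-in `XTimeStamp` type instead of Xlib's `Time` so it builds without X headers; both function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t XTimeStamp;    /* stand-in for Xlib's Time (32-bit ms) */
#define XEMBED_CURRENT_TIME 0u  /* same value as Xlib's CurrentTime */

/* Timestamp for an outgoing XEmbed message (hypothetical helper).
 * User-triggered: reuse the time of the event being processed.
 * Otherwise (timeout, network activity): CurrentTime, rather than the
 * possibly ancient time of the last X event received. */
static XTimeStamp xembed_message_time(bool user_triggered,
                                      XTimeStamp current_event_time)
{
    return user_triggered ? current_event_time : XEMBED_CURRENT_TIME;
}

/* X time is a 32-bit counter that wraps around, so "later" is judged
 * cyclically: a is later than b if it lies in the half of the time
 * space following b. This ordering is what lets the later of two
 * TAB presses win the PRIMARY selection. */
static bool xtime_later(XTimeStamp a, XTimeStamp b)
{
    return (int32_t)(a - b) > 0;
}
```

Note that a timestamp just past the wrap point (a small value) still counts as later than one just before it (a value near 2^32).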

Complexity of accelerator handling

The current specification for accelerator handling is somewhat complex. Most of the complexity (the accelerator IDs) comes from the need to handle conflicting accelerators. GTK+ currently implements a simpler scheme in which grabs are identified only by key symbol and modifiers, and conflicting mnemonic resolution doesn't work across the embedder/client interface.
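To illustrate why the accelerator IDs are what make conflicts manageable, here is a minimal sketch of an embedder-side registry in C. The types, names and fixed capacity are hypothetical; each entry simply records an ID together with a key symbol and a modifier mask.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one accelerator registered by a client. */
typedef struct {
    long id;               /* accelerator ID chosen by the client */
    unsigned long keysym;  /* X key symbol */
    unsigned int mods;     /* modifier bit field */
} Accel;

#define MAX_ACCELS 16      /* arbitrary capacity for the sketch */

typedef struct {
    Accel items[MAX_ACCELS];
    size_t count;
} AccelTable;

static bool accel_register(AccelTable *t, long id,
                           unsigned long keysym, unsigned int mods)
{
    if (t->count == MAX_ACCELS)
        return false;
    t->items[t->count++] = (Accel){ id, keysym, mods };
    return true;
}

/* Report every registration matching a key press. Because each entry
 * keeps its own ID, two registrations on the same key stay distinct,
 * so the embedder can see the conflict. Writes up to max_out IDs and
 * returns the total number of matches. */
static size_t accel_matches(const AccelTable *t, unsigned long keysym,
                            unsigned int mods, long *ids_out,
                            size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < t->count; i++) {
        if (t->items[i].keysym == keysym && t->items[i].mods == mods) {
            if (n < max_out)
                ids_out[n] = t->items[i].id;
            n++;
        }
    }
    return n;
}
```

Registering IDs 1 and 2 on the same key and modifiers makes `accel_matches` report two hits, which is the information an embedder would need to resolve the conflict (for example by cycling between them). A table keyed only by key symbol and modifiers, as in the simpler GTK+ scheme, would collapse them into one entry.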

Infinite loops in focusing

There is a potential for infinite focus loops. Consider the case:

     Toplevel Window
	   Client

Where there are no focusable sites in the client or in the toplevel window. Then if Tab is pressed, the embedder will send: FOCUS_IN/FOCUS_FIRST to the client, the client will send FOCUS_NEXT to the embedder, the toplevel window will wrap the focus around and send FOCUS_IN/FOCUS_FIRST to the client...

The minimum mechanism that seems necessary to prevent this loop is a serial number in the FOCUS_IN/FOCUS_FIRST message that is repeated in a resulting FOCUS_NEXT message.
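A minimal sketch of the serial-number idea from the embedder's side, assuming a toplevel whose only focusable site is the client (as in the case above). The struct and function names are hypothetical, and actually sending the messages is left out:

```c
#include <stdbool.h>

/* Hypothetical embedder state for the serial-number scheme. */
typedef struct {
    unsigned long next_serial;    /* serial for the next FOCUS_IN/FOCUS_FIRST */
    unsigned long wrapped_serial; /* serial of the FOCUS_IN sent after wrapping */
    bool has_wrapped;             /* already wrapped once in this traversal? */
} EmbedderFocus;

/* A fresh Tab press starts a new traversal; returns the serial to put
 * in the FOCUS_IN/FOCUS_FIRST sent to the client. */
static unsigned long begin_traversal(EmbedderFocus *e)
{
    e->has_wrapped = false;
    return e->next_serial++;
}

/* Called when FOCUS_NEXT arrives, echoing the serial of the FOCUS_IN
 * that caused it. With nothing else focusable, the toplevel would wrap
 * around into the client again. Returns true if it does so (a new
 * FOCUS_IN with a new serial is "sent"), or false if this FOCUS_NEXT
 * echoes the FOCUS_IN already sent after a wrap: a full cycle found
 * nothing focusable, so the loop is broken here. */
static bool on_focus_next(EmbedderFocus *e, unsigned long serial)
{
    if (e->has_wrapped && serial == e->wrapped_serial)
        return false;
    e->wrapped_serial = e->next_serial++;
    e->has_wrapped = true;
    return true;
}
```

In this sketch the embedder wraps at most once per Tab press; a stray FOCUS_NEXT carrying some older serial is not handled specially.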

A possibly better way of handling this could be to give FOCUS_IN an explicit response; that is, add an XEMBED_FOCUS_IN_RESPONSE message that the client must send to the embedder after receiving a FOCUS_IN message:


     detail   1 if the client accepted the focus, 0 otherwise
     data1    serial number from XEMBED_FOCUS_IN

The main problem with requiring a response here is that the caller needs to wait for the response event, and to handle cases like parent (client 1) => child (client 2) => grandchild (client 1), it probably needs to process all sorts of incoming events while waiting. If the user hits Tab Tab in quick succession, things could get very complicated.
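The proposed response fits the usual XEmbed client-message layout, in which `data.l[]` holds the time, the message, then detail, data1 and data2. A minimal sketch in C, using a plain struct in place of Xlib's `XClientMessageEvent`; the opcode value is a made-up placeholder, since XEMBED_FOCUS_IN_RESPONSE is only a proposal, and the helper names are hypothetical:

```c
#include <stdbool.h>

/* Placeholder opcode: XEMBED_FOCUS_IN_RESPONSE is only proposed, so
 * this value is not part of any published specification. */
#define XEMBED_FOCUS_IN_RESPONSE 100L

/* Mirrors the five longs of an XEmbed client message's data.l[]:
 * [0] time, [1] message, [2] detail, [3] data1, [4] data2. */
typedef struct {
    long l[5];
} XEmbedData;

/* Client side: answer a FOCUS_IN carrying focus_in_serial. */
static XEmbedData make_focus_in_response(long time, bool accepted,
                                         long focus_in_serial)
{
    XEmbedData d = {{ time,
                      XEMBED_FOCUS_IN_RESPONSE,
                      accepted ? 1L : 0L,  /* detail */
                      focus_in_serial,     /* data1 */
                      0L }};               /* data2: unused */
    return d;
}

/* Embedder side: is this the response to the FOCUS_IN we sent? */
static bool response_matches(const XEmbedData *d, long expected_serial)
{
    return d->l[1] == XEMBED_FOCUS_IN_RESPONSE
        && d->l[3] == expected_serial;
}
```

The serial in data1 is what lets the embedder match a response to its request even if other events arrive in between.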


Robustness

The protocol, as currently constituted, is not robust against the embedder crashing. A crash causes the X server to destroy the embedder window and, as a consequence, the client's window is unexpectedly destroyed as well, which will likely cause the client to die with a BadWindow error.

Fixing this requires an X protocol extension that extends the functionality of XChangeSaveSet() in two areas:

  • Allow it to be specified that the saved window should be reparented to the root window rather than to the nearest parent. (The nearest parent is typically the window manager's frame window, so reparenting to it only saves the client until the window manager cleans up and destroys the frame window.)

  • Allow it to be specified that the saved window should be unmapped rather than mapped. (Without this capability the client will be mapped as a child of the root window, which will be confusing to the user.)


Sensitivity

Toolkits such as Qt and GTK+ have a concept of disabled (insensitive) widgets. This notion is typically hierarchical: if the embedder or an ancestor of the embedder becomes insensitive, widgets inside the client should also look and act insensitive.
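The hierarchical rule can be stated as "a widget is effectively sensitive only if it and every ancestor are". A minimal sketch in C, with a hypothetical `Widget` type; it models the embedder and the client as one parent chain, whereas in a real embedding the chain crosses a process boundary and the client would need some message (which the protocol does not yet define) to learn the embedder side's state:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical widget node; the chain of parents runs from a client
 * widget up through the embedder to the toplevel. */
typedef struct Widget {
    struct Widget *parent;
    bool sensitive;            /* the widget's own setting */
} Widget;

/* Effective sensitivity: false if the widget or any ancestor is
 * insensitive. */
static bool effectively_sensitive(const Widget *w)
{
    for (; w != NULL; w = w->parent)
        if (!w->sensitive)
            return false;
    return true;
}
```

Making the toplevel insensitive is then enough to make a button deep inside the client report itself as insensitive, without touching the button's own flag.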

Directional focusing

Some toolkits, such as GTK+, support directional focusing in addition to the standard concept of a focus chain: in some cases it's possible to navigate focus using the arrow keys. Doing this perfectly requires knowing the coordinates of the original focus window, which is hard to obtain in an embedding context. A good approximation is, when focusing into a container, to provide the side of the container that focus is coming from and to focus the "middle widget" on that side.

This could be supported by adding an extra data field to the XEMBED_FOCUS_FIRST/XEMBED_FOCUS_LAST subtypes of XEMBED_FOCUS_IN and to XEMBED_FOCUS_NEXT and XEMBED_FOCUS_PREV, which would contain:

/* Directions for focusing */

Applications supporting only normal tab focusing would always pass XEMBED_DIRECTION_DEFAULT and treat all received directions as XEMBED_DIRECTION_DEFAULT.

The argument against supporting this is that it's a rather confusing feature to start with (many widgets eat arrow keys for other purposes), and it becomes more confusing if you have an application containing widgets from different toolkits, some of which support it and some of which don't.
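The "middle widget on the entering side" approximation, and the fallback for tab-only applications, can be sketched in C as follows. Only XEMBED_DIRECTION_DEFAULT comes from the proposal; the four side values, all numeric values, the `Rect` type and both function names are hypothetical:

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum {
    XEMBED_DIRECTION_DEFAULT = 0, /* from the proposal */
    DIRECTION_FROM_TOP,           /* hypothetical: focus enters at the top */
    DIRECTION_FROM_BOTTOM,        /* hypothetical */
    DIRECTION_FROM_LEFT,          /* hypothetical */
    DIRECTION_FROM_RIGHT          /* hypothetical */
} FocusDirection;

typedef struct { int x, y, width, height; } Rect;

/* A client supporting only tab focusing treats every received
 * direction as the default one. */
static FocusDirection normalize_direction(FocusDirection d,
                                          int supports_directional)
{
    return supports_directional ? d : XEMBED_DIRECTION_DEFAULT;
}

/* Pick the widget nearest the edge focus enters from; ties go to the
 * widget whose centre is closest to the middle of the container along
 * that edge (coordinates doubled to avoid halves). Returns an index,
 * or -1 for XEMBED_DIRECTION_DEFAULT (use the ordinary focus chain)
 * or when there are no widgets. */
static int middle_widget_on_side(const Rect *w, size_t n, Rect c,
                                 FocusDirection from)
{
    int best = -1;
    long best_dist = 0, best_off = 0;
    for (size_t i = 0; i < n; i++) {
        long dist, off;
        switch (from) {
        case DIRECTION_FROM_LEFT:
            dist = w[i].x - c.x;
            off = labs((2L * w[i].y + w[i].height) - (2L * c.y + c.height));
            break;
        case DIRECTION_FROM_RIGHT:
            dist = (long)(c.x + c.width) - (w[i].x + w[i].width);
            off = labs((2L * w[i].y + w[i].height) - (2L * c.y + c.height));
            break;
        case DIRECTION_FROM_TOP:
            dist = w[i].y - c.y;
            off = labs((2L * w[i].x + w[i].width) - (2L * c.x + c.width));
            break;
        case DIRECTION_FROM_BOTTOM:
            dist = (long)(c.y + c.height) - (w[i].y + w[i].height);
            off = labs((2L * w[i].x + w[i].width) - (2L * c.x + c.width));
            break;
        default:
            return -1;
        }
        if (best < 0 || dist < best_dist ||
            (dist == best_dist && off < best_off)) {
            best = (int)i;
            best_dist = dist;
            best_off = off;
        }
    }
    return best;
}
```

With three equal widgets stacked in a left-hand column, entering from the left selects the middle one, entering from the top selects the top one.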

Modal dialogs

The specification doesn't have any provisions for handling the case where an embedded client wants to put up a dialog. Such a dialog should be transient-for the real toplevel window, and, if modal, should block the entire toplevel window. To fully implement this, you would need some concept of an application that spanned multiple toplevel windows in multiple clients.

Propagation of key presses

It's frequently useful to have key bindings that trigger on a widget if the focus is on a child of that widget. For instance, Control-PageUp and Control-PageDown switch pages in a notebook widget when the focus is on a child of the notebook. The XEmbed spec currently has no handling for this situation.

The simplest solution would be to specify that if the client doesn't handle a key press sent to it, it sends the event back to the embedder. Some care would be required in the embedder to handle infinite loops, but it shouldn't be that bad.
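One way the embedder could avoid loops under this rule is to never forward a key to the client once the client has returned it. A minimal sketch in C; the `KeyMsg` type, the `returned_by_client` flag and the handler functions are hypothetical, and the constants merely reuse the values of Xlib's `ControlMask`, `XK_Page_Up` and `XK_Page_Down` so the sketch builds without X headers:

```c
#include <stdbool.h>

#define MOD_CONTROL   0x4u     /* same bit as Xlib's ControlMask */
#define KEY_PAGE_UP   0xff55UL /* same value as XK_Page_Up */
#define KEY_PAGE_DOWN 0xff56UL /* same value as XK_Page_Down */

/* Hypothetical key message passed between embedder and client. */
typedef struct {
    unsigned long keysym;
    unsigned int mods;
    bool returned_by_client;   /* set once the client sends it back */
} KeyMsg;

typedef bool (*KeyHandler)(const KeyMsg *k);

typedef enum {
    KEY_HANDLED_BY_CLIENT,
    KEY_HANDLED_BY_EMBEDDER,
    KEY_DROPPED
} KeyOutcome;

/* Embedder-side routing: offer the key to the client first; if it
 * comes back unhandled, try the embedder's own bindings. A returned
 * key is never offered to the client again, so it cannot ping-pong. */
static KeyOutcome route_key(KeyMsg k, KeyHandler client, KeyHandler embedder)
{
    if (!k.returned_by_client) {
        if (client(&k))
            return KEY_HANDLED_BY_CLIENT;
        k.returned_by_client = true;   /* client sent it back */
    }
    return embedder(&k) ? KEY_HANDLED_BY_EMBEDDER : KEY_DROPPED;
}

/* Example handlers: the client ignores Control combinations, and the
 * embedder is a notebook that handles Control-PageUp/PageDown. */
static bool example_client(const KeyMsg *k)
{
    return !(k->mods & MOD_CONTROL);
}

static bool example_notebook(const KeyMsg *k)
{
    return (k->mods & MOD_CONTROL) &&
           (k->keysym == KEY_PAGE_UP || k->keysym == KEY_PAGE_DOWN);
}
```

With nested embedding, the same flag would have to travel with the event as it moves further up, which is where the "some care" mentioned above comes in.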

Handling of toplevel modes

GTK+-2.0 contains a feature for key navigation of tooltips where Control-F1 toggles a "tooltips keyboard mode" where the tooltip for the currently focused window is displayed. There is no way of propagating this across XEMBED. This feature could clearly be implemented the same way as XEMBED_WINDOW_ACTIVATE, but adding a pair of messages for every feature of this type seems excessive.

An alternative idea would be to add an _XEMBED_STATE property, holding a list of atoms, that the embedder sets on the client window. This could also replace the XEMBED_WINDOW_ACTIVATE/XEMBED_WINDOW_DEACTIVATE and XEMBED_MODALITY_ON/XEMBED_MODALITY_OFF message pairs, simplifying the protocol.

If the client is allowed to reparent itself out of the embedder, there are race conditions in maintaining this property that would have to be considered.
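On the client side, the _XEMBED_STATE idea reduces each mode to a membership test on the atom list. A minimal sketch in C, using plain integers in place of atoms obtained with XInternAtom() and in place of the property contents read with XGetWindowProperty(); the state atoms and all names here are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long AtomId;      /* stand-in for Xlib's Atom */

/* Hypothetical atoms the embedder might list in _XEMBED_STATE. */
enum {
    STATE_ACTIVE = 1,              /* would replace WINDOW_ACTIVATE/DEACTIVATE */
    STATE_MODAL = 2,               /* would replace MODALITY_ON/OFF */
    STATE_TOOLTIPS_KEYBOARD = 3    /* e.g. the GTK+ tooltips keyboard mode */
};

static bool state_has(const AtomId *atoms, size_t n, AtomId which)
{
    for (size_t i = 0; i < n; i++)
        if (atoms[i] == which)
            return true;
    return false;
}

/* Modes the client derives from the property, e.g. after each
 * PropertyNotify for _XEMBED_STATE. Unknown atoms are ignored, so new
 * modes can be added without new messages. */
typedef struct {
    bool active, modal, tooltips_keyboard;
} ClientModes;

static ClientModes modes_from_state(const AtomId *atoms, size_t n)
{
    ClientModes m = {
        state_has(atoms, n, STATE_ACTIVE),
        state_has(atoms, n, STATE_MODAL),
        state_has(atoms, n, STATE_TOOLTIPS_KEYBOARD),
    };
    return m;
}
```

Because the client re-derives every mode from the whole list, a missed or reordered change collapses into the latest property value rather than leaving a stray "on" without its "off".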