Rationale and discussion

The basis for handling embedding is that the embedder acts like a "window manager" for the client. (The window management protocol is defined in the X Inter-Client Communication Conventions Manual, or ICCCM.) The embedder selects SubstructureRedirectMask on its window so that it can intercept requests made on its child windows, and then the client window is reparented (using XReparentWindow()) as a child of the embedder window. Because of the substructure redirect, the embedder is able to intercept calls to move or resize the client window, and handle them as appropriate to the location in the embedding application. (Map requests are also redirected, but XEmbed actually handles map requests separately; see the description of the XEMBED_MAPPED flag.)

The window management protocol is sufficient to handle the basics of visual embedding, but has deficiencies in other areas that prevent it from providing natural integration between toolkits. These areas include:

window activation state
keyboard focus
tab focus chain
keyboard short cuts / accelerators
modality
drag and drop (XDND)

The XEmbed protocol is mainly concerned with communicating additional information between embedder and client to handle these areas. Communication in XEmbed is done by forwarding slightly modified XEvents using XSendEvent(), by sending special XEmbed messages, and by setting X properties. In addition, standard ICCCM features like WM_NORMAL_HINTS are used where appropriate.

The next sections explain why these problems occur with the simple "window management" approach and how XEmbed solves them.

Window activation state

A widget has to know the activation state of its toplevel window. This enables input widgets, such as a line editor, to display a blinking cursor only when the user can actually type into them. In addition, certain GUI styles choose to display inactive windows differently, typically with a lighter and less contrasting color palette.

Unfortunately, the X protocol has no messages such as WindowActivate or WindowDeactivate. Instead, a window knows that it is active when it receives keyboard focus (a FocusIn event with certain modes) and inactive when it loses it (a FocusOut event with certain modes). For embedded child windows, this works only when the mouse pointer is over one of the child's subwindows at the very moment the window manager puts the X focus on the toplevel window. For that reason, XEmbed requires embedders to pass XEMBED_WINDOW_ACTIVATE and XEMBED_WINDOW_DEACTIVATE messages to their respective clients whenever they gain or lose X keyboard focus.

Keyboard focus

The delivery of keyboard events in X is designed in a way that does not correspond to the typical operation of modern toolkits; instead it seems designed to allow things to work without either a window manager or focus handling in the toolkit. Typically, key events are sent to the window which has the X input focus (set with XSetInputFocus()). However, if the mouse pointer is inside that focus window, the event is sent to the subwindow of the focus window that is under the mouse pointer. In modern toolkits, the X input focus is typically left on the toplevel window and a separate logical input focus is implemented within the toolkit. The toolkit ignores the window that the key event is actually sent to (which might be a scrollbar or other random widget within the toplevel, depending on where the mouse pointer is), and distributes key events to the widget with the logical input focus.

So, for standard operation, the behavior where key events are sent to the window with the mouse pointer is simply ignored. But with embedded windows it causes problems: if the mouse pointer is within the embedded window, the outer toolkit doesn't see any key events, even if the logical keyboard focus is elsewhere within the outer toolkit's toplevel window.

Previous embedding techniques therefore required clients to forward any key event they receive (KeyPress and KeyRelease) to their respective embedder. In order to support multiple levels of embedding, events that stem from a SendEvent request had to be forwarded as well. While this is a possible solution, it adds both race conditions and inefficiency.

The solution proposed by XEmbed is to beat X11 with its own weapons: the topmost toolkit is required to keep the X input focus on one of its own windows without any embedded children. Keeping the focus on such a window ensures that key events are always delivered to the outer toolkit and thus can be forwarded easily to any embedded window. This also makes it possible to use this part of XEmbed with clients that do not support the protocol at all, without breaking keyboard input for the embedding application.

In detail, the topmost embedder creates a non-visible X window, the focus proxy, to hold the focus. (It might be a 1x1 child window of the toplevel located at -1,-1.) Since the focus proxy isn't an ancestor of the client window, the X focus can never move into the client window because of the mouse pointer location. Consequently, whenever the outer window is activated (receives the X input focus), it has to put the X focus on the focus proxy by calling XSetInputFocus().
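The effect of the focus proxy can be illustrated with a small model of the delivery rule described above. This is only a sketch: the Win type and the function names are hypothetical stand-ins rather than Xlib types, and the model assumes every window selects for key events and ignores grabs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an X window: only the parent link matters here. */
typedef struct Win {
    const char *name;
    const struct Win *parent;   /* NULL for the root window */
} Win;

/* True if w is `ancestor` itself or one of its descendants. */
bool is_same_or_inferior(const Win *w, const Win *ancestor)
{
    for (; w != NULL; w = w->parent) {
        if (w == ancestor)
            return true;
    }
    return false;
}

/*
 * Simplified delivery rule: if the pointer is inside the focus window,
 * the key event goes to the window under the pointer; otherwise it goes
 * to the focus window itself.
 */
const Win *key_event_target(const Win *focus, const Win *pointer)
{
    return is_same_or_inferior(pointer, focus) ? pointer : focus;
}
```

With the focus on the toplevel and the pointer over an embedded client, key_event_target() returns the client window, so the outer toolkit never sees the event. With the focus on a focus proxy that is a sibling of the client, it returns the proxy regardless of where the pointer is.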

The trouble with this is that XSetInputFocus() should not be used without a proper timestamp from the server, to avoid race conditions. Unfortunately, the FocusIn event does not carry a timestamp. The solution is to ask the window manager for the WM_TAKE_FOCUS window protocol. Thus, whenever the window is activated, it will receive a WM_PROTOCOLS client message with data.l[0] being WM_TAKE_FOCUS and data.l[1] being a proper timestamp. This timestamp can be used safely for the call to XSetInputFocus().
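A minimal sketch of how an embedder might pull the timestamp out of such a client message follows. The ClientMessage struct is a hypothetical stand-in for the Xlib client message event (its data array corresponds to data.l), and the atom values are assumed to have been interned elsewhere and passed in by the caller.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an X client message event. */
typedef struct {
    unsigned long message_type;  /* atom naming the message family */
    long data[5];                /* corresponds to data.l[0..4] */
} ClientMessage;

/*
 * If msg is a WM_PROTOCOLS message carrying WM_TAKE_FOCUS, store the
 * timestamp from data[1] in *timestamp_out and return true.
 * Otherwise leave *timestamp_out untouched and return false.
 */
bool take_focus_timestamp(const ClientMessage *msg,
                          unsigned long wm_protocols_atom,
                          unsigned long wm_take_focus_atom,
                          unsigned long *timestamp_out)
{
    if (msg->message_type != wm_protocols_atom)
        return false;
    if ((unsigned long)msg->data[0] != wm_take_focus_atom)
        return false;
    *timestamp_out = (unsigned long)msg->data[1];
    return true;
}
```

In a real embedder, the returned timestamp would be the time argument of the XSetInputFocus() call that moves the focus to the focus proxy.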

If an embedder widget gets the logical input focus, it sends an XEMBED_FOCUS_IN message to its client. The client that receives this message knows that its logical focus is now also the logical focus of the application window and will react accordingly. If its logical focus lies on the line editor control mentioned above, and the window is active, the editor will show a blinking cursor after processing this message.

In a similar fashion, if the embedder loses focus, it sends an XEMBED_FOCUS_OUT message.
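The interplay between activation and focus messages on the client side can be sketched as a two-flag state machine. The ClientState struct and function names are hypothetical; the numeric opcode values are not given in this section and are assumed here to match the message table of the specification.

```c
#include <stdbool.h>

/* Opcode values assumed from the specification's message table. */
enum {
    XEMBED_WINDOW_ACTIVATE   = 1,
    XEMBED_WINDOW_DEACTIVATE = 2,
    XEMBED_FOCUS_IN          = 4,
    XEMBED_FOCUS_OUT         = 5
};

/* Hypothetical client-side state tracked from XEmbed messages. */
typedef struct {
    bool window_active;  /* toplevel is the active window */
    bool has_focus;      /* embedder holds the logical input focus */
} ClientState;

/* Update the client's state for one incoming XEmbed message. */
void client_handle(ClientState *s, int message)
{
    switch (message) {
    case XEMBED_WINDOW_ACTIVATE:   s->window_active = true;  break;
    case XEMBED_WINDOW_DEACTIVATE: s->window_active = false; break;
    case XEMBED_FOCUS_IN:          s->has_focus = true;      break;
    case XEMBED_FOCUS_OUT:         s->has_focus = false;     break;
    default:                       break;  /* not handled in this sketch */
    }
}

/* The line editor blinks its cursor only when both conditions hold. */
bool cursor_blinks(const ClientState *s)
{
    return s->window_active && s->has_focus;
}
```

Keeping the two flags separate means the client does not depend on the order in which activation and focus messages arrive.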

Tab focus chain

X does not have a concept of a tab focus chain; it is up to the toolkit or the application to implement it. Since the concept is standard among almost all toolkits, XEmbed supports it. An XEmbed client integrates fully into the embedder's tab focus chain, i.e. the user can tab onto the client, through all its widgets and back to the outer world without noticing that they traversed an external window.

As explained in the previous section, an embedder sends an XEMBED_FOCUS_IN message to its client when it gets focus. The detail code of this message is 0 by default, that is, XEMBED_FOCUS_CURRENT. It indicates that the client keeps its own logical focus where it was. To support tabbing, XEmbed provides two more detail codes, namely XEMBED_FOCUS_FIRST and XEMBED_FOCUS_LAST, that indicate that the client should move its focus to the beginning or end of the focus chain.

When the user tabs to the very end of a client's tab chain, the client follows the request (i.e. it puts its logical focus back to the beginning of its tab chain) and sends an XEMBED_FOCUS_NEXT message to the embedder. If the embedder has siblings that accept tab focus, it will do a virtual tab forward. As a result, it will lose focus itself and consequently send an XEMBED_FOCUS_OUT message to the client. As expected, the cursor in the client's line edit control from the previous example will stop blinking.

Backward tabbing is done exactly in the same manner, using the XEMBED_FOCUS_PREV message.
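The client side of this exchange can be sketched as follows. The Client struct, the NO_MESSAGE sentinel and the function names are hypothetical. XEMBED_FOCUS_CURRENT is 0 as stated above; the other numeric values are assumed from the message tables of the specification.

```c
/* Message opcodes, values assumed from the specification's tables. */
enum {
    XEMBED_FOCUS_NEXT = 6,
    XEMBED_FOCUS_PREV = 7
};

/* Detail codes for XEMBED_FOCUS_IN. */
enum {
    XEMBED_FOCUS_CURRENT = 0,
    XEMBED_FOCUS_FIRST   = 1,
    XEMBED_FOCUS_LAST    = 2
};

#define NO_MESSAGE (-1)   /* hypothetical: nothing to send to the embedder */

/* Hypothetical client with a linear tab chain of focusable widgets. */
typedef struct {
    int widget_count;  /* number of focusable widgets, assumed >= 1 */
    int focus_index;   /* logical focus, 0 .. widget_count - 1 */
} Client;

/* Apply the detail code of an incoming XEMBED_FOCUS_IN message. */
void client_focus_in(Client *c, int detail)
{
    if (detail == XEMBED_FOCUS_FIRST)
        c->focus_index = 0;
    else if (detail == XEMBED_FOCUS_LAST)
        c->focus_index = c->widget_count - 1;
    /* XEMBED_FOCUS_CURRENT: keep the logical focus where it was. */
}

/*
 * Handle Tab (forward != 0) or Shift+Tab (forward == 0). Returns the
 * message to send to the embedder, or NO_MESSAGE if the focus simply
 * moved within the client.
 */
int client_tab(Client *c, int forward)
{
    if (forward) {
        if (c->focus_index < c->widget_count - 1) {
            c->focus_index++;
            return NO_MESSAGE;
        }
        c->focus_index = 0;                    /* wrap to the beginning */
        return XEMBED_FOCUS_NEXT;
    }
    if (c->focus_index > 0) {
        c->focus_index--;
        return NO_MESSAGE;
    }
    c->focus_index = c->widget_count - 1;      /* wrap to the end */
    return XEMBED_FOCUS_PREV;
}
```

The wrap before sending the message mirrors the text: the client first honors the tab request locally, and the embedder then decides whether focus actually leaves the client.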

Keyboard short cuts / accelerators

XEmbed is designed in such a way that keyboard events are received by the toplevel window, and then sent down the focus chain. Toolkits will usually check for shortcuts or accelerators before sending the event to the focus widget. If such a shortcut is defined, the respective action is taken rather than passing the event through to the focus widget. This means that accelerators in the outermost window always work properly, whereas accelerators defined inside an embedded client only work if that client actually has focus. XEmbed solves this problem with two messages, XEMBED_REGISTER_ACCELERATOR and XEMBED_UNREGISTER_ACCELERATOR. With XEMBED_REGISTER_ACCELERATOR, a client can reserve a certain key/modifier combination as shortcut or accelerator. The message is passed through to the topmost embedder, where the key combination is stored. An XEMBED_UNREGISTER_ACCELERATOR message releases the key again.
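One way the topmost embedder could store reserved key combinations is a small fixed-size table, sketched below. Everything here is an assumption made for illustration: the table layout, the function names, the client_id used to remember which client reserved a combination, and the policy of rejecting a combination that is already reserved.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACCELERATORS 32   /* hypothetical capacity */

typedef struct {
    bool in_use;
    long keysym;      /* key of the reserved combination */
    long modifiers;   /* modifier mask of the reserved combination */
    long client_id;   /* hypothetical: which client reserved it */
} Accelerator;

typedef struct {
    Accelerator slots[MAX_ACCELERATORS];
} AcceleratorTable;

/* Returns the owning client id, or -1 if the combination is not reserved. */
long accel_lookup(const AcceleratorTable *t, long keysym, long modifiers)
{
    for (size_t i = 0; i < MAX_ACCELERATORS; i++) {
        const Accelerator *a = &t->slots[i];
        if (a->in_use && a->keysym == keysym && a->modifiers == modifiers)
            return a->client_id;
    }
    return -1;
}

/* Handles a register request; fails if already reserved or the table is full. */
bool accel_register(AcceleratorTable *t, long keysym, long modifiers,
                    long client_id)
{
    if (accel_lookup(t, keysym, modifiers) != -1)
        return false;
    for (size_t i = 0; i < MAX_ACCELERATORS; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i] = (Accelerator){ true, keysym, modifiers, client_id };
            return true;
        }
    }
    return false;
}

/* Handles an unregister request by releasing the combination, if present. */
void accel_unregister(AcceleratorTable *t, long keysym, long modifiers)
{
    for (size_t i = 0; i < MAX_ACCELERATORS; i++) {
        Accelerator *a = &t->slots[i];
        if (a->in_use && a->keysym == keysym && a->modifiers == modifiers)
            a->in_use = false;
    }
}
```

The topmost embedder would call accel_lookup() for each key event before normal focus dispatch and, on a match, route the event to the owning client instead of the focus widget.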

Modality

If an application window is shadowed by a modal dialog, no user input is supposed to get through. The XEmbed design ensures this for keyboard input, because the toplevel window knows about its modal state and will not pass key events through. Embedded clients thus inherit the modality from the topmost embedder. Mouse input, however, is sent directly to the embedded clients by the X server, unaffected by the modality of the application window. To give clients the possibility to behave correctly when being shadowed by a modal dialog, an embedder can choose to send an XEMBED_MODALITY_ON message to its client when it becomes shadowed, and an XEMBED_MODALITY_OFF message when it leaves modality again. If the client contains embedders itself, those have to pass both messages through to their clients.
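The pass-through requirement for nested embedding can be sketched as a recursive delivery. The EmbeddedClient struct, its fixed-size nested array and the function names are hypothetical, and the numeric opcode values are assumed from the message table of the specification.

```c
/* Opcode values assumed from the specification's message table. */
enum {
    XEMBED_MODALITY_ON  = 10,
    XEMBED_MODALITY_OFF = 11
};

#define MAX_NESTED 4   /* hypothetical limit on embedders inside one client */

/* Hypothetical client that may itself embed further clients. */
typedef struct EmbeddedClient {
    int shadowed;                            /* 1 while a modal dialog is up */
    struct EmbeddedClient *nested[MAX_NESTED];
    int nested_count;
} EmbeddedClient;

/* Record the modality state and pass the message on to nested clients. */
void deliver_modality(EmbeddedClient *c, int message)
{
    if (message == XEMBED_MODALITY_ON)
        c->shadowed = 1;
    else if (message == XEMBED_MODALITY_OFF)
        c->shadowed = 0;
    else
        return;   /* other messages are out of scope for this sketch */

    for (int i = 0; i < c->nested_count; i++)
        deliver_modality(c->nested[i], message);
}

/* A shadowed client should ignore mouse input that X still delivers to it. */
int accepts_mouse_input(const EmbeddedClient *c)
{
    return !c->shadowed;
}
```

Because the X server keeps delivering mouse events to shadowed clients, the check in accepts_mouse_input() has to happen in each client's own event handling.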

Drag and drop (XDND)

XDND drag-and-drop does not work with reparented external windows, since messages are exchanged with the toplevel window only. This is done for performance reasons. While it is cheap to get the window under the mouse pointer, it is very expensive to get a window under another window. Unfortunately, this is required quite often when dragging objects around, since the pointer may overlap the drag icon.

Solving the drag-and-drop problem, however, is quite easy, since the XDND protocol was carefully designed in a way that makes it possible to support embedded windows. Basically, the embedder has to operate as a drag-and-drop proxy for the client. Any XDND messages like XdndEnter, XdndLeave, etc. simply have to be passed through. A toolkit's XDND implementation has to take this situation into consideration.