Interactive Search

Motivation

To ease interactive exploration and comparison of our three target representations, an experimental search interface is available (hosted at the University of Oslo; see Kouylekov & Oepen, 2014).  Using a query-by-example approach, it is possible to retrieve instances of specific semantic phenomena across the different annotations and to inspect the matching semantic dependency graphs visually.

Query Language

The SDP search interface interprets a simple set of search operators, collectively dubbed the WeSearch Query Language (WQL).  By way of informal introduction, consider the following example query:

  /v*[ARG* x]
  quarterly[ARG1 x]
  x:+result

The query above consists of three predications, conventionally written one per line.  In this example, the following characters have operator status: ‘/’ (slash), ‘*’ (asterisk), ‘[’ and ‘]’ (left and right square brackets), ‘:’ (colon), and ‘+’ (plus sign); this is a near-complete list of the operator characters in WQL.  Each predication can be composed of (i) an identifier, followed by a colon, if present; (ii) a form pattern, if present; (iii) a lemma pattern, prefixed by a plus sign, if present; (iv) a part-of-speech (PoS) pattern, prefixed by a slash, if present; (v) a sense pattern, prefixed by an equal sign, if present; and (vi) a list of arguments, enclosed in square brackets, if present.  Patterns can make use of Lucene-style wildcards, with the asterisk matching any number of characters and the question mark (‘?’) matching a single character.
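As a minimal sketch of these wildcard semantics, the following Python function translates a pattern into an anchored regular expression. It assumes that ‘*’ and ‘?’ are the only special characters inside a pattern and that a pattern must match the entire field value; case-insensitivity is a parameter. The name `wql_match` is hypothetical and not part of the WeSearch code.

```python
import re


def wql_match(pattern: str, value: str, ignore_case: bool = True) -> bool:
    """Return True if a Lucene-style wildcard pattern matches the whole value.

    '*' matches any (possibly empty) sequence of characters, '?' matches
    exactly one character; all other characters are matched literally.
    """
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None


print(wql_match("v*", "VBD"))      # True: case-insensitive prefix match
print(wql_match("ARG?", "ARG12"))  # False: '?' stands for exactly one character
```

Anchoring the match at both ends is what makes ‘v*’ mean "starting with v" rather than "containing v".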

Argument specifications in WQL take the form of role–value pairs, where roles are drawn from a fixed inventory of pre-defined argument labels (specific to each target representation), and values are predication identifiers defined in other parts of the query.  The role label and value are separated by whitespace, and multiple arguments can be specified within the list by using a comma (‘,’) as the separator.  Wildcards can be used in role labels just as in other query fields.
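The argument-list syntax can be illustrated with a small Python sketch. It assumes that the text between the square brackets has already been extracted, ignores backslash escapes, and allows the value to be omitted (which WQL permits when no predication constrains the argument node). The function name `parse_args` is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_args(arg_list: str) -> List[Tuple[str, Optional[str]]]:
    """Split the text between '[' and ']' into (role, value) pairs.

    Pairs are separated by commas; within a pair, the role label and the
    value (a predication identifier) are separated by whitespace. A pair
    without a value yields None for the value.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    for chunk in arg_list.split(","):
        tokens = chunk.split()
        if not tokens:
            continue  # this sketch simply skips empty chunks, e.g. a trailing comma
        if len(tokens) > 2:
            raise ValueError(f"malformed role-value pair: {chunk.strip()!r}")
        role = tokens[0]
        value = tokens[1] if len(tokens) == 2 else None
        pairs.append((role, value))
    return pairs


print(parse_args("ARG2 x, ARG* e"))  # [('ARG2', 'x'), ('ARG*', 'e')]
print(parse_args("_*_c"))            # [('_*_c', None)]
```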

Thus, our example query above searches for a verbal predicate (any PoS tag starting with ‘v’) that takes as its argument, under any role matching ‘ARG*’, a node with the lemma ‘result’ (in any form), where that same node is also the ARG1 of a node with the form ‘quarterly’ (this query is designed for the DM representation, where regular argument relations take the form ARG1 ... ARGn).  The query processor ensures a one-to-one correspondence between query elements and matching graph elements, i.e. multiple distinct query components cannot match against the same target graph component, or vice versa.  Lemma and PoS patterns, as well as role labels, are not case-sensitive.
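To make the one-to-one requirement concrete, here is a brute-force backtracking matcher in Python. It is a sketch under several assumptions, not the WeSearch query processor: the query is given as hand-built Python dictionaries rather than parsed WQL, every pattern is matched case-insensitively (including form patterns, whose case sensitivity is not specified above), senses and the top flag are ignored, and the graph is a hypothetical toy example rather than SDP data.

```python
import re
from typing import Dict, List, Tuple

Node = Dict[str, str]        # {'form': ..., 'lemma': ..., 'pos': ...}
Edge = Tuple[int, str, int]  # (source id, role label, target id)


def wildcard(pattern: str):
    # '*' -> any run of characters, '?' -> exactly one; case-insensitive
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c)
                    for c in pattern)
    return re.compile(regex, re.IGNORECASE | re.DOTALL)


def node_ok(pred: dict, node: Node) -> bool:
    """Check a predication's form/lemma/pos patterns against one node."""
    return all(wildcard(pred[f]).fullmatch(node[f])
               for f in ("form", "lemma", "pos") if pred.get(f) is not None)


def find_matches(query: List[dict], nodes: Dict[int, Node],
                 edges: List[Edge]) -> List[Tuple[int, ...]]:
    """Return every one-to-one assignment of predications to graph nodes
    (and of query arguments to graph edges) that satisfies the query."""
    var_index = {p["id"]: i for i, p in enumerate(query) if p.get("id")}
    wanted = [(i, role, var_index[var])
              for i, p in enumerate(query) for role, var in p.get("args", [])]
    results = []

    def edges_ok(assign: List[int], k: int = 0,
                 used: frozenset = frozenset()) -> bool:
        if k == len(wanted):
            return True
        i, role, j = wanted[k]
        pat = wildcard(role)
        for e, (src, label, tgt) in enumerate(edges):
            if (e not in used and src == assign[i] and tgt == assign[j]
                    and pat.fullmatch(label)):
                if edges_ok(assign, k + 1, used | {e}):  # one-to-one on edges
                    return True
        return False

    def extend(assign: List[int]) -> None:
        if len(assign) == len(query):
            if edges_ok(assign):
                results.append(tuple(assign))
            return
        pred = query[len(assign)]
        for nid, node in nodes.items():
            if nid not in assign and node_ok(pred, node):  # one-to-one on nodes
                extend(assign + [nid])

    extend([])
    return results


# Hypothetical toy graph for "quarterly results improved" (not SDP data).
nodes = {
    1: {"form": "quarterly", "lemma": "quarterly", "pos": "JJ"},
    2: {"form": "results", "lemma": "result", "pos": "NNS"},
    3: {"form": "improved", "lemma": "improve", "pos": "VBD"},
}
edges = [(1, "ARG1", 2), (3, "ARG1", 2)]

# The example query:  /v*[ARG* x]   quarterly[ARG1 x]   x:+result
query = [
    {"pos": "v*", "args": [("ARG*", "x")]},
    {"form": "quarterly", "args": [("ARG1", "x")]},
    {"id": "x", "lemma": "result"},
]
print(find_matches(query, nodes, edges))  # [(3, 1, 2)]
```

Backtracking over all injective assignments is exponential in the worst case; the sketch only spells out the matching semantics and says nothing about how a search engine over a large treebank would be implemented.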

In addition to the query proper, the search interface provides a set of radio buttons to select which of the three target representations to query; this selection can affect the matching of representation-specific properties (e.g. lemmas and predicate senses) and the interpretation of underspecified role labels (see below).  It is possible to search multiple representations in parallel (all three are active by default).  Independently of which representations are active for the search, annotations in all target representations are always presented for inspection for the items that matched the query.

The result page uses a tabbed display, aiming to make it easy to switch between target representations and between graphical and tabular displays of matching items.  Color highlighting indicates which parts of each result structure were matched by the corresponding components of the query; as there can be more than one match in a single result, the interface allows ‘cycling through’ individual matches, one by one.

Boolean Connectives

In our example query above, the individual predications are implicitly conjoined, i.e. all three need to be matched against a candidate result graph for the query to be satisfied (formally, one might say that the whitespace separating predications serves as a conjunction operator).  Albeit with somewhat mixed feelings, we further experiment with additional boolean connectives in WQL, viz. negation (‘!’, exclamation point) and disjunction (‘|’, vertical bar); to complement these logical operators, parentheses (‘(’ and ‘)’) can be used to group expressions, making explicit or overriding the scope of logical operators.  By default, negation and conjunction bind more tightly (i.e. scope more narrowly) than disjunction, which scopes widely, i.e. at the top level or within an enclosing logical group.
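The precedence rules can be illustrated with a small recursive-descent parser. This Python sketch is an illustration under stated assumptions, not the WeSearch implementation: it treats each predication as an opaque token (everything up to the next whitespace or boolean operator outside square brackets), does not handle backslash escapes, and returns a nested-tuple syntax tree with hypothetical node tags `pred`, `not`, `and`, and `or`.

```python
from typing import List, Optional

OPERATORS = "()|!"


def tokenize(query: str) -> List[str]:
    """Split a query into operator tokens and opaque predication tokens.

    Whitespace and the characters ( ) | ! are significant only outside
    square brackets, so argument lists such as '[ARG2 x, ARG3 e]' stay intact.
    """
    tokens: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in query:
        if depth == 0 and (ch.isspace() or ch in OPERATORS):
            if current:
                tokens.append("".join(current))
                current = []
            if ch in OPERATORS:
                tokens.append(ch)
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


class Parser:
    """'|' binds loosest, then implicit conjunction, then prefix '!'."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens, self.pos = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of query")
        self.pos += 1
        return tok

    def disjunction(self):
        items = [self.conjunction()]
        while self.peek() == "|":
            self.take()
            items.append(self.conjunction())
        return items[0] if len(items) == 1 else ("or", items)

    def conjunction(self):
        # Juxtaposed predications (separated only by whitespace) are conjoined.
        items = [self.unary()]
        while self.peek() not in (None, "|", ")"):
            items.append(self.unary())
        return items[0] if len(items) == 1 else ("and", items)

    def unary(self):
        tok = self.take()
        if tok == "!":
            return ("not", self.unary())
        if tok == "(":
            inner = self.disjunction()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok in OPERATORS:
            raise SyntaxError(f"unexpected {tok!r}")
        return ("pred", tok)


def parse(query: str):
    parser = Parser(tokenize(query))
    tree = parser.disjunction()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return tree
```

For instance, `parse("a | b c")` yields `('or', [('pred', 'a'), ('and', [('pred', 'b'), ('pred', 'c')])])`, reflecting that conjunction scopes more narrowly than disjunction unless parentheses override it.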

More Examples

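The following more complex example searches for object equi verbs, taking advantage of an underspecified role label: it looks for a predicate with an ARG2 x and a further argument e (under any role matching ‘ARG*’), where e is verbal and also takes x as its ARG1: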

  [ARG2 x, ARG* e]
  e:/v*[ARG1 x]

A similar effect, requiring the ‘downstairs’ predicate to be any type of argument of the ‘upstairs’ one (under certain assumptions about the applicable range of role labels), could instead be achieved using a disjunctive statement (note the need for logical grouping of the two disjuncts, relative to the conjunction):

  ( [ARG2 x, ARG3 e] | [ARG2 x, ARG4 e] )
  e:/v*[ARG1 x]

The following query demonstrates the use of the top operator (‘^’) to retrieve graphs rooted in a coordinate structure, i.e. where the top node has an outgoing dependency matching the pattern ‘_*_c’ (again, assuming the DM representation); here, the role value can be omitted, as there is no predication constraining the argument node:

  ^[_*_c]
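A minimal Python sketch of what ‘^[_*_c]’ checks, taken in isolation: it assumes a graph given as a set of top node identifiers plus a list of (source, label, target) edges, and uses a hypothetical function `top_has_arc`. A real query would combine this constraint with the matching of any other predications.

```python
import re
from typing import Iterable, Set, Tuple

Edge = Tuple[int, str, int]  # (source node, role label, target node)


def top_has_arc(tops: Set[int], edges: Iterable[Edge], role: str) -> bool:
    """Approximate the predication '^[role]': some top node has an outgoing
    arc whose label matches the wildcard pattern `role`. No value is given,
    so the arc's target node is left unconstrained."""
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c)
                    for c in role)
    pattern = re.compile(regex, re.IGNORECASE | re.DOTALL)  # role labels ignore case
    return any(src in tops and pattern.fullmatch(label) is not None
               for src, label, _ in edges)


# Hypothetical toy graph: node 2 is the (only) top node.
print(top_has_arc({2}, [(2, "_and_c", 5), (2, "ARG1", 1)], "_*_c"))  # True
print(top_has_arc({1}, [(2, "_and_c", 5)], "_*_c"))                  # False
```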

As an example of the (experimental) use of negation to filter candidate results, the following query matches occurrences of verbal nodes that have no outgoing or incoming argument links:

  x:/v*
  !x:[* y]
  ![* x]
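Under one plausible reading of this query, where the negated predications exclude any arc at all into or out of the matched node, the filter reduces to a simple set computation. The following Python sketch assumes that reading, a graph given as a mapping from node identifiers to PoS tags plus a list of (source, label, target) edges, and a hypothetical function name; it ignores how negation interacts with the one-to-one matching constraint, which is not specified here.

```python
import re
from typing import Dict, List, Tuple

Edge = Tuple[int, str, int]  # (source node, role label, target node)


def isolated_verbs(pos: Dict[int, str], edges: List[Edge]) -> List[int]:
    """Nodes whose PoS matches 'v*' and that have no outgoing arc
    ('!x:[* y]') and no incoming arc ('![* x]')."""
    verbal = re.compile(r"v.*", re.IGNORECASE | re.DOTALL)  # PoS pattern 'v*'
    has_arc = {src for src, _, _ in edges} | {tgt for _, _, tgt in edges}
    return sorted(n for n, tag in pos.items()
                  if verbal.fullmatch(tag) and n not in has_arc)


# Hypothetical toy graph: node 2 has an outgoing arc, node 3 has none.
print(isolated_verbs({1: "NNP", 2: "VBD", 3: "VBN"}, [(2, "ARG1", 1)]))  # [3]
```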

Note, however, that as of early August 2014, the definition and implementation of boolean operators in WQL are, to some degree, still a work in progress.

Full List of Operators

  • ^ (caret), constrains the node to be a top node (must be predication-initial);
  • : (colon), separates the optional node identifier from the node content;
  • [ and ] (left and right square brackets), enclose the list of outgoing arcs;
  •   (whitespace), separates role labels and values in the list of arcs;
  • , (comma), separates role–value pairs within the list of outgoing arcs;
  • = (equal sign), introduces the (optional) sense pattern;
  • + (plus sign), introduces the (optional) lemma pattern;
  • / (slash), introduces the (optional) PoS pattern;
  • ? (question mark), Lucene-style single-character wildcard;
  • * (asterisk), Lucene-style arbitrary sub-string wildcard;
  • ( and ) (left and right parentheses), group sub-expressions (see Boolean Connectives above);
  • | (vertical bar), logical disjunction of predications or groups;
  • ! (exclamation mark), reserved for negation (must precede a predication or logical group);
  • \ (backslash), escape character, suppress operator status for any of the above.
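To illustrate the effect of the escape character, here is a minimal Python sketch that labels each character of a query as an operator, whitespace, or literal. The function name, the three labels, and the decision to keep a trailing lone backslash as a literal are assumptions of this sketch rather than documented WQL behavior.

```python
from typing import List, Tuple

# Operator characters from the list above; whitespace is labeled separately.
OPERATOR_CHARS = set("^:[],=+/?*()|!")


def classify(query: str) -> List[Tuple[str, str]]:
    """Label each character as 'op', 'space', or 'lit'. A backslash makes
    the following character literal, suppressing its operator status."""
    out: List[Tuple[str, str]] = []
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            out.append(("lit", query[i + 1]))  # escaped: never an operator
            i += 2
            continue
        # A trailing lone backslash falls through and is kept as a literal.
        if ch.isspace():
            out.append(("space", ch))
        elif ch in OPERATOR_CHARS:
            out.append(("op", ch))
        else:
            out.append(("lit", ch))
        i += 1
    return out


print(classify(r"a\*b"))  # [('lit', 'a'), ('lit', '*'), ('lit', 'b')]
```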

Contact Info

Organizers

  • Dan Flickinger
  • Jan Hajič
  • Angelina Ivanova
  • Marco Kuhlmann
  • Yusuke Miyao
  • Stephan Oepen
  • Daniel Zeman

sdp-organizers@emmtee.net

Other Info

Announcements

[06-feb-15] Final evaluation results for the task are now available; we are grateful to all (six) participating teams.

[08-jan-15] The evaluation period is nearing completion; we have purged inactive subscribers from the task-specific mailing list and sent important information on the submission of system outputs for evaluation to the list. If you have not received this email but are preparing a system submission, please contact the organizers immediately.

[17-dec-14] We are about to enter the evaluation phase, but recall that the closing date has been extended to Thursday, January 15, 2015. We have sent important instructions on how to participate in the evaluation to the task-specific mailing list; if you plan on submitting system results to this task but have not seen these instructions, please make contact with the organizers immediately.

[22-nov-14] English ‘companion’ syntactic analyses in various dependency formats are now available, for use in the open and gold tracks.

[20-nov-14] We have completed the production of cross-lingual training data: some 31,000 PAS graphs for Chinese and some 42,000 PSD graphs for Czech. At the same time, we have prepared an update of the English training data, with somewhat better coverage and a few improved analyses in DM, as well as with additional re-entrancies (corresponding to grammatical control relations) in PSD. The data is available for download as Version 1.1 from the LDC. Owing to the delayed availability of the cross-lingual data, we have moved the closing date for the evaluation period to mid-January 2015.

[14-nov-14] An update to the SDP toolkit (now hosted at GitHub) is available, implementing the additional evaluation metrics ‘complete predicates’ and ‘semantic frames’.

[05-aug-14] We are (finally) ready to officially ‘launch’ SDP 2015: the training data is now available for distribution through the LDC; please register for SemEval 2015 Task 18, and within a day (or so) we will be in touch about data licensing and access information.

[03-aug-14] Regrettably, we are running late in making available the training data and technical details of the 2015 task setup; please watch this page for updates over the next couple of days!

[01-jun-14] We have started to populate the task web pages, including some speculative information on extensions (compared to the 2014 variant of the task) that we are still discussing. A first sample of trial data is available for public download.