Jul 24

To see how the Ontos API works, use our HTTP binding to get started with the web service and your personal dictionary. Proceed as follows:

  1. Log in to your personal dictionary at http://test.ontos.com/dix in order to initialize the system for your user.
  2. Run a request to authenticate:
    http://test.ontos.com/token?j_username=YOUR_USERNAME&j_password=YOUR_PASSWORD
  3. The request will return a session token (YOUR_TOKEN).

  4. Run a request to the NLP processor:

    http://test.ontos.com/api/miner?jsessionid=YOUR_TOKEN&query={
    "get":"process",
    "ontology":"api.common.english",
    "format":"NTRIPLES",
    "text":"Ontos AG participated in the Semantic Conference in San Jose and is planning to join ESTC2009 in Vienna this year."
    }

    This request will return a set of triples in JSON format. The processor recognized “Semantic Conference” as a conference, but didn’t do so for “ESTC2009”.

  5. Sign in to your personal dictionary at http://test.ontos.com/dix, select the instance type “Conference” and add ESTC2009.
  6. Re-run the request to the NLP processor. This time it will recognize “ESTC2009” as a conference as well.
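
The two request URLs from steps 2 and 4 can also be assembled programmatically. Below is a minimal Python sketch that only builds the URLs and sends nothing. The helper names `build_token_url` and `build_miner_url` are hypothetical, and percent-encoding the JSON query (spaces as %20) is an assumption about what the server accepts; the host, paths, and parameter names are taken from the steps above.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "http://test.ontos.com"  # host used in the steps above


def build_token_url(username, password):
    """Step 2: URL of the authentication request."""
    params = urlencode({"j_username": username, "j_password": password},
                       quote_via=quote)
    return f"{BASE_URL}/token?{params}"


def build_miner_url(token, text, ontology="api.common.english", fmt="NTRIPLES"):
    """Step 4: URL of the NLP processor request.

    The JSON query is serialized and percent-encoded so that quotes,
    braces, spaces and '#' cannot break the URL (assumed to be accepted
    by the server).
    """
    query = json.dumps({
        "get": "process",
        "ontology": ontology,
        "format": fmt,
        "text": text,
    })
    params = urlencode({"jsessionid": token, "query": query}, quote_via=quote)
    return f"{BASE_URL}/api/miner?{params}"


if __name__ == "__main__":
    print(build_token_url("YOUR_USERNAME", "YOUR_PASSWORD"))
    print(build_miner_url("YOUR_TOKEN", "Ontos AG is planning to join ESTC2009 in Vienna."))
```

The resulting strings can be pasted into the browser or passed to any HTTP client. The shape of the token response is not described here, so extracting YOUR_TOKEN from it is left out.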

See full documentation to learn more about what Ontos API can do.

Jul 23

If you copy & paste HTTP requests (e.g. directly from the documentation) into the browser’s address bar, be sure to replace the hash sign (#) with %23. Otherwise the browser will break the request: it treats everything after the # as a fragment and does not send it to the server.
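
If you build requests in code rather than by hand, the same replacement can be automated. A small Python sketch follows; the function names and the sample request string are hypothetical and not taken from the documentation.

```python
from urllib.parse import quote


def escape_hash(request):
    """Apply exactly the tip above: replace every '#' with '%23'."""
    return request.replace("#", "%23")


def encode_query_value(value):
    """Percent-encode a single parameter value, including '#', '&' and spaces."""
    return quote(value, safe="")


if __name__ == "__main__":
    # Hypothetical request containing a hash sign.
    print(escape_hash("http://test.ontos.com/api/miner?query=a#b"))
    print(encode_query_value("#"))
```

`escape_hash` is meant for a complete request that is otherwise ready to paste; `encode_query_value` is the safer choice when you assemble the query string from individual values.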

Jul 20

We are happy to announce that the beta version of Ontos API is now online, together with documentation and some quick demos. More demos and showcases will come in due time (oh, and we hope some will come from you as well: yes, feel free to drop us a link to your own apps based on Ontos API).

And now a couple of short notices about our one-day-old kid:

  • to get an API key please request one here, and shortly after this you’ll get a message with a user ID and a password which you can use in your API calls
  • in the beta phase we have all functions for the ontology-driven text analysis and reading the RDF store; in the next releases we will add the following features:
    • new languages (German, French and Russian will come as soon as they are available but no later than by the end of October 2009)
    • full personalization (a personal database linked to your account for new dictionaries and ontologies, scheduled for November 2009)

Feel free to ask us whatever you wanna know about the new Ontos API! Leave a comment on this entry or e-mail us at support@ontos.com.

Sep 30

Ontos CEO Daniel Hladky (left) at the exhibition, ESTC2008

We are back from Vienna (see my previous post about the ESTC2008 conference), where we were at a wonderful gathering of people from the SemWeb community, discussing recent trends in research, technology, and business applications. A bit later I will upload some pictures to share the nice ambience of the event. For more details look at Ivan Herman’s report in his blog.

Some more news. We’ve just created a Facebook account for Ontos International. Now you can join us there!

Sep 18

ESTC2008 to be held in Palais Niederösterreich in Vienna

Now we’re heading to Vienna, where Ontos is a sponsor of the European Conference SemTechnologies to be held on Sept 24-26. Irina will give a talk titled ‘Semantic Technologies and Information Integration: Semantic Wine in Media Wine-skin’ on Wednesday. Come also to Daniel’s presentation ‘Intelligent web pages leading to new business’ on Thursday.

We hope to see all of you there. And of course, feel free to contact us to arrange a meeting next week at ESTC2008!

Mar 27

A couple of days ago, an exciting article on various aspects of semantic technologies was posted on ReadWriteWeb: Semantic Web Patterns: A Guide to Semantic Technologies. Alex Iskold, the author of the article, highlights many important problems faced by information workers that can be solved by means of the semantic web. It is a nice overview of what has been done so far in the area of annotation technologies based on metadata, semantic APIs, search technologies, semantic databases, etc. The notion of the Semantic Web itself is considered from various points of view, since, as the author notes:

The Semantic Web means many things to different people, because there are a lot of pieces to it. To some, the Semantic Web is the web of data, where information is represented in RDF and OWL. Some people replace RDF with Microformats. Others think that the Semantic Web is about web services, while for many it is about artificial intelligence - computer programs solving complex optimization problems that are out of our reach. And business people always redefine the problem in terms of end user value, saying that whatever it is, it needs to have simple and tangible applications for consumers and enterprises.

Dec 20

In the natural language processing blog, Hal Daume III has recently written an exciting post where he argues that data mark-up is not natural. To capture the fuzzy notion of naturalness he compares parallel French-English data and part-of-speech tagged data. Parallel French-English data are natural, he says, because they “naturally” exist: one needs no particular knowledge to understand what these data are and how to use or create them. POS-tagged data, by contrast, are unnatural and useless for non-linguists and non-developers.

The crux of the argument is that if something is not a task that anyone performs naturally, then it’s not a task worth computationalizing.

The argument goes on: there is no external evidence that a human translating a text performs special operations similar to, say, part-of-speech tagging. Mapping this idea onto the Ontos NLP, one could say that there is no evidence that a human piles up objects like annotations attached to various fragments of the text, using a huge number of patterns similar to the JAPE patterns we use. Moreover, it is hard to believe that a human reads the same text scores of times, performing one specific operation on each pass: first recognizing morphological information, then recognizing the named entities, then learning the relations between the recognized entities, and so on. Rather, a human somehow interprets the string by applying, in turn, whichever subsystems of the Language Faculty are required at the moment: most likely the encyclopedia, for example, is not consulted constantly, but only when it is necessary.

Still, I am not in a position to maintain the idea that a natural language processor must be “natural” in all its aspects. Mark-up is not natural, but it is not such a bad idea. It was noticed long ago that computers are very dissimilar to the human cognitive system. Suppose someone implements, for instance, all the complex syntactic algorithms proposed in Generative Grammar; if such a system is possible at all, it might be too complex to be usable. The peculiarities of computers sometimes make us avoid “naturalness” in NLP, although this is not to say that NLPers should not try to reach it where it is appropriate and possible.

Dec 04

On Read/WriteWeb there was a post about 10 semantic Apps to watch. It seems that the term “semantic” is used for all sorts of things. Our understanding is that semantic applications should use some form of ontology to describe the meaning of data or text, and this includes named entities and the relations between them. In my opinion, not all of the presented candidates fulfil this requirement. Nevertheless, it will be interesting to see whether the semantic search engines leave beta status and become able to answer questions in all sorts of domains. Another point to watch is whether the other Apps add more semantic meaning to their solutions. At least Ontos, with its news portal, already uses semantic meaning and offers its first semantic services, so we are happy not to be alone in the take-off of semantic Apps.

Oct 30

The single spot in our NLP where compositionality is fully abandoned is the Semantic Tagger. Because of the syntactic and semantic complexity of the structures we have to analyse, this module can’t derive the interpretation of a given construction (say, a finite sentence about the relation “be employee of”) step by step from the interpretation of its syntactic parts. It is not just that we lack a sufficiently powerful syntactic analyser; we simply don’t need one.

To extract a relation like “be employee of” we use a number of patterns relying on entities recognized previously. Look at (1).

(1) In the mid-1990 the former German citizen Heinz Schimmelbuch becomes CEO of Weissel company.

If the processor already knows that mid-1990 is a Date/Period, Heinz Schimmelbuch is a Person, CEO is a Job Title, Weissel company is an Organization, become is a verb of coming into being or something like that, and that there is no punctuation between these items, it has enough information to conclude that this sentence might be about an employment relation, as long as several additional conditions are satisfied. The first condition is the order of the items in the sentence. Sentence (2) might also be an employment-related sentence, but not (3).

(2) The former German citizen Heinz Schimmelbuch becomes CEO of Weissel company in the mid-1990.
(3) In the mid-1990 Heinz Schimmelbuch becomes American citizen and meets the CEO of Weissel company.

The second condition concerns some specific constraints on the key verb. E.g., the verb become has to be finite: with a non-finite verb, the probability that the sentence is irrelevant increases, cf. (4).

(4) In the mid-1990 the former German citizen Heinz Schimmelbuch dreamed about becoming CEO of Weissel company.

Then, taking these observations into account, we can safely add a pattern that uses just the recognized items like Person, Job Title, etc. and some punctuation markup (commas, full stops, etc.), “hiding” all the irrelevant parts of the sentence.

(5) … the mid-1990 … Heinz Schimmelbuch … become … CEO … Weissel company.

Now we can compose such a pattern.

({Date} | {StartPoint})? {Person}
{becomeVG.VOICE = "act", becomeVG.MOOD = "ind"}
{JobTitle} {Organization}

Oh yep, this pattern is indeed used in our system.
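
To make the matching logic concrete, here is a minimal Python sketch of how a pattern of this shape could be checked against a sequence of annotations. It is an illustration only, not our JAPE implementation: the dictionary representation, the `VG` and `Punct` types acting as visible blockers, and the `"ger"` value for a non-finite mood are all assumptions. Only the annotation names and the `"act"`/`"ind"` values come from the pattern above.

```python
# Annotation types the pattern can "see"; every other token is hidden.
# Treating other verb groups ("VG") and punctuation ("Punct") as visible
# blockers is an assumption of this sketch.
VISIBLE = {"Date", "StartPoint", "Person", "becomeVG",
           "JobTitle", "Organization", "VG", "Punct"}


def ann(kind, **features):
    """Build one annotation as a plain dictionary."""
    return {"type": kind, **features}


def is_finite_active_become(a):
    """The key verb must be 'become' in active voice and indicative mood."""
    return (a["type"] == "becomeVG"
            and a.get("VOICE") == "act"
            and a.get("MOOD") == "ind")


def matches_employment(annotations):
    """({Date} | {StartPoint})? {Person} {becomeVG} {JobTitle} {Organization}"""
    items = [a for a in annotations if a["type"] in VISIBLE]
    for start in range(len(items)):
        i = start
        if items[i]["type"] in ("Date", "StartPoint"):
            i += 1  # the leading date is optional
        window = items[i:i + 4]
        if (len(window) == 4
                and window[0]["type"] == "Person"
                and is_finite_active_become(window[1])
                and window[2]["type"] == "JobTitle"
                and window[3]["type"] == "Organization"):
            return True
    return False


# Sentence (1): Date ... Person become(act, ind) JobTitle ... Organization
SENTENCE_1 = [ann("Date"), ann("Token"), ann("Person"),
              ann("becomeVG", VOICE="act", MOOD="ind"),
              ann("JobTitle"), ann("Token"), ann("Organization"), ann("Punct")]

# Sentence (3): another verb group ("meets") stands between become and CEO
SENTENCE_3 = [ann("Date"), ann("Person"),
              ann("becomeVG", VOICE="act", MOOD="ind"), ann("Token"),
              ann("VG"), ann("JobTitle"), ann("Token"), ann("Organization")]

# Sentence (4): "becoming" is non-finite ("ger" is a hypothetical value)
SENTENCE_4 = [ann("Date"), ann("Token"), ann("Person"), ann("VG"),
              ann("becomeVG", VOICE="act", MOOD="ger"),
              ann("JobTitle"), ann("Token"), ann("Organization")]

if __name__ == "__main__":
    for name, sentence in [("(1)", SENTENCE_1), ("(3)", SENTENCE_3), ("(4)", SENTENCE_4)]:
        print(name, matches_employment(sentence))
```

Filtering the annotation stream down to the visible types plays the role of “hiding” the irrelevant parts of the sentence, while the feature check on `becomeVG` encodes the finiteness condition illustrated by (4).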

Oct 24

I came across an exciting paper by Karin Verspoor, George Papcun and Kari Sentz from LANL. The authors suggest a motivation for the so-called shallow approach to Information Extraction.

Indeed, an ongoing discussion between ‘deepers’ (who argue for a deep and presumably full semantic and syntactic analysis) and ‘shallowers’ (who argue for a shallow analysis) has revealed that “[s]hallower approaches are more robust to the linguistic variance of free text” and “they are much faster”, while “[d]eeper approaches … are in principle more domain-neutral because they embody general linguistic principles”, but they are obviously much more expensive both in throughput and development time. I’ve got a particular interest in this issue, because at Ontos we use a kind of shallow approach, and we really do not attempt to get a full syntactic and semantic analysis. Our rules and patterns primarily focus on the recognition of objects and relations represented in the domain ontology.

The motivation offered by Verspoor et al. (2003) relies on the Construction Grammar (CG) hypothesis that constructions are the essential linguistic units stored in the human mind. The basic notion of a construction is defined as follows:

C is a construction iff C is a form-meaning pair <Fi, Sj>, such that some aspect of Fi (form) or some aspect of Sj (semantics) is not strictly predicted from C’s component parts or from other previously established constructions.

The main argument of constructionists is that in natural language there is a wide range of non-compositional expressions (e.g. idioms), which should be stored in the lexicon. As far as I understand, the radical constructionist view is that any linguistic structure (w.r.t. its syntax and semantics) is a construction. Although the authors note that CG does not entirely reject the principle of compositionality adopted in formal semantics, this principle does not in fact play an important role in CG. Furthermore, the authors suggest that gazetteer entries and expressions captured by syntactic patterns for named entities in a shallow NLP, are constructions in the sense of CG.

I would not be so enthusiastic about the issue of (non)compositionality in the field of IE. Presumably, using gazetteers has next to nothing to do with the (non)compositionality of their entries. In fact, most entries of key-gazetteers (those providing context information) are splendidly compositional (cf. international company, regional hospital), and patterns that use the output of the Morphology Component (i.e. that specify just the grammatical features and the order of their units) also produce compositional phrases.

But still, there is a special subfield in our NLP where patterns (not gazetteers) do involve a kind of non-compositional phenomenon. I’ll write about it in the next post.