| |

Knowledge Bases in the World Wide Web:
A Challenge for Logic Programming

DFKI GmbH, Kaiserslautern, Germany
E-mail: boley@informatik.uni-kl.de

Abstract:
Regarding the World Wide Web, knowledge bases can be categorized between (HTML) documents and (SQL) databases. In order to standardize them, the use of Horn logic for Web publications is proposed. The design of a Web search engine for processing distributed Horn-logic knowledge bases is sketched. Some of the research issues to be solved are elaborated from the perspective of (parallel, modular) logic programming. Possible Inter- and intranet applications are discussed. An LP-community effort for building up Horn-logic knowledge bases in the Web is encouraged.

1. Web-Knowledge Reuse via Horn Logic

The World Wide Web has quickly established itself as the standard accepted medium for information supply, for example for (hyper-)texts, images and databases. But what about the supply of standardized knowledge bases in the WWW, which could constitute a leap toward knowledge reuse? After a promising initialization of the WWW publication of KIF/Ontolingua knowledge bases (e.g. [7]) by Thomas Gruber right at the beginning of the expansion of the Web, little progress is observable here since he left the Knowledge Systems Laboratory at Stanford University. Also the growth rate of WWW knowledge which is represented (e.g. CGs, CycL, CommonKADS) or accessed (e.g. KQML, April) in other proposed AI formats is not keeping track with the general Web expansion rate. This may be due to the complex structure and insufficient support of these formats - in comparison, e.g., to the use of the SQL-database standard for HTTP/SQL-gateways: Despite the distribution of ANSI proposals for KIF and CGs it seems that even in these cases there is not enough `investment security' for knowledge publications.

Could, in this situation, a (HTML-)version of Horn logic, the established basis of logic programming, play the role of a declarative WWW representation standard? First of all, the relational databases, which are already available in the Web could be principally viewed as Datalog special cases of such Horn-logic knowledge bases, normally restricted to ground facts and non-recursive rules - apart from problems in connection with null values (database schemas will be discussed later on). Moreover the accumulated program libraries in pure Prolog (from Edinburgh to ISO) could be easily revised for Web publication. Also, knowledge bases in modern logic languages such as Prolog or Gödel, or in functional-logic languages [3] such as Escher, Curry or Relfun could also be more or less easily reproduced in Horn logic. Finally, further translators between subsets of the above mentioned exchange formats and Horn logic could be tackled: e.g. partial KIF-Prolog translators have been written already; the Horn-logic reduction of sort lattices will be mentioned below. A desirable side effect would be that translation attempts between concrete knowledge bases (via Horn logic) would increase the comparability of new LP proposals. With the addition of integrity constraints [4] Horn-logic knowledge bases can also be easily maintained interactively [1]. From the perspective of the user of relational databases this would be a transition to a standard for deductive databases [4] as far as expressive power is concerned. Altogether, the number of available Horn-logic WWW knowledge bases should most likely reach the critical mass that would cause automatically processed queries provide interesting results to end-users.

2. Prolog Goals for Web-Distributed Clauses

Another important point arises from the Web distribution of knowledge bases. The nowadays usual fact-retrieval usage of databases in the Web considers a SQL database as a black box on a given server, which can be queried via a form. In contrast to this is the document-retrieval use of (hyper)texts via search engines, which find appropriate Web pages arbitrarily distributed on servers using key-word combinations. The proposed Horn-logic knowledge bases in the WWW constitute an information category between SQL databases and HTML documents. The following question arises: Is it possible, on the one hand, to distribute this knowledge using the paradigm of HTML documents and to access it, on the other hand, using the paradigm of SQL-form queries or Prolog goals? In practice this would mean: Web-distributed knowledge bases should not be queried by the (primitive, since string-oriented) keywords of usual search engines; instead Prolog-like goals should be used, perhaps via a form-interface, i.e. atoms, which may contain logical (i.e. consistently bound) variables, which, in the Prolog way, may be combined conjunctively (with a comma) or disjunctively (with a semicolon), and which perhaps could also be negated in a way suitable for the Web (global negation as failure would make little sense in the `open Web world'; see discussion below). Apart from the increased expressive power, compared to SQL, through variables and structures in clauses (facts and rules), such knowledge bases inherit an advantage from HTML documents - the possibility of seeing a found answer to the current query (atom or string) in the context of neighboring potential answers (atoms or strings), i.e. to associatively browse (in knowledge bases or documents) instead of only communicating with a deductive black box: As an optional `explanation' of answer bindings it would be possible to highlight both the atomic facts to which a goal is reduced and the rules used to derive it, either within the knowledge bases or - more focused - the predicate definitions containing them.

Such use of Web-distributed knowledge could theoretically be implemented using key-word searches (on a lower level also atoms are strings). In practice the realization is an interesting challenge for logic programming.

3. Design Issues for a Horn Search Engine

Below follows a first sketch of a possible search engine for knowledge bases (KBs) Horn-published in the Web, henceforth called a KB-search engine.

This would be useful even if it would just take better into account the formal (Horn-logic) syntax of knowledge-base entries than present text-search engines do. But here initial semantic and inferential abilities will be added.

First, for typing each knowledge base, it would be sensible to give a KB label (of or in) its HTML page, e.g. already in the header, to consider only relevant, i.e. KB pages while doing a search (the other way round, the normal text-search engines could ignore KB areas). A variant, at least for Datalog, is the tag table of HTML 3.0, with which predicate definitions can be presented as tables (but a transition to the more general SGML might be advantageous also for this Web use, exploiting generic identifiers for KBs, predicates, clauses, and below). In any case it is useful, analogously to a SQL-database schema, to declare the signatures of n-ary predicates in order to establish the meaning of the n arguments. The sorts usable here can themselves be defined as unary predicates in sort lattices (e.g. via single-premise rules or binary subsumes-relationships between the predicates). Now, the KB-search engine, before using a found predicate definition, first tests its signature compatability wrt the perhaps itself sorted goal atom. This semantic prefiltering avoids elementary predicate ambiguities, such as those known from the corresponding tiresome key-word ambiguities of todays text-search engines. In this way it is in principle possible to attain a higher precision using formal KBs than using natural-language texts. Modeled on the joint FAQ elaboration for newsgroups, expert groups would be able, again with the help of the Internet, to agree on their special terminology and their sort sublattices, with interfaces to related knowledge areas. Gradually it would thus be possible to reduce the ambiguity of predicate names.

The inferential kernel problem of a KB-search engine in the Web corresponds to the problem of OR-parallel architectures such as Aurora [12], whose resources have to be balanced between local search (normally sequential) and global search (parallel, communication-intensive). The KB-search engine has to suitably interleave resolution on a (found) KB page with the Web supply of (further) KB pages. Here is also a connection to the theme of modular/contextual LP [5]: When should a goal be proven locally in the actual KB page (module), when should be globally proceeded to other KB pages (modules)? When should backtracking cross page (module-)boundaries? While a page-local sequence of resolution steps is useful mainly for recursive rules, the global search process has to continuously spread in parallel over the KB pages (e.g. with algorithms such as fish-search [13]). Local non-terminations then cannot endanger the global success of a query: Even if it could be presupposed that all knowledge bases published in the Web would be terminating individually, there could still be non-terminations caused by the interaction of distributed knowledge bases, e.g. via two distributed co-recursive rules. Since it is difficult to capture the (model theoretic) semantics of a predicate definition even with signature declarations if its clauses are distributed over several knowledge bases, it will be beneficial to agree on more or less strong restrictions here. The strongest locality principle would completely forbid predicate distribution, a middle one would only allow the distribution of facts, and the weakest would also tolerate distributed non-recursive rules. Furthermore the important question arises whether, for uniformity of the semantics, also the clauses within a KB page should be interpreted according to pure Horn logic in an OR-parallel (breadth-oriented) manner instead of the Prolog-like use of backtracking. In any case one would rather expect solutions to surface Lycos-type, in groups, than Prolog-type, by individual more requests; their order could also reflect certainty factors, starting from the facts used. Due to the openess and dynamics of the WWW the issue of logical `completeness' of solution sets would hardly arise: The KB-search engine even has to cut off the search processes which are still running (possibly non-terminating), if no (further) solution group is requested; a SQL-type (total-)set orientation would thus not be sensible here.

Finally, it is necessary to examine how the indexing techniques of text-search engines, databases, and Prolog can be transferred to the KB-search engine. Within a KB page the signature declarations preceding the predicate definitions (`procedures') can play the role of indexed HTML headings. A finer KB indexing down to the level of single clauses would be conceivable analoguously to a complete text indexing, with advantages and disadvantages similar to those of Alta Vista.

The KB-search engine outlined above mainly works interpretively on KB sources, which also can be used with normal browsers. This permits, apart from black-box queries, to also directly inspect and configure knowledge bases much like text building blocks. Only by offering this direct manipulation of (Horn-logic) standardized knowledge do knowledge bases have a certain chance to become common property in the way of EXCEL tables: For this not only conducting knowledge-base queries (spread-sheet calculations) is important, but also helping with the construction, change, combination and bidirectional text coupling of databases (spread sheets). This could be supported by the hypertext mechanisms of the Web. For instance, the submodules of a knowledge base would have to be accessible through the KB-search engine as well as interactively via a special HTML link. If, due to efficiency considerations, the compilation of a Horn-logic knowledge base becomes necessary, it could be tried to simulate direct manipulation, e.g. via back-references to the source and automatic recompiliation after changes. In contrast to an even more efficient native-code compilation, a WAM compilation would preserve the advantage of pseudo-code transmission and client-side emulator execution first used by Java.

4. Internet Experience and Intranet Prospects

Due to the comparatively simple formalizabilty of technical knowledge, it is sensible to first construct Horn-logic knowledge bases for this area. In the VEGA project we have done this mainly for subareas of material science (e.g. [2]). However, knowledge bases such as RTPLAST are published mainly as pure ASCII files (in Relfun syntax) in the Web. It is planned to include more Horn-logic knowledge bases, e.g. to compare functional-logic programs. However it is still open which operational variant of Horn logic is the most suitable for the publication of practical knowledge bases in the Web: Experience would encourage to exploit the textual order of clauses (OR sequentiality) and their premises (AND sequentiality), while the distributed Web search suggests an at least OR-parallel interpretation. After the decisions necessary here, mainly a more exact specification of the HTML additions for Horn-logic knowledge bases and a revision of the search engine operating on them has to be completed before a prototype implementation (e.g. in Java) can be begun. For reasons of acceptance, the efficiency of the KB-search engine plays a major role of course; it could be increased both by progress in LP (e.g. better OR-parallel strategies) and in WWW technology (e.g. better efficiency of Java).

Interesting possibilities for WWW knowledge bases are also offered, under the heading intranets, by the trend towards organization-internal information supply on the basis of Web technology. In analogy to a central corporate memory [8] the intranet knowledge bases of a networked organization (via LAN or WAN) would become the knowledge component of a distributed corporate memory. The information deficits in large organizations, which cost time and money, show that even for intranet knowledge bases it would be useful to have a KB-search engine, which because of the smaller knowledge volumes would be able to work faster here than in the open Internet. If organizations use the externally agreed upon (Horn-logic) standard also as their internal representation format, then they can move knowledge bases back and forth between their intranet and the public Internet as needed: Inward to take over or adapt new know-how; outward for demos and perhaps (cryptographical) sales (using an Internet payment standard) of their expertise. Analogously, a unified representation will permit flexible alliances in knowledge-intensive industries.

5. The Challenge: A Community Effort

Besides the technical issues discussed in the body of this paper, the building up of knowledge bases in the World Wide Web has a strong `community' component: While interest in providing and using Inter- or intranet KBs is growing, we need to converge on a standard representation format. Since Horn logic already is the language of choice for many non-Web KB publications and at the core of many LP extensions, it looks like the right place to start with. If Web-KB providers use Prolog syntax, their KBs will be translatable into the ultimate format indexed and retrieved by the envisaged Horn-logic KB-search engine, since there already exists a sufficient number of `legacy' Prolog KBs also requiring such a translator. When attempting to build reusable Web Horn KBs, the harder part will probably be common `ontological' decisions concerning compatible definitions of predicate names. However, the sort hierarchies needed for structuring non-trivial domains can, for uniformity, be regarded as unary predicates, again defined by (simple) Horn rules. Once a search engine hits enough clauses on typical goals, a self-accelerating effect in the growth of Web KBs may start, similar to the ones we are witnessing for other information categories in the Web.

This paper has attempted to show that the LP community is well-prepared to venture on such a major Web effort. Among the approaches to create a Web infrastructure for reusing logic programs or KBs, the present proposal comes closest to the one independently developed in [11]: ``Lightweight deductive databases'' are deductive databases incorporated into Web pages to provide a source of distributed structured information. Their principal features, distributed maintenance, extensibility, and reusability, as well as notions from modular LP are common with our proposal. However, like other search engines, we stick to server-side processing as the default. Additionally, we stress purity in the sense that no extra operators, e.g. for invoking goals in other modules, are used. Instead, a pure Horn-logic KB should be loadable unchanged into a KB-tagged Web page, where the KB-search engine deals with the modular structure of pages much like textual search engines do. Purity also supports OR-parallelism as a paradigm to exploit Web-distributed KBs, implementable by `agent replication' such as in the fish-search algorithm. Finally, besides deducing answer bindings, we feel it is important that a KB-search engine can also `explain' solutions by highlighting clauses used for a deduction in the context of their KB pages.

Altogether we do not propose a new LP language for the Web but the pragmatic reuse of Horn-logic KBs - in a Prolog-like syntax - by a new search engine. Once some experience has been accumulated with Web-searching such KBs, Web-oriented - but still pure - extensions of the Horn-logic kernel language could be carefully introduced. But before going into the nitty-gritty details - and quarrels - of defining a `Web LP' language, we should experiment with one or more search engines for a fixed (Horn-)language.

Bibliography

[1] A. Abecker, H. Boley, K. Hinkelmann, H. Wache, and F. Schmalhofer. An Environment for Exploring and Validating Declarative Knowledge. In: Proc. Workshop on Logic Programming Environments at ILPS'95, Portland, Oregon, Dec. 1995.

[2] H. Boley, U. Buhrmann, and Chr. Kremer. Towards a Sharable Knowledge Base on Recyclable Plastics. In: J. K. McDowell and K. J. Meltsner, Eds., Knowledge-Based Applications in Materials Science and Engineering, pp. 29-42, TMS, 1994.
http://www.dfki.uni-kl.de/~vega/relfun.html (Examples: RTPLAST)

[3] H. Boley. Extended Logic-plus-Functional Programming. In: L.-H. Eriksson, L. Hallnäs, and P. Schroeder-Heister, Eds., Proc. of the 2nd International Workshop on Extensions of Logic Programming - ELP '91, Springer LNAI 596, 1992.

[4] F. Bry, R. Manthey, and H. Schütz. Deduktive Datenbanken. In: N. E. Fuchs, Ed., KI - Themenheft Logische Programmierung, 3/96, Interdata Verlags GmbH, Sept. 1996.

[5] M. Bugliesi, E. Lamma, and P. Mello. Modularity in Logic Programming. Journal of Logic Programming, 19/20, pp. 443-502, 1994.

[6] D. Cabeza and M. Hermenegildo. html.pl: A Simple HTML Package for Prolog and CLP Systems - Description and User's Manual (Version 96.1.1). Computer Science Department, Technical University of Madrid (UPM), March 1996.

[7] Th. R. Gruber and G. R. Olsen. An Ontology for Engineering Mathematics. In: J. Doyle, P. Torasso, and E. Sandewall, Eds., Fourth International Conference on Principles of Knowledge Representation and Reasoning, Gustav Stresemann Institut, Bonn, Germany, Morgan Kaufmann, 1994.
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/index.html
(Reference Documents: The Engineering Math Ontologies)

[8] K. Hinkelmann and O. Kühn. Revising and Updating a Corporate Memory. In: Proc. of the European Symposium on Validation and Verification of Knowledge-based Systems - EUROVAV-95, June 1995.

[9] R. A. Kowalski. Logic Programming in Artificial Intelligence. 12th International Joint Conference on Artificial Intelligence - IJCAI'91, pp. 596-603, 1991.

[10] S. W. Loke and A. Davison. Logic Programming with the World-Wide Web, Proc. of the 7th ACM Conference on Hypertext - Hypertext'96, ACM Press, March 1996.

[11] S. W. Loke, A. Davison, and L. Sterling. Lightweight Deductive Databases on the World-Wide Web, P. Tarau, A. Davison, K. De Bosschere, and M. Hermenegildo, Eds., Proc. of the 1st Workshop on Logic Programming Tools for INTERNET Applications, JICSLP'96, Bonn, Sept. 1996.

[12] E. Lusk, D. H. D. Warren, S. Haridi, et al. The Aurora Or-Parallel Prolog System. In: International Conference on Fifth Generation Computer Systems 1988, ICOT, Tokyo, Japan, Nov. 1988.

[13] R. D. J. Post and P. M. E. De Bra. Information Retrieval in the World Wide Web: Making Client-based Searching Feasible. In: Proc. of the First International WWW Conference, Geneva, May 1994.

Footnotes

...encouraged.: Many thanks go to the following people: Michael Sintek helped a lot with both form and content of this paper; remarks by Andreas Abecker, Ansgar Bernardi, Thomas Kieninger, Otto Kühn, Norbert Fuchs, and others contributed to further improvements; the reviewers of the 1st Workshop on Logic Programming Tools for INTERNET Applications motivated the final twist; Jane Bensch helped with alternating German and English drafts.
...standard?: Using it, rule knowledge can be represented almost natural language-like, in the form C if P1 and ... Pn, where, however, the conclusion C and the premises Pi have the form of atoms, i.e. they are parenthesized predicate applications to constants, variables, or structures [9]. Printed Horn-logic knowledge uses quite varied syntaxes: Formal tradition and compactness for example support a left arrow as the if in rules, ASCII-usability, in contrast, supports Prolog's clause notation; on the other hand mathematics and logic would go against Prolog's capitalized variables. Structures could be written with list-like square brackets to distinguish them from atoms, which would also make functional-logic integration easier [3]. Due to the Web a uniform syntax is becoming more pressing.
...box:: Other non-deductive uses of such knowledge bases are also useful, e.g. through machine-learning algorithms on the basis of inductive logic programming [1].
...programming.: Unlike in the more often studied administration of arbitrary Web pages through (extra)logic programs (e.g. [6], [10]), Web extensions for (Horn)logic programs with arbitrary implementation languages are considered here. Both LP-WWW connections could in principle be examined simultaneously.

| |

Knowledge Bases in the World Wide Web: A Challenge for Logic Programming