Computational Logic 
Abstract Interpretation of Logic Programs 
[Material partly from Cousot, Nielson, Gallagher,
Sondergaard, Bruynooghe, and others]
 Many CS problems are related to program analysis / synthesis
 Prove that some property holds for a program
(program analysis)
 Alternatively: derive properties which do hold for a program
(program analysis)
 Given a program P, generate a program P' which
 is in some way equivalent to P
 behaves better than P w.r.t. some criteria
(program analysis / synthesis)
 Standard Approach:
 identify that some invariant holds, and
 specialize the program for the particular case
 Frequent in compilers although seldom treated in a formal way:
 ``code optimization'',
 ``dead code elimination'',
 ``code motion'',
 ...
[Aho, Ullman 77]
 Often referred to as ``dataflow analysis''
 Abstract interpretation provides a formal framework
for developing program analysis tools
 Analysis phase + synthesis phase
Abstract Interpretation + Program Transformation
 Consider detecting that one branch will not be taken in:
if B
then S1 else S2
 Exhaustive analysis in the standard domain: nontermination
 Human reasoning about programs uses abstractions or
approximations:
signs, order of magnitude, odd/even, ...
 Basic Idea: use approximate (generally finite)
representations of computational objects to make the problem of
program dataflow analysis tractable
 Abstract interpretation is a formalization of this idea:
 define a nonstandard semantics which can approximate the meaning
or behaviour of the program in a finite way
 expressions are computed over an approximate (abstract) domain
rather than the concrete domain (i.e., meaning of operators has to
be reconsidered w.r.t. this new domain)
 Very general:
can be applied to any language with well defined
(procedural or declarative) semantics
 Automatic (vs. proof methods)
 Static: not all possible runs actually tried (vs. model checking)
 Sound: no possible run omitted (vs. debugging)
 Consider the domain Z (the integers)
 and the multiplication operator ×: Z × Z → Z
 We define an ``abstract domain'': {+, -}
(+ representing the non-negative integers, - the non-positive ones)
 Abstract multiplication: ⊗
defined by
+ ⊗ + = +,  + ⊗ - = -,  - ⊗ + = -,  - ⊗ - = +
 This allows us to reason, for example, that x × x is
never negative
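As a concrete sketch, the two-valued sign abstraction can be coded directly; the Python encoding below (the dictionary representation and the sampling-based soundness check are illustrative, not part of the original formulation) checks the rule of signs against a finite sample of the integers:

```python
# Two-valued sign abstraction: '+' describes the non-negative integers,
# '-' the non-positive ones (an illustrative encoding).
ABS_MUL = {
    ('+', '+'): '+', ('+', '-'): '-',
    ('-', '+'): '-', ('-', '-'): '+',
}

def gamma(s):
    """Concretization: a finite sample of the integers each sign describes."""
    return [n for n in range(-5, 6) if (n >= 0 if s == '+' else n <= 0)]

# Soundness on the sample: every concrete product has the sign predicted
# by the abstract multiplication of the operands' signs.
for (s1, s2), s in ABS_MUL.items():
    for x in gamma(s1):
        for y in gamma(s2):
            assert (x * y >= 0) if s == '+' else (x * y <= 0)

# x * x is never negative: whatever sign x has, the abstract square is '+'.
assert all(ABS_MUL[(s, s)] == '+' for s in ('+', '-'))
```

Note that the abstraction of 0 is ambiguous here (it is described by both signs); the refinement below removes that ambiguity.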
 Some observations:
 Again, Z (the integers)
 and ×
 Let's define a more refined ``abstract domain'': {-, 0, +}
(- the strictly negative integers, 0 just {0}, + the strictly positive ones)
 Abstract multiplication: ⊗
defined by
0 ⊗ s = s ⊗ 0 = 0,  + ⊗ + = - ⊗ - = +,  + ⊗ - = - ⊗ + = -
 This now allows us to reason that x × 0
is zero
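A sketch of the refined three-valued domain (again illustrative Python, mirroring the rule of signs plus an exact zero):

```python
# Refined sign domain {'-', '0', '+'}: zero is now represented exactly.
def abs_mul(s1, s2):
    if s1 == '0' or s2 == '0':
        return '0'                    # 0 * anything = 0, exactly
    return '+' if s1 == s2 else '-'   # usual rule of signs

# "x * 0 is zero" is now provable, for any sign of x:
assert all(abs_mul(s, '0') == '0' for s in ('-', '0', '+'))
```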
 Some observations:
 There is a degree of freedom in defining different abstract
operators and domains
 The minimal requirement is that they be ``safe'' or ``correct''
 Different ``safe'' definitions result in different kinds of analyses
 Again Z (the integers)
 and the addition operator +: Z × Z → Z
 We cannot use {-, 0, +}
because we wouldn't
know how to represent the result of + ⊕ -
(i.e. our abstract addition would not be closed)
 New element ``⊤'' (supremum): approximation of any integer
 New ``abstract domain'': {-, 0, +, ⊤}
 Abstract addition: ⊕
defined by:
0 ⊕ s = s ⊕ 0 = s,  + ⊕ + = +,  - ⊕ - = -,  + ⊕ - = - ⊕ + = ⊤,  ⊤ ⊕ s = s ⊕ ⊤ = ⊤
 We can now reason that x × x + y × y is never negative
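The domain extended with the new top element, and a now-closed abstract addition, might be coded as follows (an illustrative sketch; 'T' is an ASCII spelling of the supremum):

```python
# Sign domain {'-', '0', '+', 'T'}; 'T' (top) approximates any integer.
def abs_add(s1, s2):
    if s1 == '0': return s2
    if s2 == '0': return s1
    if s1 == s2 and s1 != 'T': return s1
    return 'T'                        # e.g. '+' plus '-': sign unknown

def abs_mul(s1, s2):
    if s1 == '0' or s2 == '0': return '0'
    if 'T' in (s1, s2): return 'T'
    return '+' if s1 == s2 else '-'

# x*x + y*y is never negative: case analysis over all signs of x and y
# shows the abstract result is always '0' or '+'.
signs = ('-', '0', '+')
sums = {abs_add(abs_mul(x, x), abs_mul(y, y)) for x in signs for y in signs}
assert sums <= {'0', '+'}
```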
 In addition to the imprecision due to the
coarseness of the abstract domain, the abstract versions of the
operations (dependent on that domain) may
introduce further imprecision
 Thus, the choice of abstract domain and the definition of
the abstract operators are crucial
 Required:
 Correctness: safe approximations: because most ``interesting''
properties are undecidable the analysis necessarily has to be
approximate. We want to ensure that the analysis is ``conservative''
and errs on the ``safe side''
 Termination: compilation should definitely terminate
(note: not always the case in every day program analysis tools!)
 Desirable: ``practicality'':
 Efficiency: in practice finite analysis time is not enough:
finite and small
 Accuracy: of the collected information: depends on the
appropriateness of the abstract domain and the level of detail to
which the interpretation procedure mimics the semantics of the
language
 ``Usefulness'': determines which information is worth
collecting
 The first two received the most attention initially
(understandably)
 Last three recently studied empirically
(e.g., for logic programs)
 Basic idea in approximation:
for some property P we want to show that P(x) holds
 Alternative: construct a set S with x ∈ S,
and prove that P(y) holds for all y ∈ S
then, S is a safe approximation of x
 Approximation on functions:
for some property P we want to show that P(f(x)) holds
 A function f', from values to sets of values,
is a safe approximation of f if f(x) ∈ f'(x) for all x
 Let the meaning of a program be a mapping m : D → D from input
to output, input and output values in a ``standard'' domain D
 Let's `lift' this meaning to map sets of inputs to sets of
outputs: M : 2^D → 2^D,
where 2^S denotes the powerset of S, and M(S) = { m(x) | x ∈ S }
 A function M' : 2^D → 2^D
is a safe approximation of M if M(S) ⊆ M'(S) for all S ⊆ D
 Properties can be proved using M' instead of M:
 For some property P we want to show that
P(y) holds for all y ∈ M(S),
for some inputs S
 We show that
P(y) holds for all y ∈ M'(S)
 Since
M(S) ⊆ M'(S),
P(y) then also holds
for all y ∈ M(S)
(Note: abuse of notation; M does not work on abstract values λ.
Here α maps a set of values to its abstract description, and γ, the
concretization function, maps an abstract description back to a set)
 As long as M is monotonic:
S ⊆ γ(α(S)) implies M(S) ⊆ M(γ(α(S)))
 And since
M(γ(λ)) ⊆ γ(α(M(γ(λ)))), then:
M(S) ⊆ γ(α(M(γ(α(S))))),
for some inputs S
 We can now define an abstract meaning function as
Mα(λ) = α(M(γ(λ))),
which is then safe if M(γ(λ)) ⊆ γ(Mα(λ))
 We can then prove a property of the output of a given class of
inputs represented by λ by proving that all elements of
γ(Mα(λ)) have such property
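The construction α ∘ M ∘ γ and its safety condition can be checked in a small finite setting; everything below, including the choice m(x) = -x and the sampled domain, is an illustrative assumption:

```python
# Finite sample of the concrete domain, sign abstraction with top 'T'.
D = set(range(-5, 6))

def gamma(l):
    return {n for n in D if {'-': n < 0, '0': n == 0, '+': n > 0, 'T': True}[l]}

def alpha(S):
    """Most precise sign describing the (non-empty) set S."""
    for l in ('-', '0', '+'):
        if S <= gamma(l):
            return l
    return 'T'

def m(x):            # concrete meaning: negation flips the sign
    return -x

def M(S):            # meaning lifted to sets of inputs
    return {m(x) for x in S}

def M_abs(l):        # abstract meaning: alpha o M o gamma
    return alpha(M(gamma(l)))

# Safety condition: M(gamma(l)) is a subset of gamma(M_abs(l)) for every l.
assert all(M(gamma(l)) <= gamma(M_abs(l)) for l in ('-', '0', '+', 'T'))
assert M_abs('+') == '-'   # positive inputs provably yield negative outputs
```

The last assertion has exactly the shape of proof described in the text: a property of the outputs of a class of inputs, established entirely on abstract values.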
 E.g. in our example, a property such as ``if this program takes
a positive number it will produce a negative number as output''
can be proved
 Generating Mα:
``If this program takes a positive number it will produce a
negative number as output''
 ``Input-output'' semantics often too coarse for useful analysis:
information about ``state'' at program points generally
required ⇒ ``extended semantics''
 Program points can be reached many times, from different points,
and in different ``states''
⇒ ``collecting'' (``sticky'') semantics
 Analysis often computes a collection of abstract states for a
program point
 Often more efficient to ``summarize'' states into one which
gives the best overall description ⇒ lattice structure
in abstract domain
 The ordering ⊆ on 2^D induces an
ordering ⊑ on the abstract domain (``approximates better''):
λ1 ⊑ λ2 iff γ(λ1) ⊆ γ(λ2)
E.g., to describe {2, 5} we can choose either
+ or
⊤,
but γ(+) = {1, 2, 3, ...}
and γ(⊤) = Z, and
since γ(+) ⊆ γ(⊤)
we have + ⊑ ⊤,
i.e., + approximates {2, 5} better than ⊤,
it is more precise
 It is generally required that the abstract domain Dα, with ⊑, be a
complete lattice
 Therefore, for all S ⊆ Dα
there exists a unique
least upper bound ⊔S,
i.e., such that λ ⊑ ⊔S for all λ ∈ S, and ⊔S ⊑ λ' for any other upper bound λ'
 Intuition: given a set S of approximations of the ``current
state'' at a given point in a program, to ensure that ⊔S is the best
``overall'' description for the point:
 ⊔S approximates everything the elements of S approximate
 ⊔S is the best such approximation in Dα
 We consider Dα = {-, 0, +, ⊤}
 We add ⊥ (infimum) so that ⊔∅
exists and to have a complete lattice: Dα = {⊥, -, 0, +, ⊤}
 (Intuition: ⊥
represents a program point that is never reached)
 The concretization function has to be extended with γ(⊥) = ∅
 The lattice is then given by:

          ⊤
        / | \
       -  0  +
        \ | /
          ⊥
 To make ⊔ more meaningful we
consider Dα = {⊥, -, 0, -0, 0+, +, ⊤}
 The lattice is then given by:

          ⊤
         / \
       -0   0+
       / \ / \
      -   0   +
       \  |  /
          ⊥

-0 = ⊔{-, 0}
accurately represents a program point where a variable can be
negative or zero
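The ordering and least upper bounds of such a lattice can be checked by comparing concretizations over a finite sample; element names like '-0' and 'bot' below are illustrative ASCII spellings:

```python
# Extended sign lattice, ordered by inclusion of concretizations.
D = set(range(-5, 6))
GAMMA = {
    'bot': set(), 'T': D,
    '-': {n for n in D if n < 0}, '0': {0}, '+': {n for n in D if n > 0},
    '-0': {n for n in D if n <= 0}, '0+': {n for n in D if n >= 0},
}

def leq(l1, l2):      # l1 approximates at least as well as l2
    return GAMMA[l1] <= GAMMA[l2]

def lub(ls):
    """Least upper bound: the most precise element covering all of ls.
    (Relies on this particular domain being a lattice, so it is unique.)"""
    uppers = [u for u in GAMMA if all(leq(l, u) for l in ls)]
    return min(uppers, key=lambda u: len(GAMMA[u]))

assert lub(['-', '0']) == '-0'   # "negative or zero", as in the text
assert lub(['-', '+']) == 'T'
assert lub([]) == 'bot'          # why the infimum is needed
```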
 Showing monotonicity of Mα may be more difficult than
showing that Dα meets the finiteness conditions
 There may be an analysis which terminates even if the
conditions are not met
 Conditions can also be relaxed by restricting the class of programs
(e.g. non-recursive programs pose few difficulties, although they
are hardly interesting)
 In some cases an approximation from above can
also be interesting
 There are other alternatives to finiteness: dynamic bounded
depth, etc.
(See: Widening and Narrowing)
 The idea itself (i.e. rule of signs) predates computation...
 The idea of computing by approximations was used as early as
1963 by Naur
(``pseudo evaluation'', in the Gier Algol compiler),
``a process which combines the operators and operands of the
source text in the manner in which an actual evaluation would have to
do it, but which operates on descriptions of the operands, not on
their values''
 1972, Sintzoff (proving well-formedness and termination properties)
 1975, Wegbreit appears to be the first to develop a lattice-theoretic
model
 Mid 70's: Kam, Kildall, Tarjan, Ullman, ...
 1976-77, Patrick and Radhia Cousot proposed a formal model for the
analysis of imperative (``flowchart'') languages: unifying framework
 Define a ``static'' semantics: associate a set of possible
storage states with each program point
 Dataflow analysis constructed then as a finitely computable
approximation to the static semantics
 Which semantics?
 Declarative semantics: concerned with what is a consequence of
the program
 Model-theoretic semantics
 Fixpoint (T_P operator based) semantics:
can be what the program actually does (cf. database-style bottom-up
evaluation)
 Operational semantics: close to the behavior of the program
 SLD-resolution based (success sets)
 Denotational
 Can cover possibilities other than SLD: reactive, parallel, ...
 Analyses based on declarative semantics are often called
``bottom-up'' analyses
 Analyses based on the (top-down) operational semantics are often
called ``top-down'' analyses
 Also, intermediate cases (generally achieved through program
transformation)
 Example: an abstract version of T_P computed over a finite
abstract domain, e.g., all subsets of a finite set of abstract atoms
Such ``bottom-up'' analyses have been proposed for example by
Marriott and Sondergaard, and, more recently, by Codish, Dams, and
Yardeni, Debray and Ramakrishnan, Barbuti, Giacobazzi, and Levi, and
others.
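A minimal sketch of such a bottom-up analysis: an abstract T_P over the finite domain of subsets of a small set of abstract atoms. The program and its abstraction below are invented for illustration:

```python
# Abstract clauses: (set of abstract body atoms, abstract head atom).
# Toy program:  even(0).   even(s(s(X))) :- even(X).   p :- even(X).
# abstracted so that all even/1 atoms collapse to 'even(num)'.
rules = [
    (set(),         'even(num)'),   # fact
    ({'even(num)'}, 'even(num)'),   # recursive clause
    ({'even(num)'}, 'p'),
]

def T_P(I):
    """One abstract immediate-consequence step."""
    return I | {head for body, head in rules if body <= I}

# Least fixpoint by iteration from the empty interpretation; the abstract
# domain is finite (subsets of two atoms), so the iteration terminates.
I = set()
while T_P(I) != I:
    I = T_P(I)

assert I == {'even(num)', 'p'}
```

Note how the result only describes what succeeds ("procedure exit"), illustrating the limitation discussed next.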
 Advantages:
 Simple and elegant. Based on the declarative, fixpoint
semantics
 General: results independent of the query form
 Disadvantages:
 Information only about ``procedure exit.'' Normally information
needed at various program points in compilation, e.g., ``call
patterns'' (closures)
 The ``logical variable'' is not observed (uses ground data).
Information on instantiation state, substitutions, etc. is often
needed in compilation
 Not querydirected: analyzes whole program, not the part (and
modes) that correspond to ``normal'' use (expressed through a query
form)
 Solutions:
 Call patterns obtainable via ``magic sets'' transformation
[Marriott and Sondergaard]
Used also for querydirected analysis by [Barbuti et al.], [Codish et al.],
[Gallagher et al.], [Ramakrishnan et al.], and others
 Enhanced fixpoint semantics
(e.g., S-semantics [Falaschi et al.], [Gaifman and Shapiro])
 Define an extended (collecting) concrete semantics, derived from
SLD resolution,
making relevant information observable.
 Abstract domain: generally ``abstract substitutions''.
 Abstract operations: unification, composition,
projection, extension, ...
 Abstract semantic function: takes a query form
(abstraction of initial goal or set of initial goals) and the
program and returns abstract descriptions of the substitutions at
relevant program points.
 Variables complicate things:
 correctness (due to aliasing),
 termination (merging information related to different
renamings of a variable)
 Logic variables are in fact (well behaved) pointers:
X = tree(N,L,R), L = nil, Y = N, Y = 3, ...
this makes analysis of logic programs very interesting
(and quite relevant to other paradigms).
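The pointer-like behaviour of logic variables can be mimicked with an explicit binding store. This is a deliberately simplified sketch (real implementations use mutable cells, trailing, and occurs-check considerations):

```python
# A binding store: variables (strings) map to terms or to other variables.
store = {}

def deref(t):
    """Follow variable-to-variable chains until a term or unbound variable."""
    while isinstance(t, str) and t in store:
        t = store[t]
    return t

# The sequence from the text: X = tree(N,L,R), L = nil, Y = N, Y = 3
store['X'] = ('tree', 'N', 'L', 'R')
store['L'] = 'nil'
store['Y'] = 'N'        # Y and N are now aliases
store['N'] = 3          # binding N also binds Y, and a subterm of X

assert deref('Y') == 3
assert deref(deref('X')[2]) == 'nil'   # the L inside X's term
```

This aliasing through shared cells is exactly what makes correctness of the analyses below delicate.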
 Simple domains [Mellish, Debray], e.g.:
{ closed (ground), don't know, empty, free,
nonvar }
(abbreviated e.g. g, ?, ⊥, f, nv)
 May need to be very imprecise to be correct:
:- entry p(X,Y) : ( free(X), free(Y) ).
p(X,Y) :-
    q(X,Y),
    X = a.
q(Z,Z).
 Correct/more accurate treatment of aliasing [Debray]:
associate with a program variable X a pair:
an abstraction of the set of
terms X may be bound to, and the
set of program variables
X may ``share'' with.
 More accurate sharing: pair sharing [Sondergaard] [Codish]:
pairs of variables denoting possible sharing.
:- entry p(X,Y) : ( free(X), free(Y) ).
p(X,Y) :-
    q(X,Y), % { X=f, Y=f } and { (X,Y) }
    X = a.  % { X=g, Y=g } and { (X,Y) }
q(Z,Z).
 Note: we have used a ``combined'' domain: simple modes plus pair
sharing
 Pair sharing can encode linearity:
:- entry p(X,Y) : ( free(X), free(Y) ).
p(X,Y) :-
    q(X,Y),     % { X=f, Y=f } and { (X,Y) }
    W = f(X,Y). % { W=nv, X=f, Y=f } and { (W,W), (X,Y) }
q(Z,Z).
 Even more accurate sharing: set sharing [Jacobs et al.]
[Muthukumar et al.]:
sets of sets of variables.
 A bit tricky to understand
 Encodes grounding and independence:
 X has no occurrence in any set: it is ground
 X and Y have no occurrence in a common set: they are independent
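A sketch of how set sharing encodes groundness and independence; the abstract substitution below is an arbitrary example:

```python
# An abstract substitution in set sharing: sets of program variables that
# may share a common run-time variable.
sharing = {frozenset({'X', 'Y'}), frozenset({'Z'})}

def is_ground(v, sh):
    """v occurs in no sharing set: it can contain no run-time variable."""
    return all(v not in s for s in sh)

def independent(v, w, sh):
    """v and w never occur together in a set: they cannot be aliased."""
    return all(not {v, w} <= s for s in sh)

assert is_ground('W', sharing)              # W occurs nowhere: ground
assert independent('X', 'Z', sharing)       # never share a set
assert not independent('X', 'Y', sharing)   # may be aliased
```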
 Sharing+Freeness [Muthukumar et al.] (and + depth-K)
 Type graphs [Janssens et al.]
 Depth-K [Sato and Tamaki]
 Pattern structure [Van Hentenryck et al.]
 Variable dereferencing [VanRoy] [Taylor]
 ...
 Much work by [Codish et al.] [File et al.] [Giacobazzi et al.]
... on combining and comparing these domains
 Debray: predicate level mode inference (call and success
patterns for predicates). Unification reformulated as entry + exit
unification. Termination by tabling.
 Jones, Marriott, and Sondergaard: using denotational semantics.
 Bruynooghe:
 Concrete semantics constructs ``generalized'' AND trees: nodes
contain instance of goal before and after execution: call
substitution and success substitution.
 Analysis constructs ``abstract AND-OR trees''. Each represents a
(possibly infinite) set of (possibly infinite) concrete trees.
Widening to regular trees for termination.
 Framework is generic: parametric on some basic domain related
functions + conditions for correctness and termination.
 Muthukumar and Hermenegildo: ``PLAI'' framework.
Improvement over previous frameworks:
Efficient fixpoint algorithms (dependency tracking) and
memory savings (no explicit representation of trees).
 Fixpoint required on recursive predicates only:
[Figure: fixpoint computation on recursive call graphs]
 Simply recursive (a)
 Mutually recursive (b)
``Use current success substitution and iterate until a fixpoint is
reached''
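The quoted iteration strategy is ordinary Kleene iteration; a generic sketch (the chain of abstract values is illustrative, not PLAI's actual algorithm):

```python
def lfp(F, bottom):
    """Iterate a monotone abstract function from bottom until stable."""
    x = bottom
    while True:
        y = F(x)
        if y == x:
            return x
        x = y

# Toy success-substitution analysis of a recursive predicate over the
# chain bot < '0' < '0+': the base case contributes '0', the recursive
# case widens it to '0+', after which nothing changes.
def F(x):
    return '0' if x == 'bot' else '0+'

assert lfp(F, 'bot') == '0+'
```

In PLAI this step is applied only to the strongly connected (recursive) parts of the call graph, which is the efficiency point made above.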
 Abstract tree contains several occurrences of the same atom in a
clause (for precision): useful for program specialization
( Multivariance )
However,
too many versions if not controlled
(solutions proposed [Giannotti
et al.], [Jacobs et al.], [Puebla et al.])
 Much recent work in domains, improvement of fixpoints,
application, etc. [Taylor],[VanRoy], GAIA [LeCharlier et al.]
 Abstract compilation:
compute over an ``abstract version'' of the program
 Re-execution [Bruynooghe, LeCharlier et al.]
(alternative to keeping track of accurate sharing)
 Caching of operations [LeCharlier et al.]
 CLP: (relation-based) programs over symbolic and non-symbolic
domains: constraint satisfaction instead of unification (e.g. CLP(R),
Prolog III, CHIP, etc.)
 Jorgensen, Marriott, and Michaylov [ISLP'91] and later Marriott
and Stuckey [POPL'93] identified numerous opportunities for
improvement via static analysis
 A number of proposals for analysis frameworks:
 Marriott and Sondergaard [NACLP90]:
denotational approach
 Codognet and Filé [ICLP'92]:
uses constraint solving for the
analysis itself and ``abstract compilation''
 Garcia de la Banda and Hermenegildo [WICLP'91, ILPS'93]:
adaptation
of LP frameworks (PLAI).
 A few milestones (on the road to CLP analysis):
 1981, Mycroft: strictness analysis of applicative languages
 1981, Mellish: proposes application to logic programs
 1986, Debray: framework with safe treatment of logic variables,
discussion of efficiency
 1987, Bruynooghe: framework for LP based on and-or trees
 1987, Jones and Sondergaard: framework based on a denotational
definition of SLD
 1988, Warren, Debray and Hermenegildo: efficiency and
practicality of Abs. Int. for Logic Programs shown (for program
parallelization)
 1989, Muthukumar and Hermenegildo: PLAI generic system
 1990, Van Roy / Taylor: application to sequential optimization
of Prolog
 1991, Marriott et al.: first extension to CLP
 1992, Garcia de la Banda and Hermenegildo: generalization of
Bruynooghe's algorithm to CLP, extension of PLAI
 Abstract Interpretation is a very elegant program analysis
technique
 It has in addition been proved useful and efficient. E.g., for
LP and CLP:
 Static parallelization of logic (and CLP) programs
[Hermenegildo et al]
 (Sequential) program optimization [Taylor, VanRoy, ...]
 Optimization of CLP programs [Marriott et al, ...]
 Abstract debugging, etc.
 Interesting issues studied for handling large real programs:
 Modularity
 Handling extra-logical features, higher order
 Handling dynamic code
 Support of the test-debug cycle
Solutions include [See, e.g., papers in ESOP'96, SAS'96]:
 Module interface definition: modular analysis
 Analysis of ``Full Prolog''
 Incremental analysis
 Demo!
Last modification: Wed Nov 22 23:57:35 CET 2006 <webmaster@clip.dia.fi.upm.es>