Parallel Execution of Logic Programs
A Tutorial
(Or: Multicores are here! Now, what do we do with them?)
Manuel Hermenegildo
IMDEA Software
Tech. University of Madrid
U. of New Mexico
Compulog/ALP Summer School, Las Cruces, NM, July 24-27, 2008
The UPM work presented is a joint effort with members of the
CLIP group at the UPM School of Computer Science and IMDEA
Software including: Francisco Bueno, Daniel Cabeza, Manuel
Carro, Amadeo Casas, Pablo Chico, Jesús Correas, María
José García de la Banda, Manuel Hermenegildo, Pedro
López, Mario Méndez, Edison Mera, José Morales, Jorge
Navas, and Germán Puebla.
 Multicore chips have moved parallelism from niche (HPC)
to mainstream
even on laptops!
 According to vendors (and Intel in particular [e.g., DAMP workshops]):
 Feature size reductions will continue for the foreseeable
future (12 generations!).
 But power consumption does not allow increasing clock speeds much.
 Multicore is the way to use this space without raising power
consumption.
 Number of cores expected to double with each generation!
 But writing parallel programs is hard/error-prone: how do we
exploit all those cores?
 Ideal situation: Conventional Program + Multiprocessor
= Higher Perf.
automatic parallelization.
 More realistically: compiler-aided parallelization.
 Languages (dialects, constructs) for
parallelization+parallel programming.
 Scheduling techniques, memory management, abstract machines,
etc.
 Many parallelism-friendly aspects:
 program close to problem description
⇒ less hiding of intrinsic parallelism
 well-understood mathematical foundation
⇒ simplifies formal treatment
 relative purity (well-behaved variable scoping, fewer
side-effects, generally single assignment)
⇒ more amenable to automatic parallelization.
 At the same time, it requires dealing with the most complex
problems:
 irregular computations; complex data structures; (well-behaved)
pointers; dynamic memory management; recursion; ...
but in a much more elegant context;
and brings up some upcoming issues (e.g., speculation, search,
constraints).
 Very good platform for developing universally useful
techniques:
Examples to date: conditional dep. graphs, abstract interpretation
w/interesting domains, cost analysis / gran. control, dynamic
sched. and load balancing, ...
 Or-parallelism:
execute different branches of the search space simultaneously.
 Present in general search problems, enumeration part of
constr. problems, etc.
money(S,E,N,D,M,O,R,Y) :-
    digit(S), digit(E), ...,
    carry(I), ...,
    N is E+O-10*I, ...

digit(0).  digit(1).  ...  digit(9).
carry(0).  carry(1).
 And-parallelism:
execute different clause body goals simultaneously.
 Comprises traditional parallelism (parallel loops, divide and
conquer, etc.).
 Concurrent languages are also generally based on and-parallelism.
qsort([X|L],R) :-
    partition(L,X,L1,L2),
    qsort(L2,R2),
    qsort(L1,R1),
    append(R1,[X|R2],R).
 Temptation: make use of all this potential.
 Problem: this can yield a slowdown or even erroneous results.
 Objective:
and/or-parallel execution of (some of the goals in) logic
programs (and full Prolog, CLP, CC, ...),
while:
 obtaining the same solutions as the sequential execution
(i.e., correctness)
 taking a shorter or equal execution time (speedup or, at
least, no-slowdown over state-of-the-art sequential systems)
(i.e., efficiency).
 Above conditions may not always be met:
 Independence: conditions that the run-time behavior
of the goals must satisfy to guarantee correctness and efficiency
(under ideal conditions, i.e., no overhead).
 The presence of overheads complicates things further:
 Granularity Control: techniques for ensuring
efficiency in the presence of overheads.


main :- l, s.

:- parallel l/0.
l :- large_work_a.
l :- large_work_b.

:- parallel s/0.
s :- small_work_a.
s :- small_work_b.


 Speculation (e.g., p in example).
 To guarantee speedup: avoid speculative work
(too strong/difficult?).
 To guarantee no-slowdown:
 Left-biased scheduling.
 Instantaneous killing on cut.
 Granularity: avoid parallelizing work that is too
small.
 Quite successful systems built (ECLiPSe, SICStus/Muse, Aurora,
YapOr, etc.)
 Muse is quite easy to add to an existing Prolog system
(done with Prolog by BIM, also added to SICStus Prolog V3.0)
 Significant speedups w.r.t. state-of-the-art Prolog systems can
be obtained with Aurora and Muse for search-based applications.
             Number of processors
Program     1     2     4     8     10    SICStus 0.6
parse1      1    1.8   2.8   2.93  2.76      1.25
parse5      1    1.97  3.74  6.92  7.72      1.27
db5         1    1.93  3.74  6.92  7.34      1.37
8queens     1    1.99  3.95  7.88  9.6       1.25
tina        1    2.07  4.06  7.81  9.59      1.43
 Much work done on schedulers (left bias, cut, side effects, ...)
 Easy to extend to CLP (e.g., Van Hentenryck,
ECLiPSe system).
 Correctness: ``same'' solutions as sequential
execution.
 Efficiency: execution time no greater than that of the seq. program
(or, at least, no-slowdown).
(We assume
parallel execution has no overhead in this first stage.)
 Running a statement ahead of (without ``seeing'' the effects of) the previous ones:

Imperative        Functional        Constraints
Y := W+2;         (+ W 2)           Y = W+2,
X := Y+Z;         (+ Y Z)           X = Y+Z,

read-write deps   strictness        cost!
For predicates (multiple procedure definitions):

main :-
    p(X),
    q(X),
    write(X).

p(X) :- X=a.
q(X) :- X=b, large computation.
q(X) :- X=a.

Again, a cost issue:
if p affects q (prunes its
choices), then running q ahead of p is speculative.
 Independence:
condition that guarantees correctness and efficiency.
 Informal notion: a computation ``does not affect'' another
(also referred to as ``stability'' in, e.g., EAM/AKL).
 Greatly clarified when put in terms of Search Space
Preservation (SSP): SSP shown to be a sufficient and necessary
condition for efficiency.
 Detection of independence:
 Run-time (a priori conditions).
 Compile-time.
 Mixed: conditional execution graph expressions. (1)
 User control: explicit parallelism (concurrent languages). (2)
 (1)+(2) = &Prolog: view parallelization as a source-to-source
transformation of the original
program into a parallelized (``annotated'') one in a concurrent/parallel language. Allows:
 Automatic parallelization (and understanding the result).
 User parallelization (and the compiler checking it).
 For concreteness, hereafter we use &Prolog (now Ciao) as a
target.
The relevant minimal subset of &Prolog/Ciao:
 Prolog (with if-then-else, etc.).
 Parallel conjunction ``&/2''
(with correct and complete forwards and backwards semantics).
 A number of primitives for runtime testing of instantiation state.
 Ciao is one of the popular Prolog/CLP systems (supports
ISO-Prolog fully).
Many other features: new-generation multi-paradigm
language/prog. env. with:
 Predicates, constraints, functions (including laziness),
higher-order, ...
 Assertion language for expressing rich program
properties (types, shapes, pointer aliasing, non-failure,
determinacy, data sizes, cost, ...).
Static debugging, verification, program certification,
PCC, ...
 Parallel, concurrent, and distributed execution primitives.
 Automatic parallelization.
 Automatic granularity and resource control.
 Approach (goal level). Consider parallelizing p(X,Y) and
q(X,Z):

main :-
    t(X,Y,Z),
    p(X,Y),
    q(X,Z).

We compare the behaviour of q(X,Z) before and after the execution
of p(X,Y) (i.e., with and without p's answer substitution applied).
 A priori independence: when reasoning only about the state
prior to the execution of the goals.
Can be checked at run time before execution of the goals.
 A priori independence in the Herbrand domain:
Strict Independence: goals do
not share variables at run time.
 Example 1: Above, if t(X,Y,Z) :- X=a.
 The ``pointers'' view:
correctness and efficiency (search space
preservation) guaranteed for p & q if
there are no ``pointers'' between p and q.
main :- X=f(K,g(K)), Y=a,
        Z=g(L), W=h(b,L),
        p(X,Y),
        q(Y,Z),
        r(W).

[Figure: sharing (``pointers'') among the goals' arguments]

p and q are strictly independent, but q and r
are not.
 Example 2:
qs([X|L],R) :- part(L,X,L1,L2),
    qs(L2,R2), qs(L1,R1),
    app(R1,[X|R2],R).
Might be annotated in &Prolog (or Ciao) as:
qs([X|L],R) :-
    part(L,X,L1,L2),
    ( indep(L1,L2) -> qs(L2,R2) & qs(L1,R1)
    ; qs(L2,R2) , qs(L1,R1) ),
    app(R1,[X|R2],R).
 Not always possible to determine locally/statically:
main :- t(X,Y), p(X), q(Y).
main :- read([X,Y]), p(X), q(Y).
 Alternatives: runtime independence tests, global analysis, ...
 Can we build a system which obtains speedups w.r.t. a
state-of-the-art sequential LP system using such annotations?
 Can those annotations be generated automatically?
 Issues in direct implementation:
 Scheduling / fast task startup.
 Memory management.
 Use of analysis information to improve indexing.
 Local environment support.
 Recomputation vs. copying.
 Efficient implementation of parallel backtracking
(and opportunities for intelligent backtracking).
 Efficient implementation of ``ask'' (for communication among
threads).
 etc.
 Evolution of the RAP-WAM (the first multi-sequential model?) and
the SICStus WAM.
 Defined as a storage model + an instruction set

PWAM Storage Model: A Stack Set

 Agents separate from stack sets;
dynamic creation/deletion of
stack sets/agents
 Lazy, on-demand scheduling
 Extensions / optimizations:
 DASWAM / DDAS system (dependent and-parallelism)
 &ACE, ACE systems (or-, and-, and
dependent and-parallelism)
[Speedup figures for 12 benchmarks: pann, pfib, pfibgran,
pboyernsi, phanoi, pmatfp, pmatrix, poccur, porsim, pqs,
pqsdlnsi, premdisj]
Sequent Symmetry, hand-parallelized programs.
(Speedup over state-of-the-art sequential systems.)
[Figure: qsort trace, 1 processor (VisAndOr output).]
[Figure: qsort trace, 4 processors (VisAndOr output).]
 Not always possible to determine locally/statically:
main :- t(X,Y), p(X), q(Y).
main :- read([X,Y]), p(X), q(Y).
 Alternatives: run-time independence tests, global analysis, ...
main :- read([X,Y]), ( indep(X,Y) -> p(X) & q(Y)
                     ; p(X) , q(Y) ).
main :- t(X,Y), p(X) & q(Y). %% (After analysis)
 Conditional Dependency Graph
(of some code segment)
:
 Vertices: possible tasks (statements, calls, bindings, etc.).
 Edges: possible dependencies (labels: conditions needed for
independence).
 Local or global analysis used to reduce/remove checks in the edges.
 Annotation process converts graph back to parallel expressions in source.
foo(...) :-
    g(...),
    g(...),
    g(...).

[Figures: dependency graph of the clause; structure of the
parallelizing compiler]
Parallelizing compiler
(now integrated in
CiaoPP
):
 Global Analysis: infers independence information.
 Annotator(s): Prolog to &Prolog parallelization.
 Low-level PWAM compiler: extension of SICStus V0.5.
 Granularity Analysis: determines task size or size
functions.
 Granularity Control:
restricts parallelism based on
task sizes.
 Other modules: side-effect analyzer (sequencing of
side-effects, coded in &Prolog), multiple specializer /
partial evaluator, invariant eliminator, etc.
multiply([],_,[]).
multiply([V0|V0s],V1,[Vr|Vrs]) :-
    vmul(V0,V1,Vr),
    multiply(V0s,V1,Vrs).
vmul([],[],0).
vmul([H1|T1],[H2|T2],Vr) :-
    scalar_mult(H1,H2,H1xH2),
    vmul(T1,T2,T1xT2),
    Vr is H1xH2+T1xT2.
scalar_mult(H1,H2,H1xH2) :- H1xH2 is H1*H2.
Source (Prolog)
multiply([],_,[]).
multiply([V0|V0s],V1,[Vr|Vrs]) :-
    ( ground([V1]), indep([[V0,V0s],[V0,Vrs],[V0s,Vr],[Vr,Vrs]])
    -> vmul(V0,V1,Vr) & multiply(V0s,V1,Vrs)
    ; vmul(V0,V1,Vr), multiply(V0s,V1,Vrs) ).
vmul([],[],0).
vmul([H1|T1],[H2|T2],Vr) :-
    ( indep([[H1,T1],[H1,T2],[T1,H2],[H2,T2]])
    -> scalar_mult(H1,H2,H1xH2) & vmul(T1,T2,T1xT2)
    ; scalar_mult(H1,H2,H1xH2), vmul(T1,T2,T1xT2) ),
    Vr is H1xH2+T1xT2.
scalar_mult(H1,H2,H1xH2) :- H1xH2 is H1*H2.
Parallelized program (&Prolog/Ciao), no global analysis
:- entry multiply(g,g,f).
multiply([],_,[]).
multiply([V0|V0s],V1,[Vr|Vrs]) :-  % [[Vr],[Vr,Vrs],[Vrs]]
    multiply(V0s,V1,Vrs),          % [[Vr]]
    vmul(V0,V1,Vr).                % []
vmul([],[],0).
vmul([H1|T1],[H2|T2],Vr) :-        % [[Vr],[H1xH2],[T1xT2]]
    scalar_mult(H1,H2,H1xH2),      % [[Vr],[T1xT2]]
    vmul(T1,T2,T1xT2),             % [[Vr]]
    Vr is H1xH2+T1xT2.             % []
scalar_mult(H1,H2,H1xH2) :-        % [[H1xH2]]
    H1xH2 is H1*H2.                % []
Sharing information inferred by the analyzer
multiply([],_,[]).
multiply([V0|V0s],V1,[Vr|Vrs]) :-
    ( indep([[Vr,Vrs]]) ->
        multiply(V0s,V1,Vrs) &
        vmul(V0,V1,Vr)
    ;
        multiply(V0s,V1,Vrs),
        vmul(V0,V1,Vr) ).
vmul([],[],0).
vmul([H1|T1],[H2|T2],Vr) :-
    scalar_mult(H1,H2,H1xH2) &
    vmul(T1,T2,T1xT2),
    Vr is H1xH2+T1xT2.
scalar_mult(H1,H2,H1xH2) :- H1xH2 is H1*H2.
...and the parallelized program with this information.
 Allows detecting failure of groundness checks.
 Increases accuracy of sharing information.
 Abstract Domain: abstraction of a substitution with two
components: sharing & freeness.
 The freeness information restricts the possible combinations of
sharing patterns.
 Example abstractions (of the run-time substitution at a call to p/3):

p(X,Y,Z) with X = f(Y), Z = b
p(X,Y,Z) with X = f(A), Y = f(A)
:- entry multiply(g,g,f).
multiply([],_,[]).
multiply([V0|V0s],V1,[Vr|Vrs]) :-  % [[Vr],[Vrs]],[Vr,Vrs]
    multiply(V0s,V1,Vrs),          % [[Vr]],[Vr]
    vmul(V0,V1,Vr).                % [],[]
vmul([],[],0).
vmul([H1|T1],[H2|T2],Vr) :-        % [[Vr],[H1xH2],[T1xT2]],
                                   % [Vr,H1xH2,T1xT2]
    scalar_mult(H1,H2,H1xH2),      % [[Vr],[T1xT2]],[Vr,T1xT2]
    vmul(T1,T2,T1xT2),             % [[Vr]],[Vr]
    Vr is H1xH2+T1xT2.             % [],[]
scalar_mult(H1,H2,H1xH2) :-        % [[H1xH2]],[H1xH2]
    H1xH2 is H1*H2.                % [],[]
Sharing+Freeness information inferred by the analyzer
multiply([],_,[]).
multiply([V0|V0s],V1,[Vr|Vrs]) :-
    multiply(V0s,V1,Vrs) &
    vmul(V0,V1,Vr).
vmul([],[],0).
vmul([H1|T1],[H2|T2],Vr) :-
    scalar_mult(H1,H2,H1xH2) &
    vmul(T1,T2,T1xT2),
    Vr is H1xH2+T1xT2.
scalar_mult(H1,H2,H1xH2) :- H1xH2 is H1*H2.
...and the parallelized program with this information.

Average time in seconds:

Program     Prol.    S       P       SF      P*S     P*SF
aiakl       0.17    0.20    0.43    0.22    0.32    0.37
ann         1.76   19.40    5.54   10.50   16.37   17.68
bid         0.46    0.32    0.27    0.36    0.46    0.56
boyer       1.12    3.56    1.38    4.17    2.91    3.65
browse      0.38    0.13    0.17    0.15    0.21    0.24
deriv       0.21    0.06    0.05    0.07    0.09    0.11
fib         0.03    0.01    0.01    0.02    0.02    0.02
hanoiapp    0.11    0.03    0.03    0.04    0.06    0.07
mmatrix     0.07    0.03    0.03    0.03    0.04    0.05
occur       0.34    0.04    0.03    0.05    0.06    0.07
peephole    1.36    5.45    2.54    3.94    7.00    7.45
qplan       1.68    1.54   11.52    1.84    2.60    3.36
qsortapp    0.08    0.04    0.05    0.05    0.08    0.09
read        1.07    2.09    1.89    2.35    2.99    3.51
serialize   0.20    2.26    0.23    0.62    0.52    0.67
tak         0.04    0.02    0.02    0.02    0.02    0.04
warplan     0.80   15.71    5.02    8.71   15.74   17.68
witt        1.86    1.98   16.24    2.26    2.87    3.42

Prol.  Standard Prolog compiler time
S      (Set) Sharing
P      Pair sharing (Sondergaard)
SF     Sharing + Freeness
X*Y    Combinations
(1-10 processors: actual speedups on Sequent Symmetry; 10+:
projections using the IDRA simulator on execution traces)
[Speedup figures:]
Simple matrix mul. (simulated)
The parallelizer, self-parallelized
 Pure goals: only one thread ``touches'' each shared variable. Example:
main :- t(X,Y), p(X), q(Y).
t(X,Y) :- Y = f(X).
p is independent of t (but p and q are dependent).
 Impure goals: only the rightmost ``touches'' each shared variable. Example:
main :- t(X,Y), p(X), q(Y).
t(X,Y) :- Y = a.    p(X) :- var(X), ..., X=b, ...
 More parallelism.
 But cannot be detected ``a priori'': requires global analysis.
 Very important in programs using ``incomplete structures.''
flatten(Xs,Ys) :- flatten(Xs,Ys,[]).
flatten([], Xs, Xs).
flatten([X|Xs],Ys,Zs) :- flatten(X,Ys,Ys1), flatten(Xs,Ys1,Zs).
flatten(X, [X|Xs], Xs) :- atomic(X), X \== [].
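Under non-strict independence the two recursive calls above can run in parallel even though they share Ys1: the first call only passes Ys1 along as the free tail of the difference list, and only the second call instantiates it. A possible annotation (our sketch, not from the slides; the corresponding run-time checks are omitted):

```prolog
% NSI sketch (ours): the shared tail Ys1 stays free inside the
% first goal, and only the second goal binds it.
flatten([X|Xs],Ys,Zs) :-
    flatten(X,Ys,Ys1) & flatten(Xs,Ys1,Zs).
```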
 Another example:
qsort([],S,S).
qsort([X|Xs],S,S2) :-
    partition(Xs,X,L,R),
    qsort(L,S,[X|S1]),
    qsort(R,S1,S2).
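The same pattern applies here: qsort(L,S,[X|S1]) leaves the shared tail S1 free, and only qsort(R,S1,S2) binds it, so under NSI the two recursive calls could be annotated as parallel (our sketch, not from the slides):

```prolog
% NSI sketch (ours): S1 is only bound by the second recursive call.
qsort([X|Xs],S,S2) :-
    partition(Xs,X,L,R),
    qsort(L,S,[X|S1]) & qsort(R,S1,S2).
```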
 We consider the parallelization of pairs of goals.
 Let the situation be a clause body: ..., p(...), ..., q(...), ...
(p to the left of q).
We define:
 Conditions for non-strict independence of p and q:
 C1: preserves freeness of shared variables.
 C2: preserves independence of shared variables.
 More relaxed conditions are possible if information regarding
partial answers and purity of goals is available.
 Run-time checks can be automatically included to ensure NSI
when the previous conditions do not hold.
 The method uses analysis information.
 Possible checks are:
 ground(X): X is ground.
 allvars(X,L): every free variable in X
is in the list L.
 indep(X,Y): X and Y do not share variables.
 sharedvars(X,Y,L): every free variable shared by X
and Y is in the list L.
 The method generalizes the techniques previously proposed for
detection of SI.
 Even when only SI is present, the tests generated may be better
than the traditional tests.
Speedups of five programs that have NSI but no SI:
 array2list translates an extendible array into a list of
index-element pairs.
 flatten flattens a list of lists of any complexity into a plain
list.
 hanoi_dl solves the towers of Hanoi problem using difference
lists.
 qsort is the sorting algorithm quicksort using difference lists.
 sparse transforms a binary matrix into an optimized notation for
sparse matrices.

             Number of processors
P      1     2     3     4     5     6     7     8     9     10
1    0.78  1.54  2.34  3.09  3.82  4.64  5.41  5.90  6.50  7.22
2    0.54  1.07  1.61  2.07  2.52  3.05  3.62  4.14  4.46  4.83
3    0.56  1.13  1.68  2.25  2.73  3.23  3.70  4.34  4.84  5.25
4    0.91  1.65  2.20  2.53  2.75  2.86  3.00  3.14  3.30  3.33
5    0.99  1.92  2.79  3.68  4.50  5.06  5.78  6.75  8.10  8.26
 Parallel expressions:

Bench.         Total CGEs         Uncond. CGEs
Program       Def  Free  FD      Def  Free  FD
amp             5    -    5        0    -    0
bridge          0    -    0        0    -    0
circuit         3    2    2        0    0    0
dnf            14   14   14       12    0   12
laplace         1    -    1        1    -    1
mining          5    4    4        1    0    2
mmatrix         2    2    2        0    0    0
mg_extend       0    0    0        0    0    0
num            16   16   16        5   10   10
pic             4    3    3        0    0    0
power           5    5    5        1    1    1
runge_kutta     2    1    1        0    0    0
trapezoid       1    1    1        0    0    0
 Conditional checks:

Bench.        Conditions: def/unlinked
Program       Def     Free    FD
amp           1/10     -      1/10
bridge        0/0      -      0/0
circuit       1/5     0/10    0/3
dnf           0/2     0/30    0/2
laplace       0/0      -      0/0
mining        3/5     5/5     2/4
mmatrix       0/2     2/8     0/2
mg_extend     0/0     0/0     0/0
num           0/24    0/20    0/19
pic           2/9     6/8     1/3
power         3/40    3/29    3/29
runge_kutta   5/0     6/0     3/0
trapezoid     0/9     0/9     0/9
[Figure] Speedups for mmatrix
[Figure] Speedups for critical with go2 input
[Figure] Speedups for critical with go3 input
 Tests on LP programs:
 Analysis:
compares well to LP-specific domains, but worse relative precision
(except Def x Free).
 Annotation:
 Efficiency shows the relative precision of the information.
 Effectiveness comparable for Def x Free. Def
and Free alone less precise.
 Tests on CLP programs:
 Analysis:
acceptable, but comparatively more expensive than for LP.
 Annotation:
 Efficiency in the same ratio to analysis as for LP.
 Effectiveness: Def x Free comparably more effective
than Def and Free alone. But still less satisfactory
than for LP.
 Key: none are special-purpose domains.
 Still, useful speedups.
 Generalization for LP/CLP with dynamic
scheduling and CC [García de la Banda, Ph.D.].
 Computations can be speculative (or even non-terminating!):
foo(X) :- X=b, ..., p(X) & q(X), ...
foo(X) :- X=a, ...
p(X) :- ..., X=a, ...
q(X) :- large computation.

but ``no slowdown'' is guaranteed if:
 left-biased scheduling,
 instantaneous killing of siblings (failure propagation).
 Left-biased schedulers, dynamic throttling of speculative
tasks, non-failure, etc.
 Static detection of non-failure:
avoids speculativeness / guarantees theoretical speedup
⇒ importance of non-failure analysis.
 Independence not enough:
overheads (task creation and scheduling, communication, etc.)
 In CLP compounded by the fact that the number and size of tasks is
highly irregular and dependent on runtime parameters.
 Dynamic solutions:
 Minimize task management and data communication overheads
(micro-tasks, shared heaps, compile-time elimination of locks, ...)
 Efficient dynamic task allocation
(e.g., non-centralized task stealing)
 Quite good results for shared-memory multiprocessors
early on (e.g., Sequent Balance, 1986-89).
 Not sufficient for clusters or over a network.
 Replace parallel execution with sequential execution (or
vice versa) based on bounds (or estimations) on task size and
overheads.
 Cannot be done completely at compile time:
cost often depends on the input (hard to approximate at compile time,
even w/abstract interpretation).
main :- read(X), read(Z), inc_all(X,Y) & r(Z,M), ...
inc_all([]) := [].
inc_all([I|Is]) := [ I+1 | ~inc_all(Is) ].
 Our approach:
 Derive at compile time cost functions (to be evaluated
at run time) that efficiently bound task size (lower, upper
bounds).
 Transform programs to carry out runtime granularity control.
[Figure: the granularity-control process]
 For the previous example:
main :- read(X), read(Z), inc_all(X,Y) & r(Z,M), ...
inc_all([]) := [].
inc_all([I|Is]) := [ I+1 | ~inc_all(Is) ].
 Assume X determined to be input, Y output,
cost function inferred 2*length(X)+1, threshold 100 units:
main :- read(X), read(Z),
        ( 2*length(X)+1 > 100 -> inc_all(X,Y) & r(Z,M)
        ; inc_all(X,Y) , r(Z,M) ), ...
 Provably correct techniques (thanks to abstract interpretation):
can ensure speedup if assumptions hold.
 Issues: derivation of data measures, data size functions, task cost
functions, program transformations, optimizations...
 Perform type/mode inference:
:- true pred inc_all(X,Y) : ( list(X,int), var(Y) ) => list(Y,int).
 Infer size measures: list length.
 Use data dependency graphs to determine the relative sizes of
structures that variables point to at different program points ⇒
infer argument size relations:
    Size_Y(0) = 0 (boundary condition from base case),
    Size_Y(n) = 1 + Size_Y(n-1).
Sol = Size_Y(n) = n.
 Use this to set up recurrence equations for
the computational cost of procedures:
    Cost(0) = 1 (boundary condition from base case),
    Cost(n) = 2 + Cost(n-1).
Sol = Cost(n) = 2n + 1.
 We obtain lower/upper bounds on task granularities.
 Non-failure (absence of exceptions) analysis needed for lower
bounds.
 Simplification of cost functions:
..., ( length(X) > 50 -> inc_all(X,Y) & r(Z,M)
     ; inc_all(X,Y) , r(Z,M) ), ...
..., ( length_gt(LX,50) -> inc_all(X,Y) & r(Z,M)
     ; inc_all(X,Y) , r(Z,M) ), ...
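A possible definition of the simplified test (a hypothetical sketch, not from the slides): length_gt(L,N) succeeds iff the list L has more than N elements, inspecting at most N+1 list cells instead of computing the full length:

```prolog
% length_gt(L, N): true iff length(L) > N; stops traversing
% as soon as N+1 elements have been seen.
length_gt([_|_], 0).
length_gt([_|T], N) :-
    N > 0,
    N1 is N - 1,
    length_gt(T, N1).
```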
 Complex thresholds: use also communication cost functions, load,
...
 Example: when the task cost function and the overhead function
are both known and their ratio is independent of the data size:
guaranteed speedup for any data size!
 Sometimes static decisions can be made despite dynamic
sizes and costs (e.g., when ratios are independent of input).
 Static task clustering (loop unrolling / data parallelism):
..., ( has_more_elements_than(X,5) -> inc_all_2(X,Y) & r(X)
     ; inc_all_2(X,Y), r(X) ), ...
inc_all_2([X1,X2,X3,X4,X5|R]) := [X1+1,X2+1,X3+1,X4+1,X5+1 | ~inc_all_2(R)].
inc_all_2([]) := [].
(actually, cases for 4, 3, 2, and 1 elements also have to be
included); this is also useful to achieve fast task startup.
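The elided cases might be written out as follows (a sketch in the same Ciao functional notation); each clause matches a closed list of exactly that length:

```prolog
% Unrolled cases for lists shorter than the unrolling factor.
inc_all_2([X1,X2,X3,X4]) := [X1+1,X2+1,X3+1,X4+1].
inc_all_2([X1,X2,X3])    := [X1+1,X2+1,X3+1].
inc_all_2([X1,X2])       := [X1+1,X2+1].
inc_all_2([X1])          := [X1+1].
```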
 Sometimes static decisions can be made despite dynamic sizes and
costs (e.g., when the ratios are independent of input).
 Data size computations can often be done on-the-fly.
 Static placement.
g_qsort([], []).
g_qsort([First|L1], L2) :-
    partition3o4o(First, L1, Ls, Lg, Size_Ls, Size_Lg),
    ( Size_Ls > 20 -> ( Size_Lg > 20 -> g_qsort(Ls, Ls2) & g_qsort(Lg, Lg2)
                      ; g_qsort(Ls, Ls2), s_qsort(Lg, Lg2) )
    ; ( Size_Lg > 20 -> s_qsort(Ls, Ls2), g_qsort(Lg, Lg2)
      ; s_qsort(Ls, Ls2), s_qsort(Lg, Lg2) ) ),
    append(Ls2, [First|Lg2], L2).
partition3o4o(_F, [], [], [], 0, 0).
partition3o4o(F, [X|Y], [X|Y1], Y2, SL, SG) :-
    X =< F, partition3o4o(F, Y, Y1, Y2, SL1, SG), SL is SL1 + 1.
partition3o4o(F, [X|Y], Y1, [X|Y2], SL, SG) :-
    X > F, partition3o4o(F, Y, Y1, Y2, SL, SG1), SG is SG1 + 1.
 Shared memory:

programs      seq. prog.  no gran.ctl  gran.ctl  gc.stopping  gc.argsize
fib(19)         1.839        0.729       1.169      0.819        0.549
  (relative)                   1          -60%       -12%         +24%
hanoi(13)       6.309        2.509       2.829      2.399        2.399
  (relative)                   1          -12.8%     +4.4%        +4.4%
unbmatrix       2.099        1.009       1.339      0.870        0.870
  (relative)                   1          -32.71%    +13.78%      +13.78%
qsort(1000)     3.670        1.399       1.790      1.659        1.409
  (relative)                   1          -28%       -19%         0.0%
 Cluster:

programs      seq. prog.  no gran.ctl  gran.ctl  gc.stopping  gc.argsize
fib(19)         1.839        0.970       1.389      1.009        0.639
  (relative)                   1          -43%       -4.0%        +34%
hanoi(13)       6.309        2.690       2.839      2.419        2.419
  (relative)                   1          -5.5%      +10.1%       +10.1%
unbmatrix       2.099        1.039       1.349      0.870        0.870
  (relative)                   1          -29.84%    +16.27%      +16.27%
qsort(1000)     3.670        1.819       2.009      1.649        1.429
  (relative)                   1          -11%       +9.3%        +21%
 With classic annotators (MEL, UDG, CDG, ...) we applied
granularity control after parallelization:
 Developed new annotation algorithm that takes task granularity
into account:
 Annotation is a heuristic process (several alternatives possible).
 Taking task granularity into account during annotation can
help make better choices and speed up the annotation process.
 Tasks with larger cost bounds given priority, small ones not
parallelized.
 Use estimations/bounds on execution time for
controlling granularity (instead of steps/reductions).
 Execution time generally depends on platform characteristics
(⇒ constants) and input data sizes (⇒ unknowns).
 Platform-dependent, one-time calibration using a fixed set of programs:
 Obtains the values of the platform-dependent constants (costs of
basic operations).
 Platform-independent, compile-time analysis:
 Infers cost functions (using a modification of the previous method),
which return counts of basic operations given input data sizes.
 Incorporating the constants from the calibration,
we obtain functions yielding execution times
depending on the size of the input.
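Schematically, the resulting execution-time function combines the calibrated constants with the analysis-inferred operation counts (notation ours, not from the slides; K_i = calibrated cost of basic operation i, C_i(n) = inferred number of such operations for input size n):

```latex
T_{\mathit{pred}}(n) \;=\; \sum_{i} K_i \, C_i(n)
```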
 Predicts execution times with reasonable accuracy (challenging!).
 Improving by
taking into account lower-level factors (current work).
 Consider nrev with mode:
:- pred nrev/2 : list(int) * var.
 Estimation of execution time for a concrete input; consider:
A = [1,2,3,4,5], input size = length(A) = 5
               Once            Static analysis    Application
component      (constant)      (count, n=5)       (contribution)
step             21.27             21                446.7
nargs             9.96             57                567.7
giunif           10.30             31                319.3
gounif            8.23             16                131.7
viunif            6.46             45                290.7
vounif            5.69             30                170.7

Execution time: 1926.8
[Figure: fib(15), 1 processor (VisAndOr output).]
[Figure: fib(15), 8 processors (VisAndOr output).]
[Figure: fib(15), 8 processors, full scale (VisAndOr output).]
[Figure: fib(15), 8 processors, with granularity control (VisAndOr output).]
 Performance:
 IAP speedups + new dependent-and speedups.
 IAP programs with one agent run at about 50% speed
w.r.t. sequential execution (due to locking and other overheads).
 DAP programs run at 30%-40% lower speed.
 Basic Andorra model [D.H.D. Warren]: goals for which at most one
clause matches should be executed first (inspired by Naish's
NU-Prolog).
 If a solution exists, computation rule is complete and correct
for pure programs (switching lemma). (But otherwise finite failures
can become infinite failures.)
 Determinate reductions can proceed in parallel without the need
for choice points ⇒ no dependent backtracking needed.
 An implementation: Andorra-I [D.H.D. Warren, V.S. Costa, R.
Yang, I. Dutra, ...]
 Prolog support: preprocessor + engine (interpreter).
 Exploits both and- and or-parallelism. (Good speedups in practice.)
 Problem: non-deterministic steps cannot proceed in parallel.
 ``Extended'' Andorra Model [Warren]: add independent and-parallelism.
 With implicit control (unspecified) [Warren, Gupta].
 With explicit/implicit control: AKL [Janson, Haridi ILPS'91]
(implicit rule, ``stability'': non-deterministic steps can proceed
if ``they cannot be affected'' by other steps).
 More parallelism can be exploited with these primitives.
 Take the sequential code below (dep. graph to the right) and
three possible parallelizations:

Sequential:
p(X,Y,Z) :-
    a(X,Z),
    b(X),
    c(Y),
    d(Y,Z).

Restricted IAP:
p(X,Y,Z) :-
    a(X,Z) & c(Y),
    b(X) & d(Y,Z).

p(X,Y,Z) :-
    c(Y) & (a(X,Z), b(X)),
    d(Y,Z).

Unrestricted IAP:
p(X,Y,Z) :-
    c(Y) &> Hc,
    a(X,Z),
    b(X) &> Hb,
    Hc <&,
    d(Y,Z),
    Hb <&.

 In this case: the unrestricted parallelization is at least as good
(time-wise) as any restricted one, assuming no overhead.
 Main idea:
 Publish goals (e.g., G &> H) as soon as
possible.
 Wait for results (e.g., H <&) as late as
possible.
 One clause at a time.
 Limits to how soon a goal is published and how late results
are gathered are given by the dependencies with the rest of the
goals in the clause.
 As with &/2, the annotation may or may not respect the relative
order of goals in the clause body.
 Order determined by &>/2.
 Order not respected ⇒ more flexibility in annotation.
Speedups of the UMEL, UOUDG, UDG, and UUDG annotators on four
benchmarks (columns: increasing number of processors):

UMEL    0.97  0.97  0.98  0.98  0.98  0.98  0.98  0.98
UOUDG   0.97  1.55  1.48  1.49  1.49  1.49  1.49
UDG     0.97  1.77  1.66  1.67  1.67  1.67  1.67
UUDG    0.97  1.77  1.66  1.67  1.67  1.67  1.67

UMEL    0.89  0.98  0.98  0.97  0.97  0.98  0.98  0.99
UOUDG   0.89  1.70  2.39  2.81  3.20  3.69  4.00
UDG     0.89  1.72  2.43  3.32  3.77  4.17  4.41
UUDG    0.89  1.72  2.43  3.32  3.77  4.17  4.41

UMEL    1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
UOUDG   0.99  1.95  2.89  3.84  4.78  5.71  6.63
UDG     1.00  1.00  1.00  1.00  1.00  1.00  1.00
UUDG    0.99  1.95  2.89  3.84  4.78  5.71  6.63

UMEL    0.88  1.61  2.16  2.62  2.63  2.63  2.63  2.63
UOUDG   0.88  1.62  2.17  2.64  2.67  2.67  2.67
UDG     0.88  1.61  2.16  2.62  2.63  2.63  2.63
UUDG    0.88  1.62  2.39  3.33  4.04  4.47  5.19
[Speedup figures:]
AIAKL    Hanoi
FibFun   Takeuchi
(Sun Fire T2000, 8 cores)
 Versions of and-parallelism previously implemented
(&Prolog, &ACE, AKL, Andorra-I, ...)
rely on complex low-level machinery in each agent.
 Our objective: an alternative, easier-to-maintain implementation
approach.
 Fundamental idea: raise non-critical components to the
source-language level:
 Prolog level: goal publishing, goal searching, goal
scheduling, ``marker'' creation (through choice points), ...
 C level: low-level threading, locking, untrailing, ...
⇒ Simpler machinery and more flexibility.
Easily exploits unrestricted IAP.
 Current implementation (for shared-memory multiprocessors):
 Each agent: sequential Prolog machine + goal list +
(mostly) Prolog code.
 Recently added full parallel backtracking!
[Speedup figures:]
Boyer-Moore   Fibonacci
Quicksort     Takeuchi
 Different types of parallelism, with different costs associated:
 Complexity considerations (search space, speculation).
 Coordination cost for agreeing on unifiable bindings.
 Overheads / granularity control.
 Approaches:
 IAP: goals do not restrict each other's search space.
 Ensures no slowdown w.r.t. sequential execution.
 Retains as many WAM optimizations as possible.
 Some parallelism lost.
 NSIAP: IAP + ...
 At most one goal can bind a shared variable to a
non-variable (or they make compatible bindings), and no
goal aliases shared variables.
 Generalization: search space preservation.
 Reduced to IAP via program analysis and transformation.
 DDAS: goals communicate bindings.
 Incorporate a suspension mechanism to ensure no more
work than in a sequential system  ``fine grained independence''.
 Handle dependent backtracking.
 Some locking and variablemanagement overhead.
 Andorra I: determinate depend. and + orparallelism
 Dependent determinate goals run in parallel.
 Allows incorporating also orparallelism easily.
 Some locking and goalmanagement overhead.
 Extended Andorra Model  adding independent and
parallelism to AndorraI.
 With implicit control.
 With explicit control: AKL.
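As a concrete illustration of IAP, here is a sketch in &-Prolog/Ciao-style notation of the difference between unconditional and conditional parallelization; indep/2 stands for the usual run-time independence test from the IAP literature.

```prolog
% Strict IAP: X and Y share no bindings, so p/1 and q/1 can
% run in parallel with no risk of slowdown (&/2 is the
% parallel conjunction operator).
main(X, Y) :-
    p(X) & q(Y).

% When independence cannot be proved at compile time, the
% parallelizer emits a conditional graph expression (CGE):
% run in parallel only if X and Y are independent at run time.
main_cge(X, Y) :-
    ( indep(X, Y) ->
        p(X) & q(Y)
    ;   p(X), q(Y)       % otherwise fall back to sequential
    ).
```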
 Much progress (e.g., in FORTRAN) for regular computations. But
comparatively less on:
 parallelization across procedure calls,
 irregular computations,
 complex data structures / pointers,
 speculation, etc.
 Several generations of parallelizing compilers for LP
and CLP [85...]:
 Good compilation speed, proved correct and efficient.
 Speedups over state-of-the-art sequential systems on applications.
 Good demonstrators of abstract interpretation as a data-flow
analysis technique.
 Now including granularity control: improved on hand
parallelizations of several large applications.
 Areas of particularly good progress:
 Concepts of independence (pointers, search/speculation, constraints...).
 Interprocedural analysis (dynamic data, recursion,
pointers/aliasing, etc.).
 Parallelization algorithms for conditional dependency graphs.
 Dealing with irregularity:
 efficient task representation and fast dynamic scheduling,
 static inference of task cost functions: granularity control.
 Mixed static/dynamic parallelization techniques.
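Granularity control as mentioned above can be sketched as follows; the threshold value and the use of length/2 as a cheap run-time cost test are assumptions for illustration (partition/4 and append/3 are the usual quicksort auxiliaries).

```prolog
% Run subtasks in parallel only when the estimated cost is large
% enough to pay for the scheduling overhead.
qsort([], []).
qsort([X|L], Sorted) :-
    partition(L, X, Small, Large),
    length(Small, N),                        % cheap run-time size test
    ( N > 64 ->                              % assumed threshold
        qsort(Small, S1) & qsort(Large, S2)  % parallel conjunction
    ;   qsort(Small, S1), qsort(Large, S2)   % too small: run sequentially
    ),
    append(S1, [X|S2], Sorted).
```

Static cost analysis supplies the size/cost functions; the compiler then inserts such tests only where they can pay off.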
 Weaker areas / shortcomings:
 In general, weak at detecting independence in structure traversals
based on integer arithmetic (which must be modeled as recursions over
recursive data structures to fit the parallelizer).
 Weaker partitioning / placement for regular computations and
static data structures.
 Little work on mutating data structures (e.g., single-assignment
transformations).
 The objective is to perform all these tasks well also!
 Opportunities for synergy.
 A final plug for constraint programming:
 Merges elegantly the symbolic and the numerical worlds.
 We believe many of the features of CLP will slowly make their way into
mainstream languages (e.g., ILOG, ALMA, and other recent proposals).
 Some examples so far:
 Stealing-based scheduling strategies and micro-threading.
 Cactus-like stack memory management techniques.
 Abstract-interpretation-based static dependency analysis.
 Sharing (aliasing) analyses, shape analyses, ...
 Parallelization (``annotation'') algorithms.
 Cost-analysis-based granularity control.
 Logic-variable-based synchronization.
 Determinacy-based parallelization.
 ...
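For instance, logic-variable-based synchronization gives dataflow-style producer/consumer coordination essentially for free. A minimal sketch, assuming a suspension primitive wait/1 such as the one in Ciao's concurrency support (name assumed):

```prolog
% The consumer suspends on the unbound shared variable X until
% the producer binds it; the binding itself is the synchronization.
pipeline(Result) :-
    producer(X) & consumer(X, Result).

producer(X) :-
    compute(V),
    X = V.                 % binding X wakes the consumer

consumer(X, R) :-
    wait(X),               % suspend until X is bound
    use(X, R).
```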
 Parallelism is not yet exploited on an everyday basis (real systems,
real applications).
 Some challenges:
 Scalability of techniques (from analysis to scheduling).
 Maintainability of the systems: simplification?
 Move as much as possible to the source level?
(And explore this same route for many other things, e.g., tabling.)
 Better automatic parallelization:
 Better granularity control (e.g., time-based).
 Better granularity-aware annotators.
 Full scalability of analysis (modular analysis, etc.).
 Automated program transformations (e.g., loop unrolling).
 Supporting multiple types of parallelism easily is still a challenge.
 A really elegant (and implementable) concurrent language which
includes non-determinism.
 Combination with low-level optimization and other features
(e.g., or-parallelism, as in YapTab).