Insight problem solving - symmetry
Consider the following problem: I have two quart
beakers, beaker A filled with a pint of coffee and B
with a pint of milk. I take a cup, say a measuring
glass, and (i) fill it with coffee from A, pour it into
B, and mix thoroughly; then (ii) return a cupful of the
mixture to A and mix again. Now both beakers again hold
a pint of liquid. Is the
concentration of milk in A the same as the concentration
of coffee in B? See my book "Problem Solving" for
explanations.
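The bookkeeping in this problem can be checked with exact fractions. The sketch below is only an illustration under stated assumptions: volume is measured in cups, a pint is taken to be 2 cups, the measuring glass holds 1 cup, and mixing is perfectly uniform. The function names `transfer` and `concentration` are hypothetical. It does not replace the symmetry argument given in the book.

```python
from fractions import Fraction

def transfer(src, dst, volume):
    """Move `volume` of perfectly mixed liquid from src to dst.

    Each beaker is a dict mapping ingredient -> volume (in cups).
    Returns new (src, dst) dicts; the inputs are not modified.
    """
    total = sum(src.values())
    moved = {k: v * volume / total for k, v in src.items()}
    new_src = {k: src[k] - moved[k] for k in src}
    new_dst = {k: dst[k] + moved[k] for k in dst}
    return new_src, new_dst

def concentration(beaker, ingredient):
    """Fraction of the beaker's total volume made up of `ingredient`."""
    return beaker[ingredient] / sum(beaker.values())

# Assumption: 1 pint = 2 cups, and the measuring glass holds 1 cup.
cup = Fraction(1)
A = {"coffee": Fraction(2), "milk": Fraction(0)}
B = {"coffee": Fraction(0), "milk": Fraction(2)}

A, B = transfer(A, B, cup)   # step (i): coffee from A into B
B, A = transfer(B, A, cup)   # step (ii): mixture from B back into A

milk_in_A = concentration(A, "milk")
coffee_in_B = concentration(B, "coffee")
print(milk_in_A, coffee_in_B)
```

Changing `cup` to any other amount that fits in the beakers lets you see whether the comparison depends on how much liquid is moved.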
Integration of closed contours using log-polar
representation
When the visual system solves problems in the early areas
of the visual cortex, it does so using a log-polar
representation of the retina. This representation
simplifies the problem of integrating closed contours.
When a closed contour surrounds the center of the
retina, its representation in V1 is a path. It follows
that finding the shortest path in V1 is equivalent to
integrating a closed contour on the retina. The main
stages of our model are shown here. See Hii and Pizlo
(2023) for details.
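To see why a closed contour around the center becomes a path, consider a minimal sketch of the log-polar mapping. It assumes the idealized mapping (x, y) -> (log r, theta) with the center of the retina at the origin. The function name `to_log_polar` and the sampled ellipse are illustrative choices, not part of the model in Hii and Pizlo (2023).

```python
import math

def to_log_polar(x, y):
    """Map a retinal point (center of the retina at the origin) to (log r, theta)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)  # angle in [0, 2*pi)
    return math.log(r), theta

# Sample a closed contour around the origin: an ellipse with semi-axes 3 and 1.
n = 360
contour = [(3 * math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
           for k in range(n)]
mapped = [to_log_polar(x, y) for x, y in contour]

log_rs = [u for u, _ in mapped]
thetas = [t for _, t in mapped]

# In the log-polar map the contour is an open path: theta increases
# monotonically from 0 toward 2*pi, while log r stays bounded.
is_path = all(t1 < t2 for t1, t2 in zip(thetas, thetas[1:]))
print(is_path, min(thetas), max(thetas), min(log_rs), max(log_rs))
```

In this idealized map, the closed curve is cut open at theta = 0 and runs from one edge of the map to the other, so a search for a closed contour becomes a search for a path between two edges.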
Reconstruction of 3D shapes of natural objects
- psychophysics and model
Subjects reconstructed 3D shapes of three types of
stimuli by adjusting an aspect ratio of the 3D shape.
Natural shapes, random symmetrical polyhedra and
polyhedra composed of rectangular boxes were used. See here
for several examples. Subjects' reconstruction of
natural shapes and of polyhedra composed of rectangular
boxes was veridical. See Beers and Pizlo (2024) for
details. The computational model performs the same 3D
reconstruction with both orthographic and perspective
images.
Figure-ground organization (FGO)
3D objects are symmetrical or nearly symmetrical, but a
configuration of unrelated objects is not likely to be
symmetrical. It follows that establishing the smallest
number of 3D symmetry planes that can account for a 3D
scene solves the problem of identifying objects in the
scene. Specifically, each symmetry plane corresponds to
an object. See Michaux et al. (2016) for details, and go
here to see examples.
Recovering 3D shapes of real objects from real
images
A priori constraints are as important as the sensory
data in 3D vision. Michaux et al. (2017) combined the 3D
symmetry of an object with a pair of real camera images.
The 3D recovery of shapes, sizes and positions was
absolutely perfect. When the object has two planes of
mirror symmetry, both the front and the back of the
object are recovered. Go here to see a few examples.
Shape Perception
Jayadevan et al. (2018) tested human shape perception
with symmetrical and nearly symmetrical shapes. Viewing
was binocular or monocular. Our computational model
recovered these 3D shapes as well as the subjects did.
The model and the subjects adjusted three parameters
from the family of 3D affine transformations. The
recovery in the model corresponded to the minimum of a
cost function. The model used the same cost function for
monocular and binocular viewing. This is
the first and the only computational model that explains
both viewing conditions. Monocular performance of our
subjects and of our model shows that any theory of 3D
shape perception that requires multiple views is
inadequate.
See the animation illustrating the three-parameter
family. Here is how the psychophysical experiment looked.
And on this site you can see 3D shapes reconstructed by
our subjects and our model.
Contribution of stereoacuity to 3D shape
recovery
Stereoacuity refers to the binocular ability to judge
the depth order of features. Stereoacuity is a
hyperacuity which, in technical jargon, means subpixel
resolution. Stereoacuity has never been used in theories
of shape perception because it does not allow
reconstructing depth intervals. So, all previous
theories of binocular shape perception used ordinary
binocular disparity. It turns out that when stereoacuity
is combined with an a priori symmetry constraint, the
recovery is absolutely perfect. How this works is
described by Li et al. (2011) and illustrated in this
animation.
Symmetry and skewed symmetry
Most natural objects are symmetrical: animals are
symmetrical because of the way they move, plants are
symmetrical because of the way they grow, and man-made
objects are symmetrical because of the functions they
serve. Once the utility and omnipresence of symmetry are
appreciated, one should expect symmetry to be used by
visual systems (both human and computer) as an important
a priori constraint (an assumption) designed to allow
them to produce accurate perceptual interpretations of
the 3D shapes of objects in their natural environment.
Using symmetry effectively for this purpose is
complicated by the fact that the 2D retinal image of a
symmetrical 3D object is always asymmetrical, but note
that the symmetry of the object is only distorted in its
2D image. It is not destroyed. We have been able to show
that the human visual system is able to detect the
distorted (skewed) symmetry inherent in a 2D retinal
image and then use this information to recover the shape
of the symmetrical 3D object. Several examples of 3D
recovery can be seen here.
Details are described in our 2014 book "Making a machine
that sees like us."
Note, however, that 3D symmetry is not sufficient for
reliable recovery: it turns out that any 2D retinal image
has 3D symmetrical interpretations (Sawada et al., 2011).
Here are example 1 and example 2. For 3D symmetry to be
fully effective, additional constraints, such as planarity,
must be used as well - see example 3.
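The statement that symmetry is distorted in the image but not destroyed can be illustrated numerically. The sketch below is an illustration only. It assumes an orthographic projection, a small hand-picked mirror-symmetric point set, and an arbitrary viewing direction; the helper names `rotate` and `project` are hypothetical. It checks one regularity that survives projection: the segments joining mirror-symmetric pairs of points stay parallel in the 2D image, even though the image itself is no longer mirror-symmetric.

```python
import math

def rotate(p, yaw, pitch):
    """Rotate a 3D point about the y-axis (yaw), then about the x-axis (pitch)."""
    x, y, z = p
    x, z = (x * math.cos(yaw) + z * math.sin(yaw),
            -x * math.sin(yaw) + z * math.cos(yaw))
    y, z = (y * math.cos(pitch) - z * math.sin(pitch),
            y * math.sin(pitch) + z * math.cos(pitch))
    return (x, y, z)

def project(p):
    """Orthographic projection onto the image plane: drop the depth (z)."""
    return (p[0], p[1])

# Mirror-symmetric 3D object: each point (x, y, z) has a twin (-x, y, z),
# so the symmetry plane is x = 0. The coordinates are arbitrary.
half = [(1.0, 0.0, 0.0), (0.5, 1.0, 0.3), (0.8, -0.6, 1.0), (0.3, 0.4, -0.9)]
pairs = [(p, (-p[0], p[1], p[2])) for p in half]

yaw, pitch = math.radians(35), math.radians(20)  # an arbitrary general view
image_pairs = [(project(rotate(a, yaw, pitch)), project(rotate(b, yaw, pitch)))
               for a, b in pairs]

# Direction of each segment joining a pair of symmetric image points.
dirs = [(b[0] - a[0], b[1] - a[1]) for a, b in image_pairs]
# 2D cross products against the first direction: zero means parallel.
crosses = [dirs[0][0] * d[1] - dirs[0][1] * d[0] for d in dirs]
parallel = all(abs(c) < 1e-9 for c in crosses)

# In a mirror-symmetric 2D image, the midpoints of all such segments would
# lie on one axis perpendicular to the segments, so their coordinates along
# the segment direction would all be equal.
ux, uy = dirs[0]
norm = math.hypot(ux, uy)
offsets = [((a[0] + b[0]) / 2 * ux + (a[1] + b[1]) / 2 * uy) / norm
           for a, b in image_pairs]
mirror_symmetric = max(offsets) - min(offsets) < 1e-9

print(parallel, mirror_symmetric)
```

With a perspective projection the segments would converge toward a vanishing point instead of staying parallel, so the parallelism check applies only under the orthographic assumption made here.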
Problem Solving - Traveling Salesman
Problem solving is one of human beings' fundamental
cognitive abilities. It is at least as important as the
other, more commonly studied mental activities, namely,
perception, memory, decision making and learning. We
approach problem solving by adopting an
information-processing methodology and use it to study
computationally difficult (intractable) problems that
can be presented to the subject visually, for example,
the Traveling Salesman Problem. Human subjects produce
near-optimal solutions to such combinatorial
optimization problems in linear time. A hierarchical
(pyramid) algorithm is the only model that can emulate
human performance. It performs fine-to-coarse or
coarse-to-fine hierarchical clustering of states
(cities) and then produces a solution tour by using a
sequence of successive approximations in a
coarse-to-fine direction. The model emulates
non-uniform distribution of receptors in the human
retina, as well as eye-movements that move the model's
attention. See a demo that shows how the model solves
a 50-city TSP.
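The coarse-to-fine idea can be sketched in a few lines. The code below is not our pyramid model: it is a much simplified two-level illustration that assumes cities in the unit square, uses a fixed grid for the coarse clustering, and substitutes a greedy nearest-neighbor rule at both levels. It does not emulate the retina or eye movements, and it does not run in linear time. All function names and the `grid` parameter are hypothetical.

```python
import math
import random

def nearest_neighbor_order(points, start=0):
    """Greedy ordering: repeatedly visit the closest unvisited point."""
    order, left = [start], set(range(len(points))) - {start}
    while left:
        last = points[order[-1]]
        nxt = min(left, key=lambda i: math.dist(last, points[i]))
        order.append(nxt)
        left.remove(nxt)
    return order

def coarse_to_fine_tour(cities, grid=3):
    """Two-level sketch: order grid cells first, then cities inside each cell."""
    # Coarse level: assign each city (coordinates in [0, 1]) to a grid cell.
    cells = {}
    for idx, (x, y) in enumerate(cities):
        key = (min(int(x * grid), grid - 1), min(int(y * grid), grid - 1))
        cells.setdefault(key, []).append(idx)
    keys = list(cells)
    centroids = [(sum(cities[i][0] for i in cells[k]) / len(cells[k]),
                  sum(cities[i][1] for i in cells[k]) / len(cells[k]))
                 for k in keys]
    cell_order = nearest_neighbor_order(centroids)
    # Fine level: inside each cell, continue greedily from the tour's last city.
    tour = []
    for c in cell_order:
        members = list(cells[keys[c]])
        while members:
            ref = cities[tour[-1]] if tour else centroids[c]
            nxt = min(members, key=lambda i: math.dist(ref, cities[i]))
            tour.append(nxt)
            members.remove(nxt)
    return tour

def tour_length(cities, tour):
    """Length of the closed tour, including the return to the first city."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

random.seed(1)
cities = [(random.random(), random.random()) for _ in range(50)]
tour = coarse_to_fine_tour(cities)
print(len(tour), round(tour_length(cities, tour), 3))
```

The coarse pass fixes the global shape of the tour before any individual city is considered, which is the aspect of the hierarchical approach this sketch is meant to show.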
In 2013 (Pizlo and Stefanov), we modified the model so
that its working memory can store only a few pieces of
information at a time. This modification did not reduce
the quality or the speed of the solution. Four demos
illustrate how the model's visual representation
zooms out and zooms in during the process of analyzing
spatially global and spatially local features of the
problem.
demo 2
demo 3
demo 4
demo 5
Phi Phenomenon
In 1912, Max Wertheimer (1880-1943), the founder of the
Gestalt School of Psychology, published a monograph on
the perception of apparent motion that profoundly
influenced subsequent perceptual research and theory.
Wertheimer's contribution was inspired by his
serendipitous observation of what he called a "pure"
apparent movement. It was pure in the sense that the
motion was not associated with perceiving any object
changing its location in space. He called this pure
motion the "phi-phenomenon" to distinguish it from
"optimal" apparent movement (called "beta"). In the demo
you can see beta and "magniphi", which is our vivid
version of Wertheimer's phi. Our description of this phenomenon,
including its history, is in Steinman et al. (2000).