The target image is not just a 2^n x 2^n grid; it's also a collection of 2^0 x 2^0, ..., 2^(n-1) x 2^(n-1) grids.
A wavelet transform of the target image gives you, for free (painful and brainful, but cheap, except for storage), the transforms of its subsquares, which are calculated in the intermediate steps: its quarters, sixteenths, sixty-fourths, etc. After e.g. four levels of transform and split, the wavelet coefficients for the top left 32x32 subsquare can be dug out of the matrix like this:
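For Haar wavelets, at least, the digging can be sketched as follows. This is a minimal Python sketch, not the project's code: it assumes an averaging Haar normalization, a square power-of-two image stored as a list of lists, and subsquares aligned to their own width. The names haar_step, pyramid, crop and subsquare_pyramid are made up for illustration.

```python
def haar_step(a):
    """One 2D Haar level on a square list-of-lists; returns (LL, HL, LH, HH)."""
    h = len(a) // 2
    ll = [[0.0] * h for _ in range(h)]
    hl = [[0.0] * h for _ in range(h)]
    lh = [[0.0] * h for _ in range(h)]
    hh = [[0.0] * h for _ in range(h)]
    for i in range(h):
        for j in range(h):
            p, q = a[2 * i][2 * j], a[2 * i][2 * j + 1]
            r, s = a[2 * i + 1][2 * j], a[2 * i + 1][2 * j + 1]
            ll[i][j] = (p + q + r + s) / 4.0   # smooth part (2x2 average)
            hl[i][j] = (p - q + r - s) / 4.0   # left-minus-right detail
            lh[i][j] = (p + q - r - s) / 4.0   # top-minus-bottom detail
            hh[i][j] = (p - q - r + s) / 4.0   # diagonal detail
    return ll, hl, lh, hh


def pyramid(img, levels):
    """Transform `levels` times, keeping every intermediate level (finest first)."""
    out, a = [], img
    for _ in range(levels):
        bands = haar_step(a)
        out.append(bands)
        a = bands[0]  # the next level works on the smooth part only
    return out


def crop(m, row, col, n):
    """The n x n block of matrix m whose top-left entry is (row, col)."""
    return [r[col:col + n] for r in m[row:row + n]]


def subsquare_pyramid(full, row, col, size):
    """Gather the pyramid of the size x size subsquare at pixel (row, col)
    from the full-image pyramid. Assumes row and col are multiples of size."""
    out = []
    for level, bands in enumerate(full, start=1):
        f = 2 ** level
        if f > size:
            break  # coarser levels mix in pixels from outside the subsquare
        out.append(tuple(crop(b, row // f, col // f, size // f) for b in bands))
    return out
```

Because each Haar coefficient depends only on its own block of pixels, the gathered pyramid is identical to the one you would get by transforming the subsquare on its own. For longer filters such as Daub-4 the supports overlap, so the same gathering would only be approximate.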
There's a bunch of these, since
For the tiles it's both more free and less: we have their wavelet
transforms at all of their available resolutions. For Haar wavelets,
the coefficients of the width/2 x height/2 lower-res version of the
image are simply the smoother fourth, the upper left quadrant. (For
Daub-4's, they may be close enough for government work, that is, for
matching purposes, if not for actually reproducing the smaller image
(funny scaling); they're so expressive that the current run uses them.)
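The Haar claim is easy to check numerically. The sketch below is an illustration under assumptions, not the program's own transform: it uses an averaging normalization and the usual layout with the smooth part in the upper left, and the names haar2d and half_res are hypothetical.

```python
def haar2d(img):
    """Full 2D Haar transform of a square power-of-two image (list of lists),
    averaging normalization, smooth part kept in the upper left quadrant."""
    out = [[float(v) for v in row] for row in img]
    size = len(img)
    while size > 1:
        h = size // 2
        tmp = [row[:size] for row in out[:size]]  # copy of the current smooth part
        for i in range(h):
            for j in range(h):
                p, q = tmp[2 * i][2 * j], tmp[2 * i][2 * j + 1]
                r, s = tmp[2 * i + 1][2 * j], tmp[2 * i + 1][2 * j + 1]
                out[i][j] = (p + q + r + s) / 4.0          # smooth
                out[i][j + h] = (p - q + r - s) / 4.0      # left-minus-right detail
                out[i + h][j] = (p + q - r - s) / 4.0      # top-minus-bottom detail
                out[i + h][j + h] = (p - q - r + s) / 4.0  # diagonal detail
        size = h
    return out


def half_res(img):
    """The width/2 x height/2 version of img, made by averaging 2x2 blocks."""
    n = len(img) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(n)] for i in range(n)]


def upper_left_quadrant(m):
    """The 'smoother fourth' of a transform matrix."""
    n = len(m) // 2
    return [row[:n] for row in m[:n]]
```

With this normalization, upper_left_quadrant(haar2d(img)) equals haar2d(half_res(img)) exactly. With an orthonormal Haar normalization the quadrant comes out as the same coefficients multiplied by 2, which does not change which coefficients are largest.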
So for each tile there is a database entry with its filename, original size, and largest 40 wavelet coefficients, each stored with its index in the form (x,y,color), where color ranges through Y,I,Q, for easy translation to different tile sizes. For the target, there's a much larger file with the top 40 wavelet transform coefficients of the entire image, of each of its quadrants, sixteenths, sixty-fourths, etc., down to some specified grid-width minimum like 16x16 (for a 512x512 image, that's 1 + 4 + 16 + 64 + 256 + 1024 = 1365 records).
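A record like that might be built as in the sketch below. The dictionary layout and the names top_coefficients, tile_record and subsquare_count are assumptions for illustration; the actual file format is not shown here. The last function just counts the target subsquares, one record each.

```python
import heapq


def top_coefficients(planes, k=40):
    """planes maps a color name ('Y', 'I', 'Q') to its 2D transform matrix.
    Returns the k largest-magnitude coefficients as (x, y, color, value)."""
    entries = (
        (x, y, color, value)
        for color, matrix in planes.items()
        for y, row in enumerate(matrix)
        for x, value in enumerate(row)
    )
    return heapq.nlargest(k, entries, key=lambda e: abs(e[3]))


def tile_record(filename, width, height, planes, k=40):
    """One hypothetical database entry for a tile."""
    return {
        "filename": filename,
        "size": (width, height),
        "coefficients": top_coefficients(planes, k),
    }


def subsquare_count(width, min_width):
    """Number of subsquares of a width x width target, from the whole image
    down to squares min_width wide (each level has 4x as many as the last)."""
    count, per_side = 0, 1
    while width // per_side >= min_width:
        count += per_side * per_side
        per_side *= 2
    return count
```

For a 512x512 target with a 16x16 minimum, subsquare_count(512, 16) gives 1365.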
Given this, any potential tile image can be matched against any subsquare of the target image by the same operation. It is never necessary to say "match only on the tile coefficients that survive at this scale" because the inappropriate ones will simply never match (the target subsquare's wavelet transformation was done at the correct resolution, so it has no coefficient to match the high-res tile coefficient). The magnification the final placement will require is simply recorded along with the score of the match.
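To make that concrete, here is a sketch of one possible matching operation. The scoring rule (count the coefficients that share an index and a sign) is an assumption chosen for illustration, not necessarily the score the program uses, and match_score and best_tile are hypothetical names. Coefficients are held as a dictionary from (x, y, color) to value.

```python
def match_score(tile_coeffs, target_coeffs):
    """Count tile coefficients whose index also appears in the target
    subsquare with the same sign. A tile coefficient at an index the
    subsquare does not have (too high-res) simply never matches."""
    score = 0
    for index, value in tile_coeffs.items():
        other = target_coeffs.get(index)
        if other is not None and (other > 0) == (value > 0):
            score += 1
    return score


def best_tile(tiles, target_coeffs, square_width):
    """tiles: list of dicts with 'filename', 'width' and 'coeffs'.
    Returns (score, filename, magnification) for the best-scoring tile,
    where magnification is the scaling the final placement will need."""
    best = None
    for tile in tiles:
        score = match_score(tile["coeffs"], target_coeffs)
        magnification = square_width / tile["width"]
        if best is None or score > best[0]:
            best = (score, tile["filename"], magnification)
    return best
```

For example, a 64-wide tile with a large coefficient at index (40, 3, 'Y') can still be scored against a 32-wide subsquare: that index is absent from the subsquare's record, so it contributes nothing, and the tile is recorded with magnification 0.5.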
Now that the image is covered with layers of best matches (currently one at each subgrid square), the question is which should show, i.e. how to collage/montage the pyramid/iceberg/tree of images (hence "triage.c"). This could be static, some blending between the layers, or dynamic, a movie, maybe going through random combinations. One could just average them, which gives, well, notable results. One could try to de-emphasize the edges between tiles by weighting each layer by a sin^2 wave at the frequency that brings it to zero at the edges and then averaging.
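The sin^2 idea could look like the sketch below. It is an illustration under assumptions, not what triage.c does: each layer is taken to be an already-assembled grayscale image (a list of lists) made of square tiles of one width, the weight is the product of a horizontal and a vertical sin^2 window, and the weighted sum is divided by the total weight. Weights are sampled at pixel centres so the total is never exactly zero. The names edge_weight and blend_layers are hypothetical.

```python
import math


def edge_weight(x, y, tile_width):
    """sin^2 window over one tile: near zero at the tile's edges, 1 at its centre."""
    u = (x % tile_width + 0.5) / tile_width
    v = (y % tile_width + 0.5) / tile_width
    return (math.sin(math.pi * u) ** 2) * (math.sin(math.pi * v) ** 2)


def blend_layers(layers, size):
    """layers: list of (tile_width, image) pairs, each image size x size.
    Returns the per-pixel weighted average of the layers."""
    out = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            num = den = 0.0
            for tile_width, image in layers:
                w = edge_weight(x, y, tile_width)
                num += w * image[y][x]
                den += w
            out[y][x] = num / den
    return out
```

With a single 4-wide tile of value 10 under four 2-wide tiles of value 20, the corner pixels of the result lean toward 20 (the big tile fades out at its edge) while pixels nearer the big tile's centre lean toward 10.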
Or, since the images aren't going to move any more, we could use cluster-weighted modeling: the program could send out a few gaussians to each subsquare to decide which image explains the data best in its region. From these numbers we get the alpha values for the images, and we get a smooth blend between regions of well-decipherable single images.