To enforce this separation, we deliberately chose two separate sets of words:
Note: From the user interface, it is always possible to go back to a previous step (although not all steps can serve as backward targets). See the 'Step' menu in the user interface.
The steps are detailed hereafter, together with information on how they work. When relevant, hints on possible improvements are given in italics.
Here is a part of an example bitmap (fandango.bmp) right after loading:
The following picture represents the same music example as above, once the frames have been detected and logically removed:
The processing requires 3 phases:
Phase 1 is remarkably fast and reliable, and provides good input for subsequent line removal (phase 3).
Phase 2 works well provided that the skew is limited to a couple of degrees; otherwise the horizontal projections do not exhibit sharp peaks. We should investigate a different approach, based on horizontal LAGs, so that chunks of frame lines are correctly detected and then assembled even in case of significant skew. For the time being, the best workaround is to be careful when scanning the music sheet!
Phase 3 suffers from the limited scope of the test applied to objects traversing a line: we do not look further than 2 pixels above or below the line. The horizontal LAG approach should be used here as well, to get a better idea of significant objects located above, below or across the line, rather than merely local anomalies: the weight of LAG sections would indicate real objects.
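To make the projection idea of phase 2 concrete, here is a minimal sketch of peak detection on horizontal projections. It assumes the bitmap is available as a 2-D NumPy array with non-zero foreground pixels; the function name and the peak_ratio threshold are illustrative choices, not part of OptiMusic.

```python
import numpy as np

def find_line_rows(bitmap, peak_ratio=0.5):
    """Locate candidate frame-line rows via horizontal projection.

    bitmap: 2-D array, non-zero = ink pixel. Rows whose ink count
    exceeds peak_ratio * the maximum count are treated as line rows;
    consecutive rows are then merged into (top, bottom) line spans.
    """
    projection = (bitmap != 0).sum(axis=1)      # ink pixels per row
    threshold = peak_ratio * projection.max()
    peak_rows = np.flatnonzero(projection > threshold)

    spans = []
    for row in peak_rows:
        if spans and row == spans[-1][1] + 1:
            spans[-1][1] = row                  # extend the current span
        else:
            spans.append([row, row])            # start a new span
    return [(top, bottom) for top, bottom in spans]
```

With significant skew, a line's ink spreads over many rows and the peaks flatten, which is exactly the weakness described above.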
The process is run frame by frame. Typically, the boundary between two frames is set at the middle of the inter-frame space. A glyph is assigned to the frame that is closest to its mass center, so each frame ends up with its own set of glyphs.
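As a sketch of this assignment rule (the data layout and names are assumptions made for illustration):

```python
def assign_glyphs_to_frames(glyphs, frame_centers_y):
    """Assign each glyph to the frame whose vertical center is closest
    to the glyph's mass center.

    glyphs: list of (glyph_id, mass_center_y) pairs.
    frame_centers_y: frame center ordinates, top to bottom.
    Returns a dict: frame index -> list of glyph ids.
    """
    frames = {i: [] for i in range(len(frame_centers_y))}
    for glyph_id, mass_y in glyphs:
        nearest = min(range(len(frame_centers_y)),
                      key=lambda i: abs(frame_centers_y[i] - mass_y))
        frames[nearest].append(glyph_id)
    return frames
```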
The result is not directly visible in the bitmap display. However, when you move the mouse over a glyph, its item ID, as well as its shape (still unknown at this point), appears in the Picture panel. When the snapshot was taken, the mouse was over the flat sign at the upper left corner of the example: the ID is 9g4, where '9' stands for frame number 9 from the beginning of the page, 'g' for glyph, and '4' for the fourth glyph in the frame (1 is the brace, 2 the bar line, 3 the treble clef). This glyph had been flooded with pixel color 'd'.
Remarks: This phase may be the weakest of all, because connectivity is highly dependent on bitmap quality: objects may get separated because a single pixel is missing, and conversely, objects may get incorrectly merged because they are too close to each other. The consequences are far-reaching, even though we try in later phases to remedy these incorrect connections (or non-connections). This is surely a design flaw: we should investigate an approach based on object proximity, rather than base all subsequent processing on the absence or presence of just a few pixels.
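For reference, a minimal 4-connected flood fill of the kind used to colorize a glyph could look as follows (the sparse pixel map and the color codes are assumptions made for the sketch):

```python
from collections import deque

def flood_fill(pixels, start, glyph_color):
    """Label one glyph by flooding its 4-connected foreground pixels.

    pixels: dict mapping (x, y) -> color code; foreground pixels start
    with a common INK color. The glyph_color then identifies the glyph.
    """
    INK = 1
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if pixels.get((x, y)) != INK:
            continue
        pixels[(x, y)] = glyph_color
        queue.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
```

The remark above is visible here: a single missing pixel breaks 4-connectivity and splits a glyph in two, while a single spurious pixel can bridge two glyphs.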
Here is the result of symbol evaluation, again on the same music example. We have recognized the brace, the treble clef, the dots near the bass clef, all the bar lines but one (which is touched by a slur), the flat and sharp signs, and the first slur:
The recognition program for symbols is the same one used for leaf evaluation later on. It works in several stages:
This process is rather fast and reliable. A 3 x 3 grid may seem very coarse, but remember that notations differ a lot from one sheet to another; for example, the mere orientation of a treble clef may vary by some 10 degrees. Much finer-grained tests were attempted for some time, but they led to no convincing results.
The way these tests are implemented allows two actions. One is the ability to manually refine a shape filter, since the tests are defined in an ASCII file. The other is the training feature: when an item is manually assigned a shape, the related filter and grids are updated. The filter is expanded so that it can 'contain' the item signature with an additional margin, and the grid is recomputed as the arithmetic mean of all equivalent grids. In this way, OptiMusic can gradually improve its recognition rate.
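The following sketch illustrates the coarse 3 x 3 grid signature and the running-mean training update described above; the exact cell measure used by OptiMusic may differ, and all names here are hypothetical.

```python
def grid_signature(bitmap, rows=3, cols=3):
    """Coarse signature: ink density in each cell of a rows x cols grid
    laid over the item's bounding box.

    bitmap: 2-D list of 0/1 values covering the bounding box.
    Returns a flat list of rows*cols densities in [0, 1].
    """
    height, width = len(bitmap), len(bitmap[0])
    signature = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [bitmap[y][x] for y in range(y0, y1)
                                 for x in range(x0, x1)]
            signature.append(sum(cell) / max(len(cell), 1))
    return signature

def train_shape(mean_grid, count, new_signature):
    """Fold a newly assigned item into a shape's reference grid, keeping
    the grid equal to the arithmetic mean of all training signatures."""
    count += 1
    mean_grid = [m + (s - m) / count
                 for m, s in zip(mean_grid, new_signature)]
    return mean_grid, count
```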
Another important remark is that we definitely lack OCR processing here: Optical Character Recognition techniques should be applied beforehand, in order to recognize and remove the words that appear here and there in any music sheet. For the time being, there is a risk that a character, or a set of characters, gets recognized as a musical symbol. The only workaround so far is to manually de-assign them, i.e. assign them to the 'Unknown' shape.
Here is the result of glyph chopping on the same example. The stems are displayed in dark blue, and the beams in lighter blue:
Stems are detected first, using the vertical LAG of every glyph. Vertical sections of a minimum height and a maximum width are considered candidate stem chunks. Close chunks are aggregated into stem candidates, which are further checked (we require a minimum ratio of background pixels on both sides of the stem).
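A minimal sketch of the chunk filtering and aggregation, assuming vertical LAG sections are given as simple dicts (the thresholds are illustrative):

```python
def stem_chunks(sections, min_height=20, max_width=3):
    """Filter vertical LAG sections into candidate stem chunks.

    sections: dicts with keys 'x', 'width', 'top', 'bottom', each
    describing one vertical section of a glyph. A section qualifies
    if it is tall enough and narrow enough.
    """
    return [s for s in sections
            if s['bottom'] - s['top'] + 1 >= min_height
            and s['width'] <= max_width]

def aggregate_chunks(chunks, max_gap=2):
    """Merge vertically aligned chunks separated by a small gap into
    full stem candidates. The background-pixel ratio test on both
    sides of the stem would be applied to the result."""
    chunks = sorted(chunks, key=lambda s: (s['x'], s['top']))
    stems = []
    for c in chunks:
        if (stems and abs(stems[-1]['x'] - c['x']) <= 1
                and c['top'] - stems[-1]['bottom'] <= max_gap):
            stems[-1]['bottom'] = max(stems[-1]['bottom'], c['bottom'])
        else:
            stems.append(dict(c))
    return stems
```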
Once we have a set of stems in a given glyph, we look for attached leaves. This is done using a flood-fill algorithm. The leaves around a stem are flooded with a color which is unique to the current stem: this makes beams easy to detect, since they are attached to two stems and thus already flooded with a color which is not the glyph color. Actually, an additional test is needed, since a shared voice (a note head attached to both a stem up and a stem down) is also attached to two stems: we use the stem-relative position of the leaf to disambiguate such cases, since true beams have parallel stems rather than opposed stems.
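The beam versus shared-head disambiguation can be sketched as follows, using the leaf's position relative to each of the two stems (a simplified illustration; the real test may use more geometry):

```python
def leaf_side(leaf_y, stem):
    """Which side of the stem the leaf sits on: -1 above mid, +1 below.

    stem: dict with 'top' and 'bottom' ordinates.
    """
    mid = (stem['top'] + stem['bottom']) / 2
    return -1 if leaf_y < mid else 1

def classify_shared_leaf(leaf_y, stem_a, stem_b):
    """A leaf flooded by two stem colors is a beam if it sits on the
    same side of both stems (parallel stems); otherwise it is a note
    head shared by a stem up and a stem down."""
    if leaf_side(leaf_y, stem_a) == leaf_side(leaf_y, stem_b):
        return 'beam'
    return 'shared_head'
```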
Remarks:
The stem recognition is rather efficient and reliable; no improvement is needed here.
To illustrate the ledger removal, I've taken another portion of the example, which exhibits such ledger chunks.
Before:
After:
The processing is based on vertical LAGs. LAG sections are scanned for sub-sections which exhibit stability in height (around the typical ledger height) and stability in Y. Additional tests are run on the location of the chunk within the glyph itself.
Recognized ledgers are finally colorized using the 'ledger' color (generally light gray), so that the ledger parts of any given item are not taken into account in further recognition tests.
We also use the relative position of ledgers to better compute the line position of a note head: far from the staff lines, the Y value of a leaf alone is not accurate enough, so its position with respect to ledger chunks provides the parity of the line position, and the final computation is thus less error-prone.
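A sketch of the parity idea: the offset of the head from the nearest ledger chunk, expressed in half-line steps, is rounded to the nearest integer of the parity implied by the head/ledger relationship (names and thresholds here are assumptions):

```python
def snap_to_parity(raw_half_steps, even):
    """Round an offset in half-line steps to the nearest integer of the
    required parity (even: head on a ledger line, odd: in a space)."""
    n = round(raw_half_steps)
    if n % 2 != (0 if even else 1):
        n += 1 if raw_half_steps >= n else -1
    return n

def head_offset_from_ledger(head_y, ledger_ys, line_spacing):
    """Offset of a note head from the nearest ledger chunk, in half-line
    steps, with the parity fixed by whether the head sits on the ledger.
    The caller adds the ledger's own (known) line position."""
    ledger = min(ledger_ys, key=lambda y: abs(y - head_y))
    half = line_spacing / 2.0
    raw = (head_y - ledger) / half
    on_ledger = abs(head_y - ledger) < half / 2
    return snap_to_parity(raw, even=on_ledger)
```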
Remarks:
Ledger chunks are very small items, so their height or Y-stability is quickly impacted by the presence (or absence) of a few pixels. Ledger recognition could be improved by more complex tests on the chunk's position within the glyph.
Notice that all note heads have been recognized, and also the bass clef (see below).
Actually, several phases are used:
Here is the result of such flag processing:
We use the Y-eccentricity between the volume center and the mass center, since the mass center is mostly influenced by the note head; this tells us in which direction to look for the note head. We consider horizontal runs, starting at the volume center and moving in the direction indicated by the mass center: when we encounter a significant increase in the width of the horizontal run, we consider that we have reached the point where the note head and the flag are attached, and we split the item in two at this location. The flag item and the head item are then evaluated separately.
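A sketch of this cut-point search, assuming the per-row widths of the horizontal runs are precomputed (the jump_ratio threshold is an assumption):

```python
def split_flag_and_head(run_widths, volume_center_y, mass_center_y,
                        jump_ratio=1.5):
    """Find the Y where a flag item should be cut from its note head.

    run_widths: run_widths[y] is the width of the horizontal foreground
    run at row y. We walk from the volume center toward the mass center
    (the head side) and stop at the first significant widening.
    """
    step = 1 if mass_center_y > volume_center_y else -1
    y = volume_center_y
    while 0 <= y + step < len(run_widths):
        if run_widths[y + step] > jump_ratio * max(run_widths[y], 1):
            return y + step      # cut here: the head starts at this row
        y += step
    return None                  # no clear widening found
```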
A closer look at the first flag/head separation of the same example shows where the cut was made:
Remarks: None.
Here is the result of such processing on the third bar line of this system: the bar line has been correctly recognized, and the remaining leaves are gathered into what is recognized as a slur.
If a glyph exhibits a stem whose shape could be a bar line, then this former stem is now considered a self-standing item (actually a compound), and its former leaves become other self-standing items. Each former leaf is compared with the leaves on the other side of the former stem, to check for continuation: if their Y values are close enough, these two former leaves are merged into a single compound. The new self-standing items are then evaluated.
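A simplified sketch of the continuation check across the former stem (the names and the max_dy tolerance are assumptions):

```python
def merge_continuations(left_leaves, right_leaves, max_dy=3):
    """Pair leaves on both sides of a former stem (now a bar line) and
    merge those whose Y values are close enough into single compounds.

    left_leaves / right_leaves: lists of (leaf_id, center_y) pairs.
    Returns merged (left_id, right_id) pairs; unmatched leaves remain
    self-standing items.
    """
    merged = []
    remaining = list(right_leaves)
    for left_id, ly in left_leaves:
        for right in remaining:
            right_id, ry = right
            if abs(ly - ry) <= max_dy:   # continuation across the stem
                merged.append((left_id, right_id))
                remaining.remove(right)
                break
    return merged
```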
Remarks: None.
Frame by frame, we scan the frame glyphs and compounds to come up with an X-ordered list of bars. They define the related score measures.
Note that the frame may not contain an ending bar line; in that case, we 'invent' the missing bar line.
Note also that the processing is done only on the first stave of each system, and that the system height, in terms of staves, is determined on the fly from the bar bottom and the frame Y locations.
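As an illustration of measure determination, including the 'invented' ending bar line (abscissa-based and simplified; the names and the min_tail threshold are hypothetical):

```python
def build_measures(bar_xs, frame_left_edge, frame_right_edge,
                   min_tail=10):
    """Derive measure abscissa ranges from the X-ordered list of bar
    lines, inventing an ending bar line when the frame lacks one."""
    bars = sorted(bar_xs)
    if not bars or frame_right_edge - bars[-1] > min_tail:
        bars.append(frame_right_edge)    # invented ending bar line
    measures, left = [], frame_left_edge
    for x in bars:
        measures.append((left, x))
        left = x
    return measures
```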
Remarks: The system determination assumes that at least one bar line is continuous between the first and the last stave; if not, we miss it. A lack of continuity in the bar line may originate from musical symbols located there, or simply from missing pixels. In the latter case, we could try to merge the two bar line chunks.
The only difficulty stems from leaves merged into compounds: if such a 'merged' leaf is encountered, its related compound item is used in the group processing instead of the leaf itself.
Remarks: None.
Dots can be: parts of a repeat bar line, parts of a bass clef, duration modifications, or staccato indications. The location context helps disambiguate them.
Similarly, alteration signs (accidentals) can be local alterations linked to a given note head and limited to the current measure, or parts of a key signature. Location contexts, plus tests on possible key signatures, are used to differentiate between these two cases.
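A toy classification of dots by location context might read as follows (the context flags are assumptions made for the sketch):

```python
def classify_dot(context):
    """Classify a recognized dot from its location context.

    context: dict of booleans describing where the dot sits, e.g.
    near a bar line, within a bass clef, beside or above a note head.
    """
    if context.get('near_bar_line'):
        return 'repeat_dot'             # part of a repeat bar line
    if context.get('near_bass_clef'):
        return 'bass_clef_dot'
    if context.get('right_of_note_head'):
        return 'augmentation_dot'       # duration modification
    if context.get('above_or_below_note_head'):
        return 'staccato_dot'
    return 'unknown'
```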
Remarks:
Only standard key signatures are recognized (those made of only sharp signs or only flat signs).
The slur processing is so far limited to the information recorded in the score (anchor points and radius); its related note linking is not actually handled.
First, time signatures are checked so that partially recognized ones (no numerator or no denominator) are discarded if other fully recognized ones are found in the sheet.
Then, measure durations are checked against the current time signature, voice by voice if applicable. This helps indicate whether an item has been improperly recognized. Tuplets are processed on the fly, in an attempt to fix the measure duration.
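A minimal sketch of the duration check, using exact fractions of a whole note (the names are assumed):

```python
from fractions import Fraction

def check_measure(durations, time_signature):
    """Compare the sum of note/rest durations in a measure against the
    expected duration from the time signature.

    durations: list of Fractions in whole-note units, e.g.
    Fraction(1, 4) for a quarter note.
    time_signature: (numerator, denominator) pair.
    Returns the surplus (positive) or deficit (negative) as a Fraction.
    """
    expected = Fraction(time_signature[0], time_signature[1])
    return sum(durations, Fraction(0)) - expected

# Example: a 3/4 measure holding a half note and a quarter note is exact.
assert check_measure([Fraction(1, 2), Fraction(1, 4)], (3, 4)) == 0
```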
Finally, beams are further harmonized within a group, since not all stems are attached to the whole set of beams in the group: generally, the first and last stems are attached to all beams, but a middle stem may extend only to the first beam encountered. This is done using the Y-location of the stem edge.
Remarks: None.
File updated on Friday 29-Dec-2000 16:56