Panoramascope Layer Files
=========================

In order to display markers for placenames, mountain peaks etc. the
Panoramascope downloads layer files from layers.panoramascope.com.  These
files are generated by us from OpenStreetMap data (though other data sources
may be added in the future).

We are happy to make these files available for you to use in other
applications, subject to the same CC-BY-SA terms as the original
OpenStreetMap data.  (We do not require any additional attribution.)  This
document describes the file format.

To determine what layer files are available, retrieve
http://layers.panoramascope.com/available_layers.xml.gz.  The format of this
file should be self-explanatory; the name attribute identifies the name of a
layer file.  Layer files have the extension .lyr and the versions available
for download are gzipped, so a name attribute of "foo" indicates the
existence of http://layers.panoramascope.com/foo.lyr.gz.

File Format
-----------

Layer files are compact binary files that support efficient searching by
geographical area and by placename.  Binary data is little-endian.

A file has four sections: a header, a coordinates section, a names section
and an index section.

The header contains fields with the following layout:

    uint32_t magic;        // should be 0x5259414e
    char name[64];         // Description of the layer, null terminated
    uint32_t colour;       // Colour used for its text (00RRGGBB)
    float font_size;       // Relative font size (currently not used)
    uint32_t coords_start_offset;
    uint32_t coords_end_offset;
    uint32_t names_start_offset;
    uint32_t names_end_offset;
    uint32_t index_start_offset;
    uint32_t index_end_offset;
    uint32_t padding[7];   // For future expansion

The six offset fields point to the start and end of each of the other three
sections, relative to the start of the file.  The sections all start on a
32-bit boundary.

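
As an illustration of the header layout above, here is a minimal sketch of a
header reader in C.  It is not our production code: the struct lyr_header, the
helper lyr_u32le and the function lyr_parse_header are hypothetical names
invented for this example.  It assumes the fields are packed with no padding
between them (128 bytes in total), that font_size is an IEEE 754 single, and
it adds a few sanity checks on the offsets that the format itself does not
mandate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LYR_MAGIC       0x5259414eu
#define LYR_HEADER_SIZE 128u   /* 4 + 64 + 4 + 4 + 6*4 + 7*4 bytes */

/* Hypothetical in-memory form of the header; the names are illustrative. */
struct lyr_header {
    char     name[65];   /* 64 bytes from the file plus a forced terminator */
    uint32_t colour;     /* 00RRGGBB */
    float    font_size;
    uint32_t coords_start, coords_end;
    uint32_t names_start, names_end;
    uint32_t index_start, index_end;
};

/* Read a little-endian uint32_t one byte at a time, so that neither the
   host byte order nor the alignment of p matters. */
static uint32_t lyr_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, or -1 if the buffer is too short, the magic number
   is wrong, or a section range fails the sanity checks. */
int lyr_parse_header(const uint8_t *buf, size_t len, struct lyr_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < LYR_HEADER_SIZE || lyr_u32le(buf) != LYR_MAGIC)
        return -1;

    memcpy(out->name, buf + 4, 64);
    out->name[64] = '\0';
    out->colour = lyr_u32le(buf + 68);

    /* Assumption: float is 4 bytes, IEEE 754, stored little-endian. */
    uint32_t bits = lyr_u32le(buf + 72);
    memcpy(&out->font_size, &bits, sizeof bits);

    uint32_t *fields[6] = {
        &out->coords_start, &out->coords_end,
        &out->names_start,  &out->names_end,
        &out->index_start,  &out->index_end
    };
    for (int i = 0; i < 6; i++)
        *fields[i] = lyr_u32le(buf + 76 + 4 * i);

    /* Sanity checks (our addition): each section starts on a 32-bit
       boundary, does not end before it starts, and lies within the file. */
    for (int i = 0; i < 6; i += 2) {
        uint32_t start = *fields[i], end = *fields[i + 1];
        if (start % 4 != 0 || start > end || end > len)
            return -1;
    }
    return 0;
}
```

With the file contents available at some pointer base with length len, a call
such as lyr_parse_header(base, len, &h) fills h, after which
base + h.coords_start addresses the first entry of the coordinates section.
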
The coordinates section contains one 96-bit entry for each location:

    uint64_t coords;
    uint32_t name_offset;

The 64-bit coords value is made up from two 32-bit values for latitude and
longitude.  These are fixed-point values with one sign bit, nine magnitude
bits and 22 fractional bits.  The two 32-bit values are then
bitwise-interleaved to form the combined 64-bit value with the longitude
value occupying the even-numbered bits and the latitude value occupying the
odd-numbered bits.  This encoding scheme results in points that are close
together in two dimensions normally also being close together in the
one-dimensional combined value; search for "Z curve" for the theory.  The
entries in the coordinates section are sorted by these 64-bit values.

The name_offset field points to the location's name string in the names
section, relative to the start of that section, as described next.

The names section also contains one entry for each location; the length of
the entries is variable.  The format is:

    char null0;           // 0
    char name[];          // The name, UTF-8 encoding, variable length
    char null1;           // 0
    char data[];          // Additional data (e.g. state, altitude etc), text,
                          // variable length
    char null2;           // 0
    uint32_t longitude;   // Fixed-point longitude value
    uint32_t latitude;    // Fixed-point latitude value

The fixed-point format is the same as above, but note that in this case the
values are not interleaved.  These variable-length entries are packed and in
general the fields will not be aligned.  The entries are sorted in the same
order as the corresponding entries in the coords section.

The index section contains one entry for each word in the names.  For the
purposes of the index section the names are hypothetically stripped of
accents and all punctuation characters are replaced by spaces.  Each entry is
a 32-bit offset into the names section, pointing to the first character of
the word.  These entries are sorted alphabetically by the accent-stripped
words.

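
The bitwise interleaving used for the coords value in the coordinates section
can be sketched in C as follows.  The function names (lyr_interleave,
lyr_deinterleave and the two helpers) are hypothetical, and the sketch assumes
that bit 0 is the least significant bit, so that longitude bit i lands on bit
2i and latitude bit i on bit 2i+1.  It takes the two 32-bit fixed-point values
as given and does not attempt the conversion from degrees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Spread the 32 bits of v so that bit i moves to bit 2*i; the odd-numbered
   bits of the result are zero. */
static uint64_t lyr_spread_bits(uint32_t v)
{
    uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}

/* Inverse of lyr_spread_bits: collect the even-numbered bits of x into a
   32-bit value, ignoring the odd-numbered bits. */
static uint32_t lyr_gather_bits(uint64_t x)
{
    x &= 0x5555555555555555ULL;
    x = (x | (x >> 1))  & 0x3333333333333333ULL;
    x = (x | (x >> 2))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x >> 4))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x >> 8))  & 0x0000FFFF0000FFFFULL;
    x = (x | (x >> 16)) & 0x00000000FFFFFFFFULL;
    return (uint32_t)x;
}

/* Longitude occupies the even-numbered bits, latitude the odd-numbered. */
uint64_t lyr_interleave(uint32_t lon, uint32_t lat)
{
    return lyr_spread_bits(lon) | (lyr_spread_bits(lat) << 1);
}

/* Split a combined 64-bit value back into its two 32-bit halves. */
void lyr_deinterleave(uint64_t coords, uint32_t *lon, uint32_t *lat)
{
    assert(lon != NULL && lat != NULL);
    *lon = lyr_gather_bits(coords);
    *lat = lyr_gather_bits(coords >> 1);
}
```

The shift-and-mask form avoids a 32-iteration loop; a simple loop that copies
one bit at a time would produce the same values.
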
Using the files
---------------

Use mmap() to open the file.

To find all locations within a geographical box:

- Compute the interleaved coordinate values for the bottom-left and
  top-right of the box.
- Use a binary search to find the range of entries in the coordinates
  section that lie between the corners of the box.  This will include
  locations outside the box; it's possible to efficiently skip these but the
  method is complicated; refer to descriptions of "Z-curves" for details.
- For each location, follow the name offset to access the name.

To find all locations matching a pattern (at the start of a word):

- Strip all accents and replace punctuation characters with spaces in the
  pattern.
- Use a binary search on the index section to find the range of entries that
  match the pattern.
- For each match, search backwards looking for a 0 byte to find the start of
  the name and search forward looking for a 2nd 0 byte to find the
  coordinates.

For more information
--------------------

If you plan to use these files we would be very interested to hear about your
plans.  Do please get in touch if you have any questions.  See
panoramascope.com for contact details.

We do have some C++ code for accessing the files that we could make available
if there is sufficient demand.

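
As a rough illustration of the geographical box search described under
"Using the files", the following C sketch performs the binary-search step on
the coordinates section.  It is not the C++ code mentioned above; the names
lyr_lower_bound, lyr_box_candidates and lyr_name_offset are hypothetical.  It
assumes each entry is exactly 12 packed bytes (the 64-bit coords value stored
little-endian, followed by name_offset) and that the sort order is that of
unsigned 64-bit integers.  The candidate range it returns still contains
locations outside the box, which the caller has to filter out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LYR_COORD_ENTRY_SIZE = 12 };  /* uint64_t coords + uint32_t name_offset */

/* Byte-wise little-endian reads: entries are only 32-bit aligned, so the
   64-bit field cannot safely be read through a uint64_t pointer. */
static uint32_t lyr_rd_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t lyr_rd_u64le(const uint8_t *p)
{
    return (uint64_t)lyr_rd_u32le(p) | ((uint64_t)lyr_rd_u32le(p + 4) << 32);
}

/* Index of the first entry whose coords value is >= key, or count if there
   is no such entry.  coords points at the start of the coordinates section. */
size_t lyr_lower_bound(const uint8_t *coords, size_t count, uint64_t key)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (lyr_rd_u64le(coords + mid * LYR_COORD_ENTRY_SIZE) < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Half-open range [*first, *last) of entries with z_lo <= coords <= z_hi,
   where z_lo and z_hi are the interleaved values of the bottom-left and
   top-right corners of the box. */
void lyr_box_candidates(const uint8_t *coords, size_t count,
                        uint64_t z_lo, uint64_t z_hi,
                        size_t *first, size_t *last)
{
    assert(z_lo <= z_hi);
    *first = lyr_lower_bound(coords, count, z_lo);
    *last = (z_hi == UINT64_MAX)
                ? count
                : lyr_lower_bound(coords, count, z_hi + 1);
}

/* name_offset of entry number index, relative to the names section. */
uint32_t lyr_name_offset(const uint8_t *coords, size_t index)
{
    return lyr_rd_u32le(coords + index * LYR_COORD_ENTRY_SIZE + 8);
}
```

Here count would be (coords_end_offset - coords_start_offset) / 12, assuming
the end offset points one byte past the last entry.  For each index in the
returned range, the name is found by adding lyr_name_offset to the start of
the names section.
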