Electronic yellow-page services: The Star*s Family as an example of diversified publishing

André Heck

Observatoire Astronomique
11, rue de l'Université
F-67000 Strasbourg
France

Abstract

The broad, currently accepted definition of electronic publishing also encompasses yellow-page services on the web. Together with their equivalent on paper, they are an example of diversified publishing. Some of these include validation and authentication steps which are sine qua non requirements for a service worthy of its name. We briefly describe here such a service with its outstanding features and procedures. We also discuss the maintenance processes and illustrate how constraints at the level of the distribution can downgrade an otherwise rich compilation of information.

Introduction

Putting a document on the WWW or setting up an information resource on the web is now considered as an act of `electronic publishing' (see e.g. Heck 1997 and references therein). This is therefore the case for yellow-page services now accessible via the networks. When the corresponding information is also available on other media (such as paper), we can then speak of `flexible publishing' or, as we prefer it, of `diversified publishing'.

In the following, we describe and comment on an example of yellow-page services, the Star*s Family, which involves authentication and validation steps. We shall also discuss the maintenance of such a resource and show that the information available in the master files and on paper is actually richer than that provided electronically, with potential not (yet) fully exploited on the web.

The StarPages cover the products of the Star*s Family that are available through the WWW server of the Centre de Données astronomiques de Strasbourg (CDS) (see e.g. Egret & Genova 1997). The Star*s Family itself is a growing collection of directories, dictionaries, databases and related products (Heck 1994). Nobody today would question the usefulness of telephone books nor that of remotely accessible databases. The Star*s Family combines their advantages by offering resources of detailed information, both on paper and on-line, about astronomical organizations, as well as on services of general and practical utility for astronomers and related scientists, and on these persons themselves.

The outstanding features include:

The basic philosophy of these directories and databases is to provide practical data which one seeks always to have at one's disposal. They have proved over the years to be extremely valuable auxiliaries.

Presentation of the resources

StarGuides and StarWorlds

The compilation of data on astronomy, space sciences and related organizations was initiated by us at the end of the seventies with the publication of directories of astronomical associations and societies, as well as of professional institutions. The two lists were merged later into a single work called StarGuides (Heck 1993a). From the start, the geographical coverage was world-wide and the fields covered were progressively broadened to include all organizations that could be of interest, to any degree, for astronomers and related scientists.

StarGuides' master files gather together all practical data available on associations, societies, scientific committees, agencies, companies, institutions, universities, etc., or more generally organizations, involved in astronomy and related sciences. Many other types of entries have also been included such as academies, bibliographical services, data centres, dealers, distributors, funding organizations, IAU-adhering organizations, journals, manufacturers, meteorological services, national norms and standards institutes, parent associations and societies, publishers, software producers and distributors, and so on.

Currently about 6000 entries from about 100 countries have been selected. For each entry, all practical data available are listed. Refer to the bibliography and the on-line documentation for details.

StarGuides' master files have been made accessible as an on-line database called StarWays by the European Space Information System (ESIS) group (see e.g. Heck et al. 1992) and as an independent database called StarGates (Albrecht & Heck 1994b) at the European Southern Observatory (ESO).

Since January 1994, it has been possible to query the StarGuides files through the WWW server of the Centre de Données astronomiques de Strasbourg (CDS) as the database StarWorlds (Heck et al. 1994).

Refer to the on-line documentation for tips on how flexible queries can be performed, often calling on subliminal (hidden) information such as synonyms (names of main cities in various languages, and so on) or on categories of entries corresponding to the thematic subindices of the paper versions. When entries are retrieved, active URLs allow direct navigation towards the corresponding organizations.

StarHeads

As a complement to the previous resource and with the development of the WWW, we started compiling a database of URLs of personal pages of individual astronomers and related scientists. That resource, called StarHeads (Heck 1995), was also made operational in January 1994 on the CDS WWW server.

At the time of writing, more than 2500 personal pages are accessible, but that figure is increasing weekly as the resource is extremely successful, frequently visited and pointed at by services such as NASA's ADS (see e.g. Eichhorn 1997).

Refer to the on-line documentation for querying procedures and additional details.

StarBriefs and StarBits

The dictionary StarBriefs (Heck 1993b) currently gathers about 110,000 abbreviations, acronyms, contractions and symbols. Many entries in common use and/or of general interest have also been included when appropriate.

The underlying idea is to offer to astronomers and related scientists a practical assistant in decoding the numerous abbreviations, acronyms, contractions and symbols that they might encounter in their professional activities. Maybe a bit paradoxically, if scientists can quickly grasp the meaning of an acronym purely in their specific field, they will probably have more difficulties with adjacent fields. It is actually for this purpose that this dictionary might be more often used. Scientists might also use this compilation to avoid assigning an acronym that already has too many or confusing meanings.

This compilation is essentially carried out in parallel with the permanent updating of StarGuides/StarWorlds. In practice, all major abbreviations and acronyms encountered when scanning the general literature and the documentation received in relation to these are gathered, the underlying principle being that they might also appear one day before the eyes of astronomers and related scientists.

The dictionary StarBriefs has also been made accessible as an on-line database at ESO under the label StarWords (Albrecht & Heck 1994a). Since January 1994, it has been reachable on the CDS WWW server as the database StarBits (Heck et al. 1994).

Refer also to the on-line documentation for querying procedures and additional details.

Maintenance and Quality

The Star*s Family compilations have taken advantage of the experience gained over the years, especially in the development of techniques for collecting, verifying and treating the data [cf. general working scheme]. Compiling a directory or a database of real value is indeed quite a different venture from merely reproducing and distributing, with comments of greater or lesser interest, data collected indiscriminately from all available sources. The latter criticism also applies to on-line forms that enter information directly into on-line resources without much further processing or appropriate scrutiny.

Basic checks, homogenization, validation of the substance itself as well as of its (sometimes electronic) originators, and so on, are fundamental processes for a reliable product of quality. Moreover, while professional file and database construction techniques are necessary, they cannot replace the extensive, unrewarding and very careful background work which is indispensable for the compilation of a valuable resource.

The definition of a very well-profiled and adapted questionnaire, the homogenization of the data collected and the maximum reduction of the respondents' biases are all points that must be satisfied, often with the help of the most modern means of communication. The continuous political evolution of the world also has to be taken into account. Although the information is provided in the Star*s Family files bona fide, the best effort is nonetheless made to keep track of the modifications happening and to implement them as soon as they are confirmed or recognized by the international community. On a more pragmatic note, the frequent changes occurring in phone/fax numbering in countries around the world can only be echoed in the resources through efficient collaborations with the corresponding telecommunications companies. The same applies to organizations linked to the World Meteorological Organization and the International Organization for Standardization.

One can never stress enough the importance of this obscure daily work, consisting of patiently collecting data, checking and re-checking information, and continually updating the master files. If scientists have a natural tendency to design projects and software packages involving the most advanced techniques and tools, there is in general less enthusiasm for the painstaking and meticulous long-term maintenance which builds up the real substance of the databases. This has also to be carried out by knowledgeable scientists or documentalists and cannot be delegated to inexperienced clerks.

The fashion is now shifting towards designing and testing quality control processes, but we believe that the best quality assurance (accuracy, homogeneity, exhaustivity, ...) has to be achieved when collecting and entering the data themselves with an immediate check of the entered material. None of the algorithms currently available has really convinced us of its absolute necessity and satisfactory utility. Again here, developing such processes is an appealing challenge for scientists, but most of the algorithms designed work statistically. For a database user, it does not matter much whether it is accurate up to 90% or 95%. The user wants to find the piece of information he/she is looking for, and, if found, this has to be accurate. All these considerations are obvious if a phonebook is taken as a model for yellow-page services.

It is interesting to notice how questionnaire respondents are sometimes unable to fill in these forms properly about their own organizations. Some astronomers still seem to be unaware of the difference between minutes and seconds of arc (degrees) and minutes and seconds of time (hours), which differ by a factor of fifteen and which - if left uncorrected - would of course seriously offset the location of the corresponding observing sites! Our experience is that it is absolutely necessary to cross-check every bit of information. We therefore request as much documentation as possible (activity reports, periodicals published, and so on) to be posted to us.
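The factor of fifteen mentioned above can be illustrated with a short worked conversion. The following is only a sketch: the function names and the sample value are hypothetical and are not taken from the Star*s Family procedures. It converts the same sexagesimal triple once as hours, minutes and seconds of time and once as degrees, arcminutes and arcseconds.

```python
def sexagesimal_to_decimal(units, minutes, seconds):
    """Combine a non-negative sexagesimal triple into a decimal number of units."""
    return units + minutes / 60.0 + seconds / 3600.0


def arc_to_degrees(deg, arcmin, arcsec):
    """Degrees, arcminutes, arcseconds -> decimal degrees."""
    return sexagesimal_to_decimal(deg, arcmin, arcsec)


def time_to_degrees(hours, minutes, seconds):
    """Hours, minutes, seconds of time -> decimal degrees (1 h = 15 degrees)."""
    return 15.0 * sexagesimal_to_decimal(hours, minutes, seconds)


# Hypothetical questionnaire value for a longitude, with the unit left unclear.
reported = (0, 31, 4)

as_time = time_to_degrees(*reported)  # read as 0 h 31 min 4 s
as_arc = arc_to_degrees(*reported)    # read as 0 deg 31 arcmin 4 arcsec
offset = as_time - as_arc             # error made if the wrong unit is assumed

print(f"read as time: {as_time:.4f} deg")
print(f"read as arc:  {as_arc:.4f} deg")
print(f"offset:       {offset:.4f} deg")
```

With this sample value the two readings differ by more than seven degrees of longitude, which is why an ambiguous unit cannot be left uncorrected.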

Over the twenty years or so we have been dealing with this kind of activity, we have gone through quite a few cases of ghost associations, of non-existent groups (or groups without legal existence) within universities, of fights between leading organizations about their actual size, activities, representative status, and so on. In order to release accurate information, we then have to run independent checks, call or e-mail trusted informants and request third parties to report on the actual situation. As human aspects are always involved, diplomacy is the rule, but sometimes people have to be told bluntly to behave. In some instances, entries had to be withdrawn from the databases.

A kind of intuition (maybe this is real `experience') has been developed over the years for detecting such cases (not very frequent fortunately, but not rare either) and the most common action is to tone down some exaggerated figures or scope of activities, etc.

The so-called `grey' literature has become more difficult to detect with the advent of desktop publishing packages and high-quality laser printers. The WWW and the possibility for each individual to set up impressive pages make it even more difficult to assess exactly what is behind some of these [3]. This is why, until possible further technological developments come about, we always request an independent - and properly documented - authentication of the organization and of its representative by `snail mail'. Of course, WWW pages that cannot be accessed, that are empty, or that contain too much unrelated material are ignored.

Information compiled and information distributed

On the other end of the process, it is also important to monitor the usage of resources (how in fact these are queried or tackled) and possibly to adapt the release and display of the retrieved information in a more suitable manner for the average user (with some restrictions and security measures of course [4]).

Perhaps somewhat surprisingly, our experience with the various on-line distributors of the information we have compiled is that it is never made totally available or fully exploited, for either technical or human reasons [5].

Thus, in the original master files, about a hundred special characters (accented or others) have been encoded and are properly printed via TeX in the paper versions of the directories and dictionaries. Only a small part of these are actually displayed as such on line. All the others are transliterated into basic ASCII.
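To make concrete what is lost in such a transliteration, here is a minimal sketch of one common way to reduce accented text to basic ASCII. It assumes Unicode input and uses Python's standard `unicodedata` module. The encoding actually used in the master files and by the on-line distributors is not described here, and the `FALLBACKS` table and the function name are hypothetical.

```python
import unicodedata

# Hypothetical fallback table for letters that have no Unicode decomposition.
FALLBACKS = {
    "ß": "ss", "ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE", "ł": "l", "Ł": "L",
}


def to_basic_ascii(text):
    """Transliterate text to basic ASCII, dropping accents and other marks."""
    pieces = []
    for ch in text:
        if ch in FALLBACKS:
            pieces.append(FALLBACKS[ch])
            continue
        # Split the character into a base letter plus combining marks,
        # then keep only the non-combining part.
        decomposed = unicodedata.normalize("NFKD", ch)
        pieces.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    # Anything still outside ASCII is replaced by "?" rather than silently lost.
    return "".join(pieces).encode("ascii", "replace").decode("ascii")


print(to_basic_ascii("Centre de Données astronomiques de Strasbourg"))
print(to_basic_ascii("François"))
```

The conversion is one-way: once "Données" has become "Donnees", the original spelling cannot be recovered from the on-line display, whereas the master files retain it.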

As indicated earlier, active URLs allow easy navigation towards the WWW servers of the organizations whose entries are retrieved. Depending on the technical possibilities of the visiting sites, plug-in automatic phone or fax dialling could take advantage of the corresponding standard format.

More flexible queries can be developed, possibly linked to mapping facilities very useful for planning observational campaigns, for instance. The format of the coordinates for geographic locations has been developed for making feasible the retrieval of, for instance, sites within a specific area.
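A retrieval of sites within a specific area could look like the sketch below. It is an illustration only: the `Site` record, the function names and the sample entries are hypothetical, coordinates are assumed to be stored as decimal degrees with east longitudes positive, and the actual coordinate format of the master files is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    lat_deg: float  # north positive
    lon_deg: float  # east positive, in the range -180..180


def lon_in_range(lon, lon_min, lon_max):
    """True if lon lies in the box, including boxes that cross the 180 deg meridian."""
    if lon_min <= lon_max:
        return lon_min <= lon <= lon_max
    return lon >= lon_min or lon <= lon_max


def sites_in_box(sites, lat_min, lat_max, lon_min, lon_max):
    """Return the sites whose coordinates fall inside the latitude/longitude box."""
    return [
        s for s in sites
        if lat_min <= s.lat_deg <= lat_max
        and lon_in_range(s.lon_deg, lon_min, lon_max)
    ]


# Made-up entries, for illustration only.
SITES = [
    Site("Site A", 48.58, 7.77),
    Site("Site B", -29.26, -70.73),
    Site("Site C", 19.82, -155.47),
]

for site in sites_in_box(SITES, 40.0, 55.0, 0.0, 15.0):
    print(site.name)
```

A simple box test of this kind is enough for a first selection of candidate sites; a mapping facility would typically refine it with true angular distances.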

Developing multilingual interfaces is also feasible as a language flag is embedded in each entry.

There are thus ample possibilities for other distributors to develop varied services from the same set of master files, including also their cross-linking. From the reverse point of view, there is also the possibility for resources such as StarWorlds and StarBits to be pointed at - similarly to how ADS is pointing to StarHeads.

A Few Last Comments

The profile of the directories and the databases, as well as the questionnaires sent to the various organizations listed have been improved and adapted over the years. The information gathered has been more and more comprehensive. The categories of the entries listed have also been gradually broadened to better serve the needs of astronomers and related scientists. When compiling files such as the Star*s Family master ones, one cannot but be impressed by the very broad spectrum of disciplines to which astronomy and related sciences are linked, and by the very large variety of techniques applied in these fields.

The successive releases of the directories and databases give fairly accurate global pictures of the active organizations in the fields covered. Their sequence testifies to the sometimes rapid evolution of scientific interests, of data collecting and handling techniques, as well as of communications in the broad sense. A few countries have also rearranged the structure of their national facilities in the course of the past years.

Among the most striking recent changes, electronic mail and the WWW have dramatically modified the way scientists communicate and exchange information.

Of course, the political evolution in the world is directly reflected in the information provided. The USSR, the German Democratic Republic and Czechoslovakia have disappeared. New countries have lengthened StarGuides' table of contents. The liberalization of political regimes, especially in Eastern Europe, has resulted in a dramatic increase of the questionnaires returned from the regions concerned. Most of the African continent remains however a dramatic gap and this should be a concern for each of us.

Technical evolutions have also been playing a significant rôle in recent years and we went through them while producing the successive versions of the directories and setting up the databases. Desktop and electronic publishing, with all the indexing facilities, have become instrumental for providing monthly updated releases of the Star*s Family products on paper with an outstanding quality essentially due to the TeX typesetting system.

As to the databases, not only do flexible management systems allow efficient information retrieval, but the current omnipresence of WWW browsers has made access to them much more popular. Specific problems are not absent though, such as the (in)stability of sites, URLs and pages. This might however be due to an unavoidable running-in phase.

Acknowledgements

Special thanks are directed to all persons who have assisted us over the years in the materialization of the Star*s Family products at all levels. Most of their names appear in the bibliography and in the quoted papers. A special mention is due here to Daniel Egret and François Ochsenbein for making the StarPages accessible on the CDS WWW server and improving the service whenever they can.

We are also very grateful to all persons and organizations who contribute to the very substance of the Star*s Family products by returning the questionnaires, by providing the relevant documentation, by participating in the various procedures of maintenance, validation and verification of the information, or otherwise. The Star*s Family products have been conceived for them and for the vast community of users. We are looking forward to satisfying their needs in continually better ways.

The implementations as databases of the Star*s Family products by the European Space Agency, the European Southern Observatory and Strasbourg astronomical Data Centre have been strong incentives to continue and always improve these time-consuming compilations.

References

Notes

  1. Contrary to most on-line resources, the Star*s Family products are not only e-mail or WWW-oriented.
  2. In other words, they are not (as in some on-line acronym servers) built up automatically from dictionaries.
  3. Remember the famous cartoon featuring two dogs keying in e-mail messages: On the Internet, nobody knows you are a dog!
  4. Nobody really wants his/her thousands of hours of compiling work to be unfairly copied with a couple of mouse clicks.
  5. In fact, it is a specific full-time job to compile information and maintain its high-quality level, distinct from making it available on the networks.

From the book Electronic Publishing for Physics and Astronomy.
© Copyright André HECK, current year.