> > |
%META:TOPICINFO{author="AmyShelton" date="1161742560" format="1.0" version="1.1"}%
BOFS
FITS BOF (moderated by William Pence)
- Those interested in keeping up to date with FITS developments should subscribe to fits-bits.
- There are three new world coordinate system (WCS) papers (III, IV, and V) at various draft stages. More info on III and IV can be found at the FITS news page. FYI, Eric Greisen is one of the people working on paper V (on time coordinates).
- Those who create their own extension need to register that extension.
- The proposal for the FOREIGN extension is interesting. "It puts a FITS wrapper about an arbitrary file, allowing a file or tree of files to be wrapped up in FITS and later restored to disk... The motivation for this extension was to allow an implementation based on the FITS multi-extension mechanism to encapsulate and pass non-FITS data." See P2.17, which I have as a handout. Doug Tody is a co-author of the poster.
- There was a discussion of some of the restrictions imposed by FITS, e.g. 8-character keyword names and 80-character line lengths. A few new keywords that address these limitations are being considered for addition to the FITS standard. CONTINUE would allow one to create long lines and is supported by cfitsio. HIERARCH would allow one to create long keyword names and is supported by PyFits. Most BOF participants were in favor of some relaxation of the restrictions. Some supported more radical changes than these.
- There was also some talk of an image compression proposal and the pros and cons of borrowing from industry when it comes to compression software. FYI, Doug Tody is on the aforementioned proposal.
- Someone mentioned that databases, not files, are the data format of the future. Later in the conference, there was much talk about placing data directly into databases.
- Jeroen de Jong (ESO) gave a presentation on FITS and XML. He talked about describing binary FITS table headers in XML.
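
To make the CONTINUE discussion above concrete, here is a minimal sketch of how a long string value could be spread over several 80-character header lines. It assumes the long-string convention in which a continued value ends with `&` and the next line starts with `CONTINUE`. The function name `long_string_cards` is hypothetical, and embedded single quotes are rejected rather than escaped, so treat it as an illustration and not as what cfitsio or PyFits actually do.

```python
CARD_LEN = 80    # every FITS header line ("card") is 80 characters
MAX_CHUNK = 67   # 80 - 10 (keyword field and "= ") - 2 quotes - 1 for '&'


def long_string_cards(keyword, value):
    """Split a string value over a first card plus CONTINUE cards.

    Simplified sketch: keyword must fit in 8 characters and the value
    must not contain single quotes (real writers escape them).
    """
    if len(keyword) > 8:
        raise ValueError("standard FITS keywords are at most 8 characters")
    if "'" in value:
        raise ValueError("single quotes are not handled in this sketch")

    # Cut the value into pieces that leave room for the trailing '&'.
    chunks = [value[i:i + MAX_CHUNK]
              for i in range(0, len(value), MAX_CHUNK)] or [""]

    cards = []
    for n, chunk in enumerate(chunks):
        is_last = n == len(chunks) - 1
        text = chunk if is_last else chunk + "&"   # '&' marks "more follows"
        prefix = f"{keyword:<8}= " if n == 0 else "CONTINUE  "
        cards.append(f"{prefix}'{text}'".ljust(CARD_LEN))
    return cards


if __name__ == "__main__":
    for card in long_string_cards("LONGSTR", "abc " * 40):
        print(card)
```

A value that fits in one card comes back as a single line with no `&`, so the sketch leaves short values unchanged.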
O1.1 Designing for Peta-Scale in the LSST Database (Invited)
- Data Products: images, catalogs, alerts. All data products are archived.
- They will provide open data access for up to 300 simultaneous users.
- Derived data and processing rates, sustained (peak):
   - Processing: 105 (120) TFlops
   - I/O:
      - read: 60 GB/s
      - write: 6 GB/s
   - Bandwidth: 2.5 (10) Gb/s
- Allowed 0.1% alert publication failure (alerts are one of the data products)
- Must have automated data quality. Must be robust and quickly "fixable." Features dual-redundant systems.
- Raw data, alerts, and metadata are sent from a base station to the archive. These data are processed at both places independently to minimize network traffic. Processed information is not exchanged. Uses existing NSF-funded networks for distribution and transfer of data.
- They project that their computing requirements are within supercomputing technology trends.
- Data is stored as pixels in a database. They are using an open source database for testing. They are also benchmarking two commercial databases: SQL Server and DB2.
   - metadata: 675 million rows
   - source: 250 billion rows
   - object: 22 billion rows
- David Fleming is working for the NCSA on the problem of handling these large datasets.
- They plan on being VO-compliant.
- Pipeline components: image processing -> detection -> association <-> moving object
- They will be using a distributed file system. They are currently investigating existing implementations, e.g. Google.
- They will incorporate three tiers of storage: fast disk, slow disk, and tape with cache.
- Google has petitioned to become a member of the LSST. Microsoft is already a member.
- Believe it or not, the SKA data needs are much grander.
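
For a feel of what the quoted LSST rates mean over a day, the back-of-envelope sketch below converts them to daily volumes. The talk only gave the per-second figures. The 24-hour duty cycle, the decimal units (1 TB = 1000 GB), and all the names in the code are my assumptions. Note that the I/O figures are in gigabytes per second while the bandwidth is in gigabits per second.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gigabytes_per_day(rate_gb_per_s):
    """Volume in GB if the rate were held for a full day (assumption)."""
    return rate_gb_per_s * SECONDS_PER_DAY


def gigabits_to_gigabytes(rate_gbit_per_s):
    """Convert a rate in Gb/s (bits) to GB/s (bytes)."""
    return rate_gbit_per_s / 8


# Figures quoted in the notes: 60 GB/s read, 6 GB/s write,
# 2.5 Gb/s sustained network bandwidth.
read_tb_per_day = gigabytes_per_day(60) / 1000
write_tb_per_day = gigabytes_per_day(6) / 1000
network_gb_per_s = gigabits_to_gigabytes(2.5)
network_tb_per_day = gigabytes_per_day(network_gb_per_s) / 1000

if __name__ == "__main__":
    print(f"read:    {read_tb_per_day:.1f} TB/day")
    print(f"write:   {write_tb_per_day:.1f} TB/day")
    print(f"network: {network_tb_per_day:.1f} TB/day "
          f"({network_gb_per_s} GB/s)")
```

Under these assumptions the sustained network link moves far less per day than the local I/O rates, which fits the note that both sites process the data independently to limit network traffic.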
|