Session 9: Day 4: Time 09:00 to 10:30
Distributed Data Management – Principles & Architecture

Malcolm Atkinson, Miron Livny and Stephen Davey

Contents:

09:00 Challenges and Constraints
Reminder of heterogeneity, bandwidth, latency and failure modes.
Sources of data and their characteristics
Scale of data and growth rates – Moore’s law will not save the day
Categories of data – artificial classification into structured and unstructured

09:30 Basic components of Data Management
Naming data and collections of data – value or variable.
Logical to physical naming.
Data storage services – combining or separating metadata.
Data movement and data placement: pre-fetching, just-in-time data pulling, caching and replication

10:00 Examples to Illustrate Engineering Challenges
Examples that exploit parallelism and higher-level abstraction and control, such as SRM, GridFTP, Reliable File Transfer, RLS, Stork and Storm, will illustrate aspects of current distributed data management engineering.


Slides:
- Distributed Data Management: Principles & Architecture [ppt | pdf]
- NextGRID & OGSA Data Architectures: Example Scenarios [ppt | pdf]
- { Distributed Data Management: Principles & Architecture } mouse embryo animation [mov]


Biographies:

Professor Malcolm Atkinson PhD, FBCS, FRSE

Malcolm Atkinson is the Director of the National e-Science Centre and the e-Science Institute. He is the UK e-Science Envoy, plays a leading role in OMII-UK, and is on the advisory boards of GOSC, NCeSS, Baltic Grid and GEON. He leads training and education in two EU-funded projects: EGEE and ICEAGE (International Collaboration to Extend and Advance Grid Education). These two projects organised ISSGC06. He is a member of the Global Grid Forum Steering Group and Data Area Director for GGF.

He began his career in computing in 1966. He has worked at seven universities: Glasgow, Pennsylvania, Edinburgh, UEA, Cambridge, Rangoon and Lancaster; and for two companies: Sun Microsystems (at SunLabs in California) and O2 (an Object-Oriented DB company in its early years in Versailles). He led the development of the Department of Computing Science in Glasgow and is now Professor of e-Science in the School of Informatics, University of Edinburgh. He has more than 130 publications. He has taken leading roles in national strategic research and infrastructure committees.



Prof Dr Miron Livny, BSc, MSc


Miron Livny received a BSc degree in Physics and Mathematics in 1975 from the Hebrew University and MSc and PhD degrees in Computer Science from the Weizmann Institute of Science in 1978 and 1984, respectively. Since 1983 he has been on the Computer Sciences Department faculty at the University of Wisconsin-Madison, where he is currently a Professor of Computer Sciences and is leading the Condor project.
Dr. Livny's research focuses on distributed processing and data management systems and data visualization environments. His recent work includes the Condor high throughput computing system, the DEVise data visualization and exploration environment and the BMRB repository for data from NMR spectroscopy.



Dr. Stephen Davey

Stephen Davey is a Software Architect at the National e-Science Centre. He is working on the NextGRID project, researching the architecture for Next Generation Grids with a particular focus on data architectures. He is also an active member of GGF, contributing to the OGSA Data Architecture and Information Dissemination (INFOD) working groups, and is the primary author of the OGSA Data Scenarios document.

After receiving a PhD in Astrophysics from Sussex University, Stephen worked in industry as a software engineer developing advanced equipment emulations for training systems. He then joined the National e-Science Centre in February 2005.