Session 9: Day 4: Time 09:00 to 10:30
Distributed Data Management – Principles & Architecture
Malcolm Atkinson, Miron Livny and Stephen Davey
Contents:
09:00 Challenges and Constraints
Reminder of heterogeneity, bandwidth, latency and failure modes.
Sources of data and their characteristics
Scale of data and growth rates – Moore’s law will
not save the day
Categories of data – artificial classification into structured
and unstructured
09:30 Basic components of Data Management
Naming data and collections of data – value or variable.
Logical to physical naming.
Data storage services – combining or separating metadata.
Data movement and data placement: pre-fetching, just-in-time data
pulling, caching and replication
10:00 Examples to Illustrate Engineering Challenges
Examples exploiting parallelism and higher-level abstraction/control,
such as SRM, GridFTP, Reliable File Transfer, RLS, Stork and
Storm, will be used to illustrate aspects of current distributed
data management engineering.
Slides:
- Distributed Data Management: Principles & Architecture [ppt | pdf]
- NextGRID & OGSA Data Architectures: Example Scenarios [ppt | pdf]
- Distributed Data Management: Principles & Architecture, mouse embryo animation [mov]
Biographies:
Professor Malcolm Atkinson PhD, FBCS, FRSE
Malcolm Atkinson is the Director of the National e-Science
Centre and the e-Science Institute. He is the UK e-Science Envoy
and plays a leading role in OMII-UK, and is on the advisory
boards of GOSC, NCeSS, Baltic Grid and GEON. He leads training
and education in two EU-funded projects: EGEE and ICEAGE
(International Collaboration to Extend and Advance Grid
Education). These two projects organised ISSGC06. He
is a member of the Global Grid Forum Steering Group and Data
Area Director for GGF.
He began his career in computing in 1966. He has worked at
seven universities: Glasgow, Pennsylvania, Edinburgh, UEA, Cambridge,
Rangoon and Lancaster; and for two companies: Sun Microsystems
(at SunLabs in California) and O2 (an Object-Oriented DB company
in its early years in Versailles). He led the development of
the Department of Computing Science in Glasgow and is now Professor
of e-Science in the School of Informatics, University of Edinburgh.
He has more than 130 publications. He has taken leading roles
in national strategic research and infrastructure committees.
Prof Dr Miron Livny, BSc, MSc
Miron Livny received a BSc degree in Physics and Mathematics
in 1975 from the Hebrew University and MSc and PhD degrees in
Computer Science from the Weizmann Institute of Science in 1978
and 1984, respectively. Since 1983 he has been on the Computer
Sciences Department faculty at the University of Wisconsin-Madison,
where he is currently a Professor of Computer Sciences and is
leading the Condor project.
Dr. Livny's research focuses on distributed processing and data
management systems and data visualization environments. His
recent work includes the Condor high throughput computing system,
the DEVise data visualization and exploration environment and
the BMRB repository for data from NMR spectroscopy.
Dr. Stephen Davey
Stephen Davey is a Software Architect at the National e-Science
Centre. He is working on the NextGRID project researching the
architecture for Next Generation Grids, in particular focussing
on data architectures. He is also an active member of GGF, contributing
to the OGSA Data Architecture and Information Dissemination
(INFOD) working groups, and is the primary author of the OGSA
Data Scenarios document.
After receiving a PhD in Astrophysics from Sussex University,
Stephen worked in industry as a software engineer developing
advanced equipment emulations for training systems. He then
joined the National e-Science Centre in February 2005.