Tier1A Status
Andrew Sansum
30 January 2003
Overview
• Systems
• Staff
• Projects
Lots of Services
DISK FARM
CPU FARM
CDF
Babar Suns
TESTBEDS
Core Services
AFS
Datastore
Support
Systems
Lots of Operating Systems
• Production Farm
  – Redhat 6.2 (close to end of life)
  – Redhat 7.2 (in production / Babar)
  – Redhat 7.3 (close to trial service, for LHC)
• CDF Service
  – Redhat 7.1 (Kerberised Fermi distribution)
  – Redhat 7.3 (possible future release)
• Solaris Service
  – Solaris 2.6 / Solaris 8
• EDG Testbed(s): Redhat 6.2 -> Redhat 7.3
Lots of EDG Testbeds!
• Production Testbed (CE, SE, 3*WN+NM)
• Development Testbed (CE, SE, 1*WN)
• RGMA Testbed (CE, SE, WN and RB)
• WP5 SE
• WP3/WP5 development systems
• EDG UI
• CE for Redhat 7.2 service
Lots of Grid Testbeds!
Tier1A
Babar
New Hardware
• Disk
  – Expect 40TB
  – Continue with existing IDE technology, but from a different manufacturer.
• CPU
  – Expect 100 CPUs
  – Move to Pentium 4 or possibly AMD
Some New Staff
GridPP Staff: Traylen, Radden, Bly
ESC/PPD System Staff: Wheeler, White, Sansum, Saunders, Ross, Folkes, Strong
Management: Kelsey, Gordon, Sansum, ...
BITD Support: Networking, Operations, User Reg, AFS
Experiment Support Staff (RAL and elsewhere)
Users
Lots of New Projects
• Basic fabric performance monitoring (ganglia)
• Resource CPU accounting (based on PBS accounts/mysql)
• New CA in production
• New batch scheduler (MAUI)
• Deploy new helpdesk (end March)
• Network performance tests (CERN/Bristol, also maybe WP7)
• Get ready for LCG (February deployment?)
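The CPU accounting item above can be illustrated with a minimal sketch, assuming the usual PBS accounting log layout (`date time;record type;job id;key=value ...`, with `E` records written at job end). The sample records, user names and the `cpu_seconds_by_user` helper are hypothetical; loading the totals into mysql is not shown.

```python
from collections import defaultdict


def hms_to_seconds(hms):
    """Convert an HH:MM:SS string (hours may exceed 99) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def cpu_seconds_by_user(lines):
    """Sum resources_used.cput per user over job-end ('E') records."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.strip().split(";", 3)
        # Only job-end records carry the final resource usage.
        if len(fields) != 4 or fields[1] != "E":
            continue
        attrs = dict(
            item.split("=", 1) for item in fields[3].split() if "=" in item
        )
        user = attrs.get("user")
        cput = attrs.get("resources_used.cput")
        if user and cput:
            totals[user] += hms_to_seconds(cput)
    return dict(totals)


# Hypothetical sample records, for illustration only.
SAMPLE = [
    "01/30/2003 10:00:00;S;101.batch;user=alice group=babar queue=prod",
    "01/30/2003 12:00:00;E;101.batch;user=alice group=babar queue=prod "
    "resources_used.cput=01:30:00 resources_used.walltime=02:00:00",
    "01/30/2003 13:00:00;E;102.batch;user=bob group=cdf queue=prod "
    "resources_used.cput=00:10:30 resources_used.walltime=00:15:00",
    "01/30/2003 14:00:00;E;103.batch;user=alice group=babar queue=prod "
    "resources_used.cput=00:30:00 resources_used.walltime=00:40:00",
]

if __name__ == "__main__":
    for user, seconds in sorted(cpu_seconds_by_user(SAMPLE).items()):
        print(f"{user}: {seconds / 3600:.2f} CPU hours")
```

Grouping by `group` instead of `user` would give per-experiment totals; which breakdown the real accounting uses is not stated here.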
Ganglia Monitoring
• Urgently needed live performance and utilisation monitoring
  – RAL Ganglia Monitoring (live)
  – RAL Ganglia Monitoring (static)
• Scalable solution based on multicast
• Very rapidly deployable, with reasonable support on all Tier1A hardware
• See: http://ganglia.sourceforge.net/
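As a rough illustration of consuming Ganglia data, the sketch below parses a cluster report in the XML form that a gmond daemon serves over TCP (port 8649 by default). The sample document, host names and the `metric_by_host` helper are hypothetical, and nothing here reflects how the RAL pages are actually produced.

```python
import xml.etree.ElementTree as ET

# Hypothetical gmond-style report, trimmed to two hosts and two metrics.
SAMPLE_XML = """<GANGLIA_XML VERSION="2.5.1" SOURCE="gmond">
  <CLUSTER NAME="tier1a-example">
    <HOST NAME="node01" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.50" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
    <HOST NAME="node02" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metric_by_host(xml_text, metric_name):
    """Return {host name: metric value as float} for one metric."""
    root = ET.fromstring(xml_text)
    values = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                values[host.get("NAME")] = float(metric.get("VAL"))
    return values


loads = metric_by_host(SAMPLE_XML, "load_one")
mean_load = sum(loads.values()) / len(loads)

if __name__ == "__main__":
    print(f"hosts: {len(loads)}, mean one-minute load: {mean_load:.2f}")
```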
New CA Deployed
• Now fully deployed by E-Science Centre (Jens + Alastair Mills)
• In use in UK core Grid
• Several PP have RAs defined
• Approved by EDG, but not yet in the distribution.
• Once in EDG, a termination date for the old CA will be set.
New Scheduler (MAUI)
• With Redhat 7.2, now using the MAUI scheduler over PBS
• Some problems with MAUI scheduling on wallclock time, now corrected.
• Testing algorithms, but essentially have a range of strategies we can apply.
• Will make changes to queue structure in due course
New Helpdesk Software
• Old helpdesk (Remedy): mail based, unfriendly.
• With additional staff, urgently need to deploy new solution.
• Expect new system to be based on free software (Bugzilla, Request Tracker …)
• Hope that deployed system will also meet needs of Testbed and Tier 2 sites.
• Expect deployment by end of March.
Network Performance Tests
• Simon Metson, Nick White, +….
• Preparing for CMS production. Must be able to move data to CERN at 100-200 Mbit/s.
• Currently 350 Mbit/s aggregate to Bristol, but under 100 Mbit/s to CERN.
• Main problem seems to be within CMS infrastructure
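To put the target rates in context, the following sketch converts a sustained rate into transfer time. The 1 TB data volume is a hypothetical figure chosen for illustration (with 1 TB taken as 10^12 bytes), and protocol overhead is ignored.

```python
def transfer_hours(terabytes, mbit_per_s):
    """Hours needed to move `terabytes` (10**12 bytes each) at a sustained rate."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (mbit_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # 100 and 200 Mbit/s are the target range; 350 Mbit/s is the
    # aggregate currently seen to Bristol.
    for rate in (100, 200, 350):
        print(f"1 TB at {rate} Mbit/s: {transfer_hours(1, rate):.1f} hours")
```

At 100 Mbit/s a terabyte takes roughly 22 hours, so doubling the sustained rate to CERN halves that to about 11 hours.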
BaBar Batch CPU Use at RAL
MOU
Successes (2002)
• Five additional staff online since January 2002.
• Fully engaged in EDG testbed. Making an impact in EDG: Steve
• Tier1A installation went very well in March/April/May
• Tier A service ramp-up excellent:
  – Most successful of the Tier A services. SLAC seem pleased, so far.
Challenges
• Complete 2002/2003 tender/deployment
• Carry out major EU tenders for 2003/2004
• Expand use of Tier 1
• Need to evolve strategy to cope with diversity of requirements
• Deploy the LCG Testbed (What/When?)
• Enhance automation / out of hours cover
• Improve reporting to GridPP (accountability)