Semantically Resolving Type Mismatches in Scientific Workflows
Derouiche Kheiredine
kd05r@ecs.soton.ac.uk
School of Electronics and Computer Science
University of Southampton
November 2007
Scientific Workflows
Scientific workflows describe structured activities arising in scientific problem-solving.
Conducting experiments involves complex and structured computations.
Resolving semantic mismatches among resources requires much human intervention.
Participating services are owned by different organizations, so defining compensations is critical to a successful recovery from failures.
Workflows in Bioinformatics
Integrating different tools to solve biological problems
Usually involves:
Manual data transfer between applications
Understanding data formats
Converting file formats where appropriate
Manual workflows involve a large number of steps.
Manual execution is time-consuming and error-prone.
User is required to possess a deep knowledge and understanding of disparate application environments.
Using tools in Bioinformatics
Specification of an in silico experimental design: Sequence Similarity Search
[Figure: three levels of specification, from abstract to concrete. Task: DNASequenceSimilaritySearch. Service Class: BlastService. Specific Services: WSWUBlast service operation blastn (query, database, email).]
Automated Workflows
Make the task of creating a workflow a simple "drag and drop" process.
Make the resulting workflow diagram self-documenting, showing exactly how to perform a bioinformatics experiment.
Automatic execution of the steps specified in the workflow.
Monitoring workflow execution to help debugging and intervention.
Reduces complexity for scientific users, supports sharing, and allows repeatability.
Bioinformatics Workflow Systems
Specialized workflow systems designed to develop workflows in bioinformatics
Different workflow standards and systems:
BPEL: business workflow standard adapted for scientific workflows
UNICORE: Grid middleware that provides a GUI for workflow development
Globus: an open source toolkit implementing many Grid-related standards
Kepler: graph-based modelling language to develop workflows
Taverna Workbench: choreography tool for bioinformatics Web Services
Triana: develops component-based workflows and provides coupling with Grid middleware tools
Windows Workflow Foundation (1)
Part of .NET Framework 3.0
Workflows are a collection of activities.
Components
Base Activity Library: Out-of-box activities and base for custom activities.
Runtime Engine: Workflow execution and state management.
Runtime Services: e.g. RDBMS, persistence, transactions.
Visual Designer: Graphical and code-based construction in Visual Studio or standalone.
Windows Workflow Foundation (2)
[Figures: fig03_L.gif, toolbox.png]
Semantic Web Services
The augmentation of Web service descriptions with semantic annotations.
Aims to automate Web service discovery, composition,invocation, and monitoring.
Two different approaches:
Revolutionary: OWL-S, and WSMO.
Evolutionary: WSDL-S, and SAWSDL.
The SAWSDL approach builds on existing Web service standards and is agnostic to ontology representation.
Semantic Annotations for Web Services Description Language (SAWSDL)
SAWSDL extends WSDL through its extensibility elements.
Two basic types of annotations:
Model reference, which associates selected WSDL components with semantic concepts.
Schema mapping, which deals with data heterogeneity by transforming one data representation into another.
Annotations for WSDL 1.1 and WSDL 2.0.
API and tool support including: SAWSDL4J, Woden4SAWSDL, Radiant...
SAWSDL Scope
definitions
  import
  types
  message
    part
  portType
    operation
      input
      output
      fault
  binding
    operation
      input
      output
      fault
  service
    port
Legend: elements annotated using modelReference; elements annotated using modelReference with schemaMapping.
Note: all elements may have <documentation> as first child.
SAWSDL Example
<wsdl:definitions targetNamespace="http://www.w3.org/2002/ws/sawsdl/spec/wsdl/order#"
   xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
   <wsdl:types>
      <xs:element name="purchaseOrderResponse" type="xs:string"
         sawsdl:modelReference="http://www.w3.org/2002/ws/sawsdl/spec/ontology/purchaseorder#PurchaseOrderResponse"
         sawsdl:liftingSchemaMapping="http://www.w3.org/2002/ws/sawsdl/spec/mapping/Response2Ont.xslt">
         ……
      </xs:element>
   </wsdl:types>
   <wsdl:portType name="PurchaseOrder">
      <wsdl:operation name="order">
         <sawsdl:attrExtensions sawsdl:modelReference="http://www.w3.org/2002/ws/sawsdl/spec/ontology/purchaseorder#RequestPurchaseOrder"/>
         <wsdl:input
            messageLabel="OrderRequestMessage"
            element="purchaseOrderRequest"/>
         <wsdl:output
            messageLabel="OrderResponseMessage"
            element="tns:purchaseOrderResponse"/>
      </wsdl:operation>
   </wsdl:portType>
</wsdl:definitions>
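As a rough illustration of how a tool might read such annotations, the following Python sketch collects every sawsdl:modelReference and schema mapping attribute from a trimmed-down fragment modelled on the example above. It uses only the standard library; the function name collect_annotations and the shortened document are assumptions made for this sketch and are not part of the implementation described in these slides.

```python
import xml.etree.ElementTree as ET

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"

# Trimmed-down fragment modelled on the example above (illustrative only).
DOCUMENT = """\
<wsdl:definitions
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <wsdl:types>
    <xs:element name="purchaseOrderResponse" type="xs:string"
        sawsdl:modelReference="http://www.w3.org/2002/ws/sawsdl/spec/ontology/purchaseorder#PurchaseOrderResponse"
        sawsdl:liftingSchemaMapping="http://www.w3.org/2002/ws/sawsdl/spec/mapping/Response2Ont.xslt"/>
  </wsdl:types>
</wsdl:definitions>
"""

ANNOTATION_KINDS = ("modelReference", "liftingSchemaMapping", "loweringSchemaMapping")


def collect_annotations(xml_text):
    """Return a list of (element name, annotation kind, URI) tuples."""
    found = []
    root = ET.fromstring(xml_text)
    for element in root.iter():
        for kind in ANNOTATION_KINDS:
            # ElementTree addresses namespaced attributes as "{namespace}local".
            value = element.get(f"{{{SAWSDL_NS}}}{kind}")
            if value is None:
                continue
            # Each attribute value is a whitespace-separated list of URIs.
            for uri in value.split():
                found.append((element.get("name"), kind, uri))
    return found


annotations = collect_annotations(DOCUMENT)
for name, kind, uri in annotations:
    print(name, kind, uri)
```

The model reference URI is what a reasoner would later compare against other concepts, while the lifting mapping URI points to the transformation to apply to the XML data.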
Leveraging existing Java for .NET
A C# implementation of the SAWSDL specification.
Support for Model Reference annotations, OWL/RDF definitions.
Lifting/Lowering schema support, XSLT/SPARQL mapping definitions.
Allows the creation of SAWSDL-based applications.
Extends the .NET API for WSDL 1.1.
Support for WSDL 2.0 through XSLT.
Implementation
Development of a custom activity that extends the base Web Service activity shipped with WF.
Enables semi-automatic composition of Semantic Web Services, and the execution of the workflow.
Can be composed with Web Services described using WSDL files.
A C# implementation of the activity.
Semantic capabilities are provided by the Jena library; integration with C# is enabled via IKVM.
Semantic Reasoning
Model reference annotations describe the functionalities a Web service provides.
Ontologies are used as semantic models for the semantic annotations.
Reasoning capabilities are provided by using:
Jena, an open source Semantic Web framework for Java.
Pellet, an open source Java OWL-DL reasoner.
Currently supports schema type and message part annotations to achieve automatic parameter binding.
Schema Type Mapping
Provide mappings between XML and semantic models.
Lifting Schema Mapping specifies the mapping from WSDL type definitions in XML to semantic data.
XSLT and XQuery are used as mapping languages.
Lowering Schema Mapping specifies the mapping from semantic data to WSDL type definitions in XML.
SPARQL is used to query the ontology, followed by XSLT and XQuery.
Semantic data is queried through SPARQL, which Jena supports through its query engine.
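The two mapping directions can be sketched in a few lines of Python. This is only an analogy under stated assumptions: plain Python functions stand in for the XSLT, XQuery and SPARQL mappings named above, triples are modelled as tuples, and the ontology namespace, subject URI and element names (entry, blastInput) are hypothetical.

```python
import xml.etree.ElementTree as ET

ONT = "http://example.org/ontology#"  # hypothetical ontology namespace


def lift(xml_text, subject, concept):
    """Lifting: turn each child element of an XML message into a triple."""
    root = ET.fromstring(xml_text)
    triples = {(subject, "rdf:type", ONT + concept)}
    for child in root:
        triples.add((subject, ONT + child.tag, child.text or ""))
    return triples


def lower(triples, subject, root_tag, fields):
    """Lowering: select the wanted properties from the triples and emit XML."""
    root = ET.Element(root_tag)
    for field in fields:
        for s, p, o in sorted(triples):
            if s == subject and p == ONT + field:
                ET.SubElement(root, field).text = o
    return ET.tostring(root, encoding="unicode")


message = "<entry><accession>X14298</accession><sequence>atgagtgatgg</sequence></entry>"
graph = lift(message, "urn:example:entry1", "DNASequence")
request = lower(graph, "urn:example:entry1", "blastInput", ["sequence"])
print(request)
```

Going through the semantic layer means the producing and consuming services never need to agree on an XML layout, only on the ontology concepts their data is lifted to and lowered from.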
IKVM.NET
An implementation of Java for the Microsoft .NET Framework.
It includes the following components:
A Java Virtual Machine implemented in .NET.
A .NET implementation of the Java class libraries.
Tools that enable Java and .NET interoperability.
Used to compile the Jena and Pellet JAR libraries into .NET DLL assemblies; Java bytecode is translated to Common Intermediate Language (CIL).
Allowed Jena's capabilities to be used in the implementation of the Semantic Web service activity.
Semantic Web Service Activity (1)
Activity bindings are the key feature that enables property binding between activities, or on the workflow itself.
This mechanism allows data propagation between composed activities.
WF relies on syntactic approaches when binding properties between activities.
The SWS activity implements a basic semantic matching engine to better support semantically compatible properties.
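The idea of propagating data through bindings can be illustrated with a small Python analogy. This is not the WF API: the Activity class, its bind and execute methods, and the two stand-in activities are hypothetical names chosen for the sketch.

```python
class Activity:
    """A named step whose input properties can be bound to another step's outputs."""

    def __init__(self, name, run):
        self.name = name
        self.run = run        # callable: dict of inputs -> dict of outputs
        self.inputs = {}
        self.outputs = {}
        self.bindings = {}    # input property -> (source activity, output property)

    def bind(self, input_name, source, output_name):
        """Declare that an input takes its value from another activity's output."""
        self.bindings[input_name] = (source, output_name)

    def execute(self):
        # Resolve each binding by copying the current value of the source output.
        for input_name, (source, output_name) in self.bindings.items():
            self.inputs[input_name] = source.outputs[output_name]
        self.outputs = self.run(self.inputs)


# Stand-ins for two composed activities (no real services are called).
fetch = Activity("fetch", lambda inputs: {"sequence": "atgagt"})
search = Activity("search", lambda inputs: {"jobID": "job-for-" + inputs["query"]})

search.bind("query", fetch, "sequence")  # binding declared by property name
fetch.execute()
search.execute()
result = search.outputs["jobID"]
print(result)
```

Here the binding is chosen by hand and by name, which corresponds to the syntactic approach; a semantic matching engine would instead choose which output to bind by comparing the concepts the properties are annotated with.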
Semantic Web Service Activity (2)
Automatically bind SWS parameters to composed workflow activities using the semantic approach.
The semantic model annotation of an activity's input has to be equivalent to, or a subclass of, that of the composed activity's output.
Values are mapped to the appropriate data representation at design time.
Missing activity bindings can be manually added using the WF visual designer.
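A minimal sketch of such a matching step, assuming a toy subclass hierarchy held in a dictionary instead of an OWL ontology queried through Jena and Pellet. The concept names, parameter names and the functions is_subclass, degree_of_match and auto_bind are hypothetical; the matching rule is taken exactly as worded above.

```python
# Hypothetical toy ontology: each concept maps to its direct superclass.
SUBCLASS_OF = {
    "DNASequence": "Sequence",
    "ProteinSequence": "Sequence",
    "Sequence": "Thing",
}


def is_subclass(concept, ancestor):
    """True when `concept` lies strictly below `ancestor` in the hierarchy."""
    current = SUBCLASS_OF.get(concept)
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def degree_of_match(input_concept, output_concept):
    """Rule as stated above: the input concept must be equivalent to,
    or a subclass of, the output concept."""
    if input_concept == output_concept:
        return "exact"
    if is_subclass(input_concept, output_concept):
        return "subclass"
    return None


def auto_bind(inputs, outputs):
    """Bind each input to a matching output, preferring an exact match."""
    bindings = {}
    for in_name, in_concept in inputs.items():
        best = None
        for out_name, out_concept in outputs.items():
            degree = degree_of_match(in_concept, out_concept)
            if degree == "exact":
                best = (out_name, degree)
                break
            if degree == "subclass" and best is None:
                best = (out_name, degree)
        if best is not None:
            bindings[in_name] = best
    return bindings


bindings = auto_bind(
    inputs={"sequence": "DNASequence", "email": "EmailAddress"},
    outputs={"fasta": "Sequence"},
)
print(bindings)
```

In this run the sequence input is bound through a subclass match, while email finds no compatible output and is left unbound, which is the case where a binding would be added manually in the designer.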
Bioinformatics Workflow Example
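[Figure: two-step workflow. GetEntry (getFASTA_DDBJEntry) takes the accession X14298 and returns the sequence Atgagtgatggagcagttcaaccagacggtggtcaacctgctgtcagaaatgaaagagctcaggatctgggaacgggtctggaggcggg. WSWUBlast (blastn) takes that sequence together with the database (embl) and the email (kd05r@ecs.soton.ac.uk), and returns the jobID M7WEXBN7013.]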
Automatic Binding in Bioinformatics Workflow
[Figure: the output of GetEntry (getFASTA_DDBJEntry) and the input of WSWUBlast (blastn) each carry a SAWSDL annotation pointing to a semantic concept; the concepts shown are Sequence and DNA Sequence. The Semantic Reasoner compares the two concepts (degrees of match: exact, subclass), then binds the parameters and carries out any necessary translations.]
Conclusion & Future Work
API implementations that enable the development of semantically annotated Web services.
Semantic Web service activity integration into WF, facilitating workflow building and manipulation.
Future Work:
Improve the SWS activity by processing more SAWSDL annotations, e.g. operation and portType.
Semantically annotate Bioinformatics Web services, then use WF to build a workflow composed of SWS activities in order to test the implementation.
Implement an approach to semantically guide and verify compensations and exceptions.