
BioJS in GSoC 2016

We are pleased to announce that BioJS will be a mentoring organization in Google Summer of Code 2016. We are happy to once again mentor 5 ambitious students from all around the world on exciting projects that create beautiful biological data visualizations. We look forward to our students' work with modern web technologies, which is relevant both to academic research and to our vibrant open-source community!
Want to learn more about Google Summer of Code? Visit the project website or read the student's manual!

Registrations Closed!

Even if you haven't been accepted this year, you can still contribute to BioJS outside of GSoC.
Ask us on Slack Chat to get started.


Our project ideas for 2016

  • Visualisation of standard types in life science with BioSchemas, schema.org and BioJS

    Rationale

    At the moment there are no standard ways to describe life science events, people, training materials and courses. The websites holding this information code it in different ways. As a result, the dissemination, discovery and aggregation of this information is challenging, and advertising in third party websites normally requires time-consuming manual curation. This hampers the flow of information around the life science community.

    The project aims to propose standard ways to present life science information. This will be achieved using schema.org markup, extended where necessary with new properties and with guidelines on how to use new and existing schema.org properties. The project will also create a prototype that parses HTML pages annotated with terms coined within the agreed standards. The prototype will be built using BioJS, a JavaScript-based component library. If time allows, a visualization of the parsed data would be the next step.

    Approach

    Work has already been done on creating coding standards for life science information. Several life science organisations across Europe have joined together and formed Bioschemas, which is still in its early stages. This project will engage with the schema.org community and with Bioschemas to carry the standards forward.

    The standards themselves will be based as much as possible on existing agreements that haven’t yet been put into practice. There will be a standard for coding each type of life science information (e.g. an event, person or organisation). Each standard will consist of:

    • a data model based on an existing schema.org type (e.g. Person, Event), or a proposed extension of an existing type (e.g. LifeSciencePerson). Existing types will be preferred.
    • the data model may contain additional properties for each type. These properties will be submitted to the schema.org community to be adopted as part of the standard schema.org specifications.
    • controlled vocabularies, using existing ontologies wherever possible.
    • cardinality (i.e. whether one or many values are expected for each property).
    • minimum fields

    The specifications will be designed to be unintrusive to information providers, minimising changes to the methods organisations currently use to publish life science information. The dissemination of information will be facilitated by making use of standards like Microdata, JSON-LD and RDFa.
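
As a rough illustration of the parsing prototype, the sketch below pulls JSON-LD blocks out of an HTML page and checks them against a list of minimum fields. It is written in Python with the standard library only, purely to show the idea; the actual prototype is meant to be a BioJS component. The `MINIMUM_FIELDS` table and the sample page are assumptions made up for this example, not part of any agreed standard.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


# Hypothetical minimum fields per type; real lists would come from the agreed standard.
MINIMUM_FIELDS = {"Event": ["name", "startDate", "location"]}


def missing_fields(item):
    """Return the minimum fields that are absent from one JSON-LD item."""
    required = MINIMUM_FIELDS.get(item.get("@type"), [])
    return [field for field in required if field not in item]


# Invented sample page with one annotated event.
PAGE = """
<html><body>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Event",
 "name": "Example training course", "startDate": "2016-06-01"}
</script>
</body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)
for item in extractor.items:
    print(item["@type"], "is missing:", missing_fields(item))
```

The same check could be extended to cardinality by recording, for each property, whether a single value or a list is expected.
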
    JavaScript visualizations will provide specific views for life science schema.org types and facilitate content integration in third party websites. They will also help both in creating and implementing the standards. Example use cases include:

    • Bubble diagrams that show which properties are shared between information types (e.g. people, events) and which are common to all of them.
    • A map of training course locations and people's expertise to help identify training needs.
    • A map of life science events that you can filter by topic.
    • A diagram showing where life science information is generated and where it propagates to (which websites). This will identify ways Bioschemas can enrich this network, or show how Bioschemas helps. If the websites are in agreement, this diagram could be quantified with page views.

    Challenges

    Agreeing on the appropriate use of schema.org properties and types with the schema.org community (the use of some properties and types is open to interpretation).
    Reaching agreement with the life science community on which properties are needed to describe different types of life science information.
    Encouraging the adoption of the new coding standards in some key life science websites.

    Involved Tools/Libraries

    Comfortable with HTML
    Familiar with schema.org markup
    JavaScript

    Mentors

    Martin Cook (ELIXIR), Rafael Jimenez (ELIXIR), Leyla García (UniProt-EBI)

  • Bonestagram - DICOM visualization made fun

    Rationale

    The cost of producing medical images such as computed tomography (CT), magnetic resonance (MR) and ultrasound scans is decreasing, while the number of studies that use these images is steadily increasing. Such images are especially important for diagnosing injuries to bones or internal tissue. As a result, many individuals have multiple scans of different parts of their bodies, which they can take home (Figure 1).

    Fig 1. Examples of two medical images

    The problem we want to tackle is that the vast majority of individuals do not collect the medical images produced for their diagnoses, even though doing so has an important advantage. For example, in the case of a new injury, historic data about pre-existing conditions can help physicians with their treatment plan.

    To encourage individuals to collect their medical images, we want to create a visualization app that is easy to use and that, by interacting with the user’s webcam, can display a medical image together with the injured part of the body (e.g. it can lay an image of a bone over the user’s arm or an image of the brain over the user’s head). When the user moves the body part, the overlaid medical image should move with it. Finally, the video or snapshot produced by the app can be shared, for example with family and friends (Figure 2).

    Fig 2. Gamification - overlaying a medical image on the image of a user captured by the webcam

    The gamification, i.e. allowing users to overlay the images or their renderings onto webcam images or selfies, could be done in a similar way to Face Swap apps.
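
The core of the overlay step can be sketched as plain alpha blending. The example below is a minimal illustration in Python, assuming greyscale images stored as nested lists; the function name `overlay`, the toy data and the idea that a tracker supplies the `top`/`left` position are all assumptions for this sketch. The real app would run in the browser and would need body-part tracking, which is not shown here.

```python
def overlay(frame, scan, top, left, alpha=0.5):
    """Alpha-blend a small greyscale scan onto a larger greyscale frame.

    frame and scan are lists of rows of 0-255 intensities. (top, left) is the
    position of the scan's upper-left corner in the frame, e.g. as reported by
    a (hypothetical) body-part tracker. Pixels outside the frame are skipped.
    """
    result = [row[:] for row in frame]  # copy so the input frame is untouched
    for y, scan_row in enumerate(scan):
        for x, value in enumerate(scan_row):
            fy, fx = top + y, left + x
            if 0 <= fy < len(result) and 0 <= fx < len(result[fy]):
                result[fy][fx] = round(alpha * value + (1 - alpha) * result[fy][fx])
    return result


frame = [[100] * 4 for _ in range(3)]  # stand-in for one webcam frame
scan = [[200, 200], [200, 200]]        # stand-in for a rendered DICOM slice

blended = overlay(frame, scan, top=1, left=2)
for row in blended:
    print(row)
```

Calling `overlay` again with a new `top` and `left` for each video frame is what would make the medical image follow the moving body part.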

    Approach

    The first part of this project is data collection. Different DICOM images (varying in modality and object of interest) need to be obtained in order to understand the diversity and nature of the data.
    The second part of the project is the development of an intuitive, fast DICOM viewer that can run in any browser.
    The third part of the project is adding gamification features to the viewer that encourage users to use the software and share their renderings, e.g. in a social context.

    Challenges

    • Privacy
    • Heterogeneous data
    • Complete JavaScript frontend, platform-independent solution
    • Simple, non-expert UI design for highly non-trivial data

    Involved toolkits or projects

    Sample Data sources:

    Open Source related projects:

    Needed skills

    Fluency in JavaScript and Git

    • Knowledge of computer vision
    • Knowledge of best practices in UI/UX
    • Knowledge of medical images is beneficial but not required
    • Very high motivation and the ability to work independently

    Mentors

    Christian Dallago (TU Munich, RostLab), Hesam Rabeti (TU Munich), Guy Yachdav (BioSof)

  • Visualisation of target-disease relationships in drug discovery

    Rationale

    The Centre for Therapeutic Target Validation (CTTV) is an innovative public-private partnership whose aim is to support researchers in identifying possible drug targets more efficiently by integrating genome-wide biological data from several public databases. It is a collaboration between the Wellcome Trust Sanger Institute (WTSI), the European Bioinformatics Institute (EBI), GlaxoSmithKline and Biogen, and is committed to making public all the data generated and the software developed in the project.

    The first public version of the web platform was released in December 2015 and is currently being used widely. As part of this initiative, the CTTV team has developed several re-usable visualisations to allow easier interpretation of the integrated data and these have been made public through the BioJS registry.

    One of the areas we would like to improve in the next versions of the platform is support for generating new hypotheses (i.e. target-disease associations) based on known associations. These new hypotheses can be inferred from different biological data such as pathways, known drugs or co-occurrences in scientific literature.

    In this project we propose the creation of a visualisation that can be used to easily explore these new hypotheses.

    Approach

    The CTTV project makes use of UX methodologies to define the user requirements and ensure that the features included in the platform meet the expectations of the prospective users. Thus, interaction with the UX team and becoming familiar with UX techniques will be beneficial to the GSoC student.

    To develop this visualisation the GSoC student will have to get comfortable with the data schema used at CTTV and work with our back-end team to get the data in the most convenient way to visualise it. Depending on the skills and interests of the student there is an opportunity to get involved in processing the data and its presentation through our REST API.

    The visualisation will be developed as a re-usable web-based component, be integrated in our AngularJS web platform and made public in BioJS.

    A possible visualisation may look as follows:

    In this figure, diseases could be represented as red circles on one side and targets as white circles on the other, with lines connecting known associations between them. Between the two sides, yellow circles would represent new hypotheses (genes or diseases) inferred from different types of biological evidence. The size of a node may represent the confidence in the new hypothesis based on the cumulative evidence.
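
One simple way to think about inferred hypotheses and cumulative evidence is sketched below in Python. It uses shared pathway membership as the only evidence type and counts supporting observations as a score that could drive node size. The data, the identifiers and the scoring rule are invented for illustration and are not the method used by CTTV.

```python
from collections import defaultdict

# Toy data, invented purely for illustration.
known_associations = {("T1", "D1"), ("T2", "D1"), ("T3", "D2")}
pathways = {
    "P1": {"T1", "T4"},
    "P2": {"T2", "T4"},
    "P3": {"T3", "T5"},
}


def infer_hypotheses(known, pathways):
    """Propose (target, disease) pairs supported by shared pathway membership.

    A target that shares a pathway with a target already associated with a
    disease becomes a candidate for that disease. The score counts the
    supporting (pathway, known target) pairs.
    """
    scores = defaultdict(int)
    for members in pathways.values():
        for known_target, disease in known:
            if known_target not in members:
                continue
            for candidate in members - {known_target}:
                if (candidate, disease) not in known:
                    scores[(candidate, disease)] += 1
    return dict(scores)


hypotheses = infer_hypotheses(known_associations, pathways)
for (target, disease), score in sorted(hypotheses.items()):
    print(target, disease, score)
```

Here T4 is linked to D1 through two pathways, so it would be drawn larger than T5, which has a single supporting observation for D2.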

    Challenges

    The main challenges for this project are:

    • Creating these new hypotheses based on the current data stored in the CTTV portal is challenging. Creating this type of inferred association is on our current roadmap. Depending on their skills and interests, the student will have the opportunity to work on this aspect of the project too.
    • The number of inferred associations to display can vary widely, which poses a challenge for the visualisation. Aggregation techniques, filtering options etc. may be required.
    • If a network visualisation is chosen, computing a sensible layout is critical.

    Involved toolkits or projects

    • Depending on the type of visualisation selected, this project may need to use existing visualisation tools like Cytoscape.
    • D3 may be required to develop the visualisation.
    • The CTTV web application uses AngularJS and Twitter Bootstrap. Current visualisations have been written in JavaScript with D3 as their only dependency.

    Required skills

    • Solid understanding of modern JavaScript.
    • Working knowledge or interest in data visualisation is required.
    • Familiarity with current BioJS would be beneficial.

    CTTV is a highly collaborative project, so the ability to work in an Agile team, interfacing with back-end developers, UX designers and other web developers, is essential.

    Mentors

    Miguel Pignatelli (CTTV-EBI), Luca Fumis (CTTV-EBI)

  • Web Components integration in BioJS

    Rationale

    The new Web Components standards are a perfect fit for BioJS. They solve many of the major challenges that BioJS components face in a natural way and what is more important, in a native way. Both projects share the same design principles and philosophy.

    The most exciting features that Web Components offer for the development of re-usable visual components in general, and BioJS components in particular, are:

    • Interoperability: Sharing components independent of the UI framework of choice.
    • Declarative interface of the components: Custom elements allow a clean and easy way of using your component.
    • Shadow DOM allows true encapsulation of BioJS components, one of the most critical features in the development of visual components. In combination with HTML templates, it provides a powerful tool for creating them.
    • A unified and homogeneous way to create custom elements and templates, handle component communication (via Polymer) and define a component API.

    Approach

    In a first phase, the student would need to explore the best strategy for introducing Web Components in BioJS. There are several aspects to keep in mind during this phase:

    • BioJS already has a significant number of registered components (130 at the time of writing; Fig. 1), and backwards compatibility is a concern. Ideally, BioJS should support components written both with and without the Web Components standards.
    • The Web Components specifications are still a moving target. Significant parts of their APIs have not yet been agreed on by the different browser vendors. This imposes an overhead in maintaining any adopted solution.
    • Polymer is the most mature library built on top of Web Components to date (version 1 was released in 2015) and the only one that can be considered production-ready. It also provides a set of polyfills that enable the use of Web Components in current browsers. As an extra benefit, it offers an abstraction layer over the Web Components standards, making it an easier target than the unstable specifications.

    Once the best strategy has been defined, we will need to implement the necessary changes in the current registry to better support Web Components.

    One of the main features of BioJS is the definition of gold standards in component development. Documenting how to use Web Components in the context of BioJS and helping to define these gold standards are also an important part of the project.

    Fig. 1: Screenshot of the BioJS registry

    Challenges

    • Web components specifications are still a “moving target”. Any development using their APIs will need active and frequent maintenance over time. This would be alleviated if Polymer is used.
    • Any new technology adopted by BioJS will also need to be adopted by the developers who contribute new components to BioJS. Facilitating this adoption by writing documentation, participating in workshops and helping component developers will also be required.
    • Supporting live examples in the registry may also be a challenge

    Involved toolkits or projects

    Polymer may be a requirement for the project.

    Needed skills

    • Solid programming skills in JavaScript are required
    • Experience or interest in learning Web Components principles is also desired
    • Working experience with Polymer is a plus.

    Mentors

    Miguel Pignatelli (CTTV-EBI) and Leyla García (UniProt-EBI)

  • RDF schema and path finder

    Rationale

    Biological entities, such as genes and proteins, are inherently connected and can form complex networks when aggregating data from multiple sources. Data providers are increasingly moving towards the Resource Description Framework (RDF) and a Linked Data approach for publishing data. RDF provides a mechanism to describe schemas that can capture the nature of the relations that hold between entities. Effectively documenting an RDF schema is crucial for users wanting to explore or query a linked dataset; however, adequate tooling to support both the visualization and exploration of datasets is currently lacking.

    This project has two phases. During the Google Summer of Code part, we aim to finish phase 1, “Schema Visualization”, and advance as far as possible in phase 2, “Path Finder”.

    Schema Visualization: Different approaches exist for visualizing RDF (see this overview). However, not much has been done regarding the visualization of schemas. This is the first problem we want to address.

    Path Finder: Knowing a model schema is a good starting point, but it is not enough when multiple datasets are involved. For traversing multiple datasets, direct links or short paths are convenient because they make searching and data integration easier. RelFinder is useful for finding paths between two entities in the same dataset, but it does not yet work across multiple datasets. A more complex issue arises when only the starting point is known; for instance, starting from a chemical, what kinds of entities in protein or gene datasets can I reach? That is the second problem we want to address.

    Approach

    Schema Visualization: Initially, we need to extract the high-level entities used in an RDF dataset. For instance, multiple Gene Ontology (GO) terms might be in use, but in the end all of them belong to the same broad type “GO term”, so in a schema we only want to see a link to “GO terms”. Once the schema parser is in place, we can move on to the visualization part.
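
The collapsing of instance-level links into type-level links can be sketched as follows. This is a minimal Python illustration over hand-written triples, assuming every entity carries an explicit `rdf:type`; the identifiers and the `annotatedWith` predicate are made up for the example, and a real parser would read the triples from an RDF store or SPARQL endpoint.

```python
RDF_TYPE = "rdf:type"

# Toy triples (subject, predicate, object); the identifiers are illustrative only.
triples = [
    ("gene:1", RDF_TYPE, "Gene"),
    ("gene:2", RDF_TYPE, "Gene"),
    ("go:a", RDF_TYPE, "GOTerm"),
    ("go:b", RDF_TYPE, "GOTerm"),
    ("gene:1", "annotatedWith", "go:a"),
    ("gene:1", "annotatedWith", "go:b"),
    ("gene:2", "annotatedWith", "go:b"),
]


def extract_schema(triples):
    """Collapse instance-level triples into (subject type, predicate, object type) edges."""
    # First pass: remember the declared types of every entity.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    # Second pass: replace each entity by its types and deduplicate the edges.
    schema = set()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for s_type in types.get(s, ()):
            for o_type in types.get(o, ()):
                schema.add((s_type, p, o_type))
    return schema


print(sorted(extract_schema(triples)))
```

Three annotation triples between individual genes and GO terms collapse into a single schema edge from `Gene` to `GOTerm`, which is the level of detail the visualization would show.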

    For instance, a Gene Expression Atlas schema looks like:

    Fig. 1: Gene Expression Atlas RDF schema

    Path Finder: We need to create direct links or short paths across multiple datasets in an automatic or semi-automatic way. You can find more info here.
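
A breadth-first search over a merged graph is one straightforward way to find short paths across datasets, and it also answers the question of what can be reached when only the starting point is known. The Python sketch below assumes that cross-dataset links already exist as plain edges; the three edge lists and all identifiers are hypothetical.

```python
from collections import deque

# Toy edges from three hypothetical datasets; identifiers are illustrative only.
chemical_edges = [("chem:aspirin", "protein:P1")]
protein_edges = [("protein:P1", "gene:G1"), ("protein:P2", "gene:G2")]
cross_links = [("gene:G1", "disease:D1")]


def build_graph(*edge_lists):
    """Merge edge lists from several datasets into one undirected adjacency map."""
    graph = {}
    for edges in edge_lists:
        for a, b in edges:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def reachable_by_prefix(graph, start, prefix):
    """List every entity with the given prefix that can be reached from start."""
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in graph.get(queue.popleft(), ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(n for n in seen if n.startswith(prefix) and n != start)


graph = build_graph(chemical_edges, protein_edges, cross_links)
print(shortest_path(graph, "chem:aspirin", "disease:D1"))
print(reachable_by_prefix(graph, "chem:aspirin", "gene:"))
```

For large datasets the whole graph would not fit in memory, so a real implementation would more likely expand neighbours lazily through queries.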

    Challenges

    Schema Visualization
    • Handling large datasets in RDF schema
    • Finding a common way to extract RDF schemas from multiple heterogeneous datasets
    • Avoiding collisions and overlaps in the visualization
    Path Finder
    • Handling large datasets in RDF schema
    • Creating direct links or short paths across multiple datasets

    Involved toolkits or projects

    • Projects: BioJS, EBI-RDF, UniProt-RDF
    • Toolkits: Gulp, browserify, jQuery, D3, mustache and any other JS library that the student might find convenient to address the problem.

    Needed skills

    • Good understanding of JavaScript development
    • Basic understanding of RDF and Linked Data
    • Some understanding of bioinformatics data might be an advantage but is not required

    Mentors

    Leyla García (UniProt, EMBL-EBI) and Simon Jupp (Samples, Phenotypes and Ontologies, EMBL-EBI)

  • IPython

    Rationale

    Python and IPython (notebook software for “interactive computing”) are very popular among bioinformaticians (e.g. BioPython), and we would like to make our visualization components easily accessible to IPython users. For example, instead of showing raw sequences as JSON or in a pre block, we could use an interactive multiple sequence alignment viewer. Our long-term goal is to have a BioJS specification that allows all BioJS components to be wrapped for IPython.

    Fig. 1: Example of how one could create an MSA visualization in IPython
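
As a much simpler stand-in for an interactive viewer, the sketch below uses IPython's rich display hook: when an object is the result of a notebook cell, IPython calls its `_repr_html_` method and renders the returned HTML. The `Alignment` class and the colour table are assumptions made up for this example; it renders a static table and does not wrap any BioJS component.

```python
from html import escape

# Arbitrary illustrative colours per residue; not a standard colour scheme.
COLOURS = {"A": "#8dd3c7", "C": "#ffffb3", "G": "#bebada", "T": "#fb8072", "-": "#ffffff"}


class Alignment:
    """A toy multiple sequence alignment that renders itself as an HTML table."""

    def __init__(self, sequences):
        self.sequences = dict(sequences)

    def _repr_html_(self):
        # One table row per sequence, one coloured cell per residue.
        rows = []
        for name, seq in self.sequences.items():
            cells = "".join(
                '<td style="background:{}">{}</td>'.format(
                    COLOURS.get(residue, "#dddddd"), escape(residue)
                )
                for residue in seq
            )
            rows.append("<tr><th>{}</th>{}</tr>".format(escape(name), cells))
        return '<table style="font-family:monospace">{}</table>'.format("".join(rows))


msa = Alignment({"seq1": "ACG-T", "seq2": "ACGTT"})
markup = msa._repr_html_()
print(markup[:60])
```

In a notebook, evaluating `msa` as the last expression of a cell would show the coloured table instead of the default text representation. A full widget adds two-way communication on top of this, which is where the actual project work lies.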

    Approach

    In IPEP 23, IPython describes an API that enables the user to generate and manipulate the GUI of the IPython notebook via widgets. To familiarize yourself with IPEP 23, you should start by porting one or two components to IPython widgets. The most interesting question of this project is how we get knowledge about data types into BioJS components; a related project (biojs2galaxy) uses special template files for this. Moreover, as every IPython widget has to inherit from a special Backbone.View, we could also put some work into a BioJS core.

    This project should also improve the widget management in IPython/Jupyter and therefore you will be working with the amazing people from the IPython community to build an extension manager for Jupyter widgets, so that it is super easy for IPython users to install and depend on custom widgets. This is a unique opportunity, because a good design of this extension manager could enhance the user experience drastically. Another amazing outcome of the work on the extension manager for Jupyter is that widget dependencies could also be resolved for static notebooks.

    Fig. 2: Interactive IPython widgets - Angry Birds notebook from the CS110 course by Doug Blank

    Challenges

    • Two-way synchronization (changes in the frontend should update the backend)
    • IPython widgets are currently only shown in running notebooks - static (exported) notebooks are quite common
    • IPython is growing - they recently rebranded to Jupyter to show that they are agnostic to the actual kernel. So, supporting other languages like R, Ruby or Scala would be an upcoming problem.
    • Resizable widgets

    Resources

    • IPEP23
    • Discussion about the BioJS - IPython integration on GitHub

    Mentors

    Tim Ruffles (for BioJS) and Doug Blank (for IPython/Jupyter)

  • Generic protein expression view for Human Body

    Rationale

    neXtProt is a fast-growing resource integrating human protein data from different databases. As far as expression is concerned, neXtProt integrates data from Bgee (at mRNA level) and the Human Protein Atlas (at protein level). This data must be visualized in a coherent and scientifically sound manner for biologists.

    Approach

    • Understand the data contained in neXtProt related to human body expression.
    • Understand the different parts and tissues of the human body (nervous, cardiovascular, ... systems; Fig. 1).
    • Write a state-of-the-art review of current viewers and their limitations.
    • Propose and implement a viewer that satisfies the needs of biologists using neXtProt data.

    Fig 1. Wikimedia's static image of Human Body with tissues

    Challenges

    • The amount and granularity of the data to be represented in a coherent and user-friendly manner.
    • The technology involved and generic aspect of the viewer.

    Involved toolkits or projects

    • JSON REST API
    • D3
    • NodeJS
    • AngularJS

    Needed skills

    Interest in biology and new technologies, mainly JavaScript.

    Mentors

    Daniel Teixeira and Pierre-André Michel (both at CALIPHO SIB in Geneva)


Who are the people behind BioJS?

We are an international team developing beautiful, interactive and easy-to-share JavaScript applications to visualize biological data on the web. We are looking for ambitious and motivated students who wish to spend their summer collaborating with us on the projects described on this page.

Our mentors

Leyla Garcia

Leyla is a software engineer at EMBL-EBI and organizes events around the BioJS project. She has been mentoring a multitude of students in programming throughout her academic career.
Project to mentor: Web Components Integration in BioJS
Contact: [email protected]

Miguel Pignatelli

Miguel Pignatelli is a JavaScript developer at the Centre for Therapeutic Target Validation (CTTV) and has contributed to BioJS with several projects.
Projects to mentor: Web Components Integration in BioJS, Visualisation of target-disease relationships in drug discovery
Contact: [email protected]

Simon Jupp

Simon has degrees in both Biochemistry and Computer Science and now works somewhere between the two disciplines. His research interests are generally related to information management and, more specifically, to using computers to improve the way we capture, manage and share knowledge about biology. Most of his work is now focused on the use of ontologies to capture knowledge in a computationally useful form.
Project to mentor: RDF Schema and Path Finder
Contact: [email protected]

Luca Fumis

Luca Fumis is a web developer at the Centre for Therapeutic Target Validation (CTTV) and a contributor to several BioJS projects.
Project to mentor: Visualisation of target-disease relationships in drug discovery
Contact: [email protected]

Tim Ruffles

Tim is the founder of SidekickJS, a JavaScript trainer and instructor for Angular, D3, Node and Backbone, and the maintainer of the beautiful JavaScript Garden.
Project to mentor: IPython
Contact: [email protected]

Doug Blank

Doug Blank is an Associate Professor of Computer Science at Bryn Mawr College, PA, USA. He has used IPython in his university courses for quite a few years and is an active contributor to IPython / Jupyter.
Project to mentor: IPython
Contact: [email protected]

Pierre-Andre Michel

Pierre-Andre is a bioinformatician and team leader of the neXtProt working group at the Swiss Institute of Bioinformatics.
Project to mentor: Generic protein expression view for Human Body
Contact: [email protected]

Daniel Teixeira

Daniel is a bioinformatician and developer at the Swiss Institute of Bioinformatics.
Project to mentor: Generic protein expression view for Human Body
Contact: [email protected]

Martin Cook

Martin is an experienced web developer working at the ELIXIR Hub.
Project to mentor: Visualisation of standard types in life science with BioSchemas, schema.org and BioJS
Contact: [email protected]

Rafael Jimenez

Rafael is Chief Technical Officer at the ELIXIR Hub.
Project to mentor: Visualisation of standard types in life science with BioSchemas, schema.org and BioJS
Contact: [email protected]

Guy Yachdav

Guy is co-founder and CEO of BioSof, a spin-off of the Rostlab at Technical University Munich.
Project to mentor: Bonestagram - DICOM visualization made fun
Contact: [email protected]

Christian Dallago

Christian is a master's student and web developer at the Rostlab at Technical University Munich.
Project to mentor: Bonestagram - DICOM visualization made fun
Contact: [email protected]

How to apply for a project

Please stick to the questions and guidelines described in this section when applying for a project at BioJS. If you want to read more about writing high-quality proposals, feel free to also check the GSoC Student's Manual.

Start early

Experience from other organisations has shown that students often make mistakes when converting the official 19:00 UTC deadline to their own timezone, or have technical problems when submitting during the final few seconds. We very much want to receive your application and don’t want you to be disappointed or miss your chance of a great summer! So please remember to submit early and then edit as often as you want before the deadline (25.03, 19:00 UTC).

Tell us about yourself

As we don’t know you, we would like to know more about your personality and what your passion is. Please let us know about the following:

  • How can we contact you?
  • How are your programming skills?
  • Which projects have you worked on so far?
  • Do you have any code samples? We are happy to check out code samples sent via mail, or any GitHub or other open-source repository that you have contributed to.
  • Are you familiar with JavaScript/jQuery?
  • Is there anybody who can vouch for your experience?

We would also like to know about your academic experience so far:

  • At which university/college are you currently enrolled?
  • Which year/semester are you in? What is your major/degree/focus?
  • Are you part of a research group?
  • Have you had any lectures about Bioinformatics or Biochemistry?
  • Have you contributed to any Open Source project before?

Explain Your Goals

  • What exactly do you intend to do? Please be as specific as possible
  • How do you plan to achieve it? What do you think might be tricky and which technologies would you like to use?
  • What are the milestones for your project? We recommend setting up a timeline with weekly milestones that you think are suitable.

Communication

  • What is your ideal approach to keeping everybody informed of your progress, problems, and questions over the course of the project?
  • What should be our strategy if you vanish during the project? How do you plan to keep yourself on track?

Why BioJS?

  • Tell us what makes you excited to work on one of our BioJS projects!
  • How do you envision your involvement with BioJS after the GSoC?

Some tips

  • Is there any special reason why we should pick you? Name it.
  • Get in touch with us as soon as possible and post your ideas on our Slack chat, mailing list, Gitter or IRC channel #biojs on freenode.

Our organisation admins

Jessica Jordan
Org Admin

Jessica is a web developer in the BioJS project, working at The Genome Analysis Centre (TGAC), Norwich, UK.
Contact: [email protected]

Leyla Garcia
Org Admin

Leyla is a software engineer at EMBL-EBI and organizes events around the BioJS project. She has been mentoring a multitude of students in programming throughout her academic career.
Project to mentor: Web Components Integration in BioJS
Contact: [email protected]

Tatyana Goldberg
Org Admin

Tatyana is a PhD student in Bioinformatics at TU Munich. She is both a developer and organizer of workshops and former Google Summer of Code projects in BioJS.
Contact: [email protected]

Manuel Corpas (TGAC)
Principal Investigator

Manuel is the Plant and Animal Genomes Project Leader at the bioinformatics-driven Genome Analysis Centre in Norwich, UK, and has been one of the main driving forces of the BioJS project for several years.

Our three organisation admins look forward to supporting both mentors and students in getting the most out of their Google Summer of Code.
Contact us via e-mail or ask us on Slack!

Stay up-to-date with GSoC in BioJS