Euprymna genome portal
 

NEO4J Graph Database

An orthology graph database of chromosomes and genes across 2283 animal species.

To access our database, please click here. On the new page that opens, enter neo4j as the username and squid007 as the password, then click Connect without making any other changes.

Our database is built with the Neo4j graph database management system, which has its own query language called Cypher.

You can generate Cypher query commands for a few different types of interesting questions below.

First, replace the pre-filled options with your own values, then click Generate cypher code.

Each query type is pre-filled with default values, so if you don't make any changes the generated query returns example results.

Next, select the whole code, copy and paste it into the Neo4j Browser command line, and run it.

Alternatively, click Download cypher code and then drag and drop the downloaded Cypher file onto the Neo4j Browser page.

You can then click Save to favorites and run it from your favorites via the Star icon on the left, or simply click Paste to editor and run it from the command line.

You can download the visualization of your orthology search in various formats using the download icon at the top right.

Note: If your search returns no matches, you will receive a "(no changes, no records)" message.

Note: If your search returns too many nodes, you will receive a warning and see only a subset of the nodes. This limit can be increased via "Initial Node Display" in the Neo4j Browser Settings.



Cypher Query Generators for Neo4J

 
Selecting the species pair:

Please replace the species TaxIDs in the fields below with those of your choice. These two species are used for every query type unless noted otherwise.

Species 1 TaxID:
Species 2 TaxID:


 
Query Type 1: Visualizing the orthologous genes between a pair of species.




Query Type 2: Subset Query Type 1 to genes that are orthologous to OGs (orthologous groups) from an ALG (ancestral linkage group).
This shows the genes from both species that are orthologous to the shared OGs, together with the chromosomes these genes are located on (see EDF.5 in our manuscript).

ALG:





Query Type 3: Subset Query Type 1 for an ALG that is orthologous to chromosome(s) of Species 1.
This filters for the Species 1 chromosomes that are orthologous to the selected ALG and shows only the Species 1 genes located on these chromosomes.
For Species 2, any gene that shares an OG with these Species 1 genes is shown, together with the Species 2 chromosome it is located on.

ALG:





Query Type 4: Subset Query Type 1 for a chromosome of Species 1.
This filters for the genes that are located on the selected chromosome in Species 1.
For Species 2, it shows the genes that share an OG with these genes, plus the chromosomes they are located on.

Chromosome:





Query Type 5: Subset Query Type 4 for a pair of chromosomal coordinates.
This lets you investigate the orthology of genes located in a specific region of a chromosome in Species 1.
For Species 2, it shows the genes that share an OG with these genes and the chromosomes they are located on.
Is the synteny preserved, or did the genes spread out to different chromosomes?

Start Coordinate:
End Coordinate:
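
The synteny question posed for Query Type 5 can be made concrete with a small tally. The Python sketch below is not the portal's Cypher query and does not touch the database. It uses made-up gene records (all names, fields and values are hypothetical) and assumes that genes are selected by their start coordinate, with both ends of the region included. It keeps the Species 1 genes inside the region, collects their OGs, and counts how the Species 2 genes from those OGs are distributed over chromosomes.

```python
from collections import Counter

# Hypothetical toy records: (gene, OG, chromosome, start coordinate).
species1 = [
    ("s1_g1", "OG1", "chr1", 1_000),
    ("s1_g2", "OG2", "chr1", 5_000),
    ("s1_g3", "OG3", "chr1", 9_000),
    ("s1_g4", "OG4", "chr1", 50_000),
]
species2 = [
    ("s2_g1", "OG1", "chrA", 700),
    ("s2_g2", "OG2", "chrA", 2_500),
    ("s2_g3", "OG3", "chrB", 4_000),
    ("s2_g4", "OG4", "chrA", 90_000),
]


def orthologs_by_chromosome(sp1, sp2, chromosome, start, end):
    """Count Species 2 genes per chromosome that share an OG with Species 1
    genes starting inside [start, end] on the given chromosome."""
    # Assumption: a gene belongs to the region if its start coordinate does.
    ogs = {og for _, og, chrom, pos in sp1
           if chrom == chromosome and start <= pos <= end}
    return Counter(chrom for _, og, chrom, _ in sp2 if og in ogs)


counts = orthologs_by_chromosome(species1, species2, "chr1", 0, 10_000)
print(dict(counts))  # {'chrA': 2, 'chrB': 1}
```

If one Species 2 chromosome dominates the tally, the region is likely still syntenic. Counts spread over many chromosomes suggest that the genes have dispersed.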





 

Extracting data for drawing UMAP


Because the file to be generated is usually too large for the web browser, this step must be done on the command line, which requires installing Neo4j cypher-shell.
Please download the Perl script and the TaxID dictionary, then run the Perl script as follows:
perl Generate_Input_for_UMAP.pl LineageTaxIDs.txt 6447 6606 10 > CypherShellScript.sh
Here, 6447 is the TaxID to include, 6606 is the TaxID to exclude (optional), and 10 is the number of random genomes to select (also optional).
Then run the resulting script as follows:
cypher-shell -u neo4j -p squid007 -d neo4j -a neo4j://131.130.65.127:4571 -f CypherShellScript.sh > InputForUMAP.txt
The code computes the distance between the start coordinates of each possible gene pair from the 10 species in OG-connected groups and reports the average of the distances for each possible pairing.
The distance file can be used to generate UMAP plots.
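
For repeated runs with different TaxIDs, the two commands above can be wrapped in a short Python script. This is a minimal sketch: the function names are hypothetical, and it assumes that the optional arguments of the Perl script are positional, so a random genome count can only be given together with a TaxID to exclude. File names, credentials and the server address are copied from the commands above.

```python
import subprocess


def perl_command(include_taxid, exclude_taxid=None, n_random=None,
                 script="Generate_Input_for_UMAP.pl",
                 dictionary="LineageTaxIDs.txt"):
    """Build the argument list for the Perl script."""
    # Assumption: arguments are positional, so the count needs an exclude TaxID.
    if n_random is not None and exclude_taxid is None:
        raise ValueError("n_random requires exclude_taxid")
    cmd = ["perl", script, dictionary, str(include_taxid)]
    if exclude_taxid is not None:
        cmd.append(str(exclude_taxid))
    if n_random is not None:
        cmd.append(str(n_random))
    return cmd


def cypher_shell_command(script_file, user="neo4j", password="squid007",
                         database="neo4j",
                         address="neo4j://131.130.65.127:4571"):
    """Build the cypher-shell argument list with the flags shown above."""
    return ["cypher-shell", "-u", user, "-p", password, "-d", database,
            "-a", address, "-f", script_file]


def run_pipeline(include_taxid, exclude_taxid=None, n_random=None,
                 script_file="CypherShellScript.sh",
                 output_file="InputForUMAP.txt"):
    """Run both steps, writing stdout to files as the shell '>' does."""
    with open(script_file, "w") as out:
        subprocess.run(perl_command(include_taxid, exclude_taxid, n_random),
                       stdout=out, check=True)
    with open(output_file, "w") as out:
        subprocess.run(cypher_shell_command(script_file),
                       stdout=out, check=True)


# Only prints the first command; nothing is executed here.
print(" ".join(perl_command(6447, 6606, 10)))
```

Calling run_pipeline(6447, 6606, 10) would reproduce the two commands above, provided that perl, cypher-shell and the two downloaded files are available in the working directory.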