========================================
== The Salopian Scientific Collective ==
========================================
A data blog by Daniel Greenwood

Knitting automated reports with R Markdown

R Markdown Automation
R Markdown is a simple markup language based on Markdown, with added functionality for including R code and its output. This entire blog is written as R Markdown documents. I write the text and code in R Markdown, then: (1) run all the R code and knit the input and output together into a regular Markdown file; (2) knit that Markdown file into a static HTML page; (3) run the Hugo program, which in a fraction of a second turns that folder full of HTML pages (together with a few configuration files) into a fully functioning website with homepage, tags, menus etc. Read more...

Creating an interactive data gating tool with plotly and shiny in R

R Shiny
Interactive data gating allows researchers to visually select and analyze specific subsets of data points from complex data sets. This is particularly valuable in bioinformatics, where we often need to select clusters of points from large data sets - such as identifying a cell phenotype in a mixed population using molecular markers. The shiny package for R (and now also for Python) makes it easy to create interactive web applications to communicate our results. Read more...
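As a rough illustration of what a gate does computationally, here is a minimal sketch in Python, assuming the gate is a polygon drawn around a cluster in a two-marker scatter plot. It does not use plotly or shiny, and the names `point_in_polygon`, `gate`, `cells` and `square_gate` are hypothetical, chosen only for this example rather than taken from the post.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point
            if x < x_cross:
                inside = not inside
    return inside


def gate(points: Sequence[Point], polygon: Sequence[Point]) -> List[Point]:
    """Return the subset of points that fall inside the gate polygon."""
    return [p for p in points if point_in_polygon(p, polygon)]


# Made-up example data: two marker intensities per cell (assumption)
cells = [(0.5, 0.5), (2.0, 2.0), (0.9, 0.1), (1.5, 0.5)]
# A unit-square gate, as a user might draw it on a scatter plot
square_gate = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

selected = gate(cells, square_gate)
print(selected)
```

In an interactive tool, the polygon vertices would come from the user's selection on the plot instead of being hard-coded.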

Using the ChatGPT Python library to make a language-learning tool

ChatGPT AI Python Large Language Models
I’m learning German. So many AI-enabled language-learning apps have appeared in the past few years, with a multitude of features, but sometimes I want just one simple thing. German has a very different word order from English, and also a much stricter choice of words. It matters whether you translate "to change" as wechseln, verwechseln, umstellen, ändern, verändern etc. ChatGPT is great at writing fluent simple text in multiple languages, and choosing words that fit the full context of the text. Read more...

Efficiently handle slightly big data with Apache Arrow in R

R Apache Big Data
In systems biology, we often need to work with slightly big data: not so big as to justify setting up a database or using a high-performance cluster, but still a bit too big to work with comfortably in memory. We are talking about files in the 10 to 500 GB range, such as omics data like RNAseq or proteomics; single-cell phenotype data from high-content microscopy; and large public data repositories, like the Human Cell Atlas. The Arrow package for R lets us keep our data set on disk, dynamically loading only the rows and columns needed for our analysis. Read more...

Welcome to my weblog

Welcome to another data blog. These days it can seem like we are swimming in an ocean of AI-generated, click-optimised content. I therefore decided to start this good old-fashioned blog to share some insights and tips from my work as a systems biologist at the ETH (the federal technical university) in Zürich. Expect: coding tutorials in R and Python; insights into systems biology and bioinformatics; and anything else I think of. There are no comments sections, subscriptions, adverts, sponsored links or cookies. Read more...