-
IoT services
I have been playing a lot with n8n recently, which has opened up many potential automations and uses for local LLMs. One thing I realized pretty quickly is that many services can be built and configured to live in a Docker container, then be accessed over a RESTful API. The only missing piece is the internal plumbing for coordinating multiple machines. After playing with Kubernetes for a while, I realized that the solution is much simpler: Caddy will transparently reverse proxy my HTTP requests across all the hardware I have with zero issues. Suddenly, all the LLM requests and HTTP…
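A minimal Caddyfile sketch of what I mean — the hostnames and IPs are hypothetical placeholders, and the upstream ports are just the Ollama and n8n defaults:

```
# Hypothetical Caddyfile: one hostname per service, each proxied to
# whichever box actually runs the container.
llm.home.lan {
    reverse_proxy 192.168.1.20:11434   # e.g. an Ollama container elsewhere
}

n8n.home.lan {
    reverse_proxy 192.168.1.21:5678    # n8n's default port
}
```

Clients only ever see one address per service; where the container actually lives can change without touching anything downstream.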
-
doMC vs doParallel
If you are using a Mac/Linux platform, doMC is a much better package than doParallel because it forks the session and will not duplicate the objects between the host and the worker nodes. The only additional step is that you need to specify which packages, files, and variables to pass to the worker nodes, e.g.: foreach(file_path = file_paths, .packages = c("package1", "package2"), .export = c("variable1", "variable2")) %dopar% {}
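A toy sketch of the setup, assuming doMC and foreach are installed (the matrix, worker count, and variable names are made up):

```r
library(doMC)
library(foreach)

registerDoMC(cores = 4)  # forked workers share the host's memory copy-on-write

big_matrix <- matrix(rnorm(1e6), ncol = 100)  # not copied to each worker up front

# With forked doMC workers the session is inherited, so .packages/.export
# act mostly as a safety net; with doParallel's PSOCK clusters on Windows
# they are mandatory, and every exported object gets serialized over.
col_sums <- foreach(j = seq_len(ncol(big_matrix)), .combine = c) %dopar% {
  sum(big_matrix[, j])
}
```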
-

Note to self – character to numeric conversion
I always forget the easiest way to convert character columns to numeric within my code. This is for my own reference: df2 = df1 %>% select(-UnneededCharacterColumns) %>% group_by(characterColumn) %>% mutate_if(is.character, as.numeric) %>% summarize(across(everything(), sum)) %>% as.data.frame(.)
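A toy example to sanity-check the pipeline, using the newer across(where(...)) spelling of mutate_if (the data frame and column names here are made up):

```r
library(dplyr)

df1 <- data.frame(
  sample = c("A", "A", "B"),
  notes  = c("x", "y", "z"),    # an unneeded character column
  count  = c("10", "20", "5"),  # numbers stored as character
  stringsAsFactors = FALSE
)

# across() skips grouping variables, so `sample` survives as character
df2 <- df1 %>%
  select(-notes) %>%
  group_by(sample) %>%
  mutate(across(where(is.character), as.numeric)) %>%
  summarize(across(everything(), sum)) %>%
  as.data.frame()
# sample A -> 30, B -> 5
```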
-
Pinning environments for snakemake
Although snakedeploy is a great way to figure out how an environment would compile from a pre-existing YAML file, sometimes I run into the opposite issue: I have a working environment and I want to export it. I keep forgetting that conda has an environment export command: conda env export > conda-env.yml. Start from that file, and then modify it to resemble a Snakemake pinned file!
-
Issues with docker build
I have been trying to figure out why the changes I have been pushing to GitHub haven't been propagating to the Docker build. It turns out Docker was reusing the cached version of the repo without checking whether it had been updated. There is a very simple fix: ADD a random string to the image before cloning, which forces all subsequent steps to be redone from scratch. Clever. This works because ADD always fetches the remote content, and Docker then compares whether the cache holds the same result as what was fetched previously. Since this is randomly generated, it…
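A hypothetical Dockerfile sketch of the trick, using random.org as the source of the random string (OWNER/REPO and the base image are placeholders):

```dockerfile
FROM alpine:3.19
RUN apk add --no-cache git

# The URL returns a fresh random string on every request; ADD fetches it,
# sees the content differs from the cached layer, and invalidates the
# cache for this and every subsequent step.
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=8&format=h" /tmp/skipcache

# Because the line above busted the cache, this clone always runs fresh.
RUN git clone --depth 1 https://github.com/OWNER/REPO.git /app
```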
-
Compositional tensors for longitudinal microbiome studies
I came across this recently updated preprint introducing a new package, TEMPTED, from what I assume is the same group that introduced Compositional Tensor Factorization (gemelli) in 2021. TEMPTED works similarly to CTF in that it formulates tensors of the longitudinal data that account for participant IDs, time, and condition. It overcomes some of the major limitations acknowledged in that original manuscript: 1) it doesn't require the robust centered log-ratio (rCLR) transformation, and 2) it works with continuous variables, and therefore doesn't require binning of time into categories, which may lead to a loss of…
-

Excel line graphs
Note to self: it is much easier to reshape the table in Excel from long to wide format and use line graphs than to try to get Excel to understand long table formats!
-

More network analysis reading and thoughts
Automatic Rho cut-off determination
Network generation requires two filtering steps that are consequential for the downstream analyses: 1) removing features that are rare, and 2) choosing a correlation (e.g., Rho) value that is high enough to be biologically relevant. While manual removal is still required for the first step in compositionally-aware network methods such as SparCC, it appears that the second cut-off can be determined automatically through Random Matrix Theory (RMT) methodologies. Unfortunately, the implementation of that step was bundled into a larger online platform (MENA), and there is no code showing how the steps…
-
Signed, weighted and directed network topological measurements
I am working on a project that requires me to brush up on my network topology methods. The adjacency lists I have been working with have almost always given me good clustering results when I use Louvain community analysis for modularity identification. The results have shown only trivial differences from the spinglass algorithm, which also utilizes negative edges. In the current project, however, the negative edges appear to have a non-trivial effect on my network structure, so I have to review some centrality methods. A perfect project for updating myself. First, a disclaimer: this blog is me giving…
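A rough igraph sketch of the comparison on a toy signed graph (the graph size and weights are arbitrary; cluster_spinglass(..., implementation = "neg") is igraph's variant that uses negative edges directly):

```r
library(igraph)

set.seed(1)
repeat {                       # spinglass needs a connected graph
  g <- sample_gnp(40, 0.2)
  if (is_connected(g)) break
}
E(g)$weight <- runif(ecount(g), -1, 1)  # signed, weighted edges

# Louvain only accepts non-negative weights, so a common shortcut is
# clustering on the positive part of the network:
g_pos <- subgraph.edges(g, E(g)[weight > 0], delete.vertices = FALSE)
louv <- cluster_louvain(g_pos)

# Spinglass with implementation = "neg" incorporates the negative edges:
spin <- cluster_spinglass(g, implementation = "neg")

# Cross-tabulate the two partitions to see where they disagree
table(membership(louv), membership(spin))
```

When the two tables diverge substantially, that is a sign the negative edges carry real structure rather than noise.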
-
First post
This is my first post on my blog, which I will mostly use to remind myself of research ideas and comments that I need to keep track of.