arc*_*ryx 8 portability haskell compilation ghc
This question may or may not be specific to Haskell, but it concerns a minor annoyance I face with a particular programming task.
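I have written a program in Haskell that is mostly general to the type of problem I am solving, but it includes two dependent components: a run-time estimation function for a script, whose calculation is based on a particular benchmark run, and a filename-transformation function, which is tailored to the naming scheme of the files I am working with. Naturally, if I want to use a script whose performance differs from the benchmark, or if I find the estimates too conservative, I would like to change the function used to estimate run time. Likewise, I want to be able to modify the filename-transformation function if I need to work with different files that follow a different naming scheme.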
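However, the (remote) machine on which I run my script has neither GHC nor runhaskell installed, so I have to modify, compile, and re-upload the code from my local machine, which is something of a hassle. My question is whether there is a simple way to change certain components of my code without recompiling, so that the changes are reflected when the program is invoked.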
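I apologize if my description is vague. I have included the gory details below, because I did not want to clutter the question from the start with details that might prove unnecessary.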
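I wrote this code in Haskell mainly because it is the language in which I am most comfortable implementing these methods. I know that other languages may be more portable, but I am not familiar enough with them to implement this without reading a great deal of documentation and going through many revisions to get it working. If the flexibility I want is impractical in Haskell, I can understand that, but I would rather be told that Haskell cannot do this than receive suggestions for other languages.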
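I am writing code to run independent jobs on a load-sharing cluster, so I want the closest possible estimate of the time a given job requires: undershooting causes the job to be terminated, and overshooting lowers the job's priority. I base my time estimates on the size of the job's input, and I have collected enough empirical data to derive an approximately quadratic relationship between size and time.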
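The way I currently assign time estimates to inputs, and thereby establish the job order, is to parse the output of du with a Haskell script, perform the calculation, and write the time results to a new file, which is then read in a loop by the job-assignment script.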
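The jobs are run on paired files, which share a common name up to a certain point. The last common element I wish to keep is an 's', and from that point on neither name contains an 's' character. I therefore traverse the names backwards until I reach an 's'. My code is included below. It is liberally commented, which may help or may confuse. Some of it is very specific to the task I am working on.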
-- size2time.hs
-- A Haskell script to convert file sizes into job-times, based on observed job-times for
-- various file sizes
--
--
-- This file may be compiled via the following command:
-- > ghc size2time.hs
--
-- Should any edits be made, ensure that the compiled executable is updated accordingly
--
-- The executable is to be run with the following usage
--
-- > ./size2time inputfile outputfile
--
-- where inputfile is the name of a file whose first column contains the sizes, in MB, of each fq.gz
-- (including both paired-end reads), and whose second column contains the corresponding file names, as
-- generated by
--
-- > du -m $( ls DIR/*.fq.gz ) >inputfile
--
-- where DIR is the directory containing the fq.gz files
--
-- output is the name of a file that will be created by the execution of this script, whose first
-- column will contain the run-time, in minutes, of the corresponding job (the times are based on
-- jobs run on Intel CPUs with 12 cores and 2GB of RAM, and therefore will potentially be
-- inapplicable to jobs run on CPUs of different manufacturers, with different numbers of cores,
-- and/or with different allocated RAM), and whose second column contains the scrubbed names of
-- the jobs to be run. The greater time-value for any given pair is used, with only one member of
-- each pair retained, as the file-names of each member of a pair are identical after scrubbing
--
-- import modules for command line arguments, list operations, map operations
import System.Environment
import Data.List
import qualified Data.Map as Map
main = do
    args <- getArgs -- parse command line arguments: inputfile, outputfile, <ignored>
    let infile = head args
        outfile = head . tail $ args
    contents <- readFile infile -- read the inputfile
    let sf = lines contents -- split into lines
        tf = map size2time sf -- perform size2time mapping
        st = map sample tf -- scrub filename
        stu = Map.toList . Map.fromListWith (max) $ st -- take only the longer of the two times of the paired reads
        tsu = map flip2 stu -- put time first
        stsu = sort tsu -- sort by time, ascending
        tsustr = map unwords . map (\(x,y) -> [show x, y]) $ stsu -- convert back to string
        tsulns = unlines tsustr -- join individual lines
    writeFile outfile tsulns -- write to the outputfile
{- given a string, with the size of a file and the name of the file,
- returns a tuple with the estimated job-time and the unmodified name
- of the file.
-
- The size-time conversion is extrapolated from experimental data,
- with only the upper extremes considered in order to prevent timeout,
- rounding in the quadratic term, and a linear-degree time padding added
- to allow for upper extremes. If modifications are to be made to any
- coefficients, it is recommended that only linear and constant terms be increased,
- and decreases should only be made after performing sufficient alignments to collect
- enough (file size)--(actual computation time) pairs to verify that the padding is excessive,
- and to determine coefficients that more closely follow the trend of the actual data, with
- the conditions that no data point must exceed the approximation curve, and that sufficient padding
- must be provided to allow for potential inconsistency in the time required for any given size of alignment.
-}
size2time :: String -> (Int,String)
size2time sfstring = let (size:file:[]) = words sfstring -- parses out size and filename
                         x = fromIntegral (read size :: Int) -- floating point from numeric string
                         time = floor $ 0.000025 * x ^ 2 + 0.03 * x + 10 -- apply floored conversion
                         tfstring = (time,file)
                     in tfstring
{-
- removes all characters in the file-name after 's', which properly scrubs files of the format
- *--Hs--R?.fq.gz, where the ? is either 1 or 2. For filenames formatted in different ways,
- or for alternative naming of the BAM file to be generated, this function must be modified
- to suit the scenario.
-}
sample :: (a,String) -> (String,a)
sample (x,f) = let s = reverse . dropWhile (/= 's') . reverse $ f
               in (s,x)
{-
- Reverses the order of a tuple, e.g. so that a Map may be made with a key to be found in the
- current second position of the tuple.
-}
flip2 :: (a,b) -> (b,a)
flip2 (x,y) = (y,x)
I don't think there is a clean solution to your problem.
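If there is no interpreter or compiler on the remote machine, you cannot modify Haskell source code on that machine and then turn it into machine-readable form.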
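As others have said, perhaps you could implement a configuration file or command-line options that allow the data likely to change to be specified at run time.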
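A minimal sketch of the configuration-file idea, assuming a plain-text file of "key value" lines. The Config type, the key names (quad, lin, const, cut), and the helper names are all hypothetical, chosen only for illustration; the defaults are the values hard-coded in size2time.hs. The compiled executable then never needs rebuilding when a coefficient or the cut character changes, only the config file is edited on the remote machine.

```haskell
-- Hypothetical sketch: tunable parameters are read from a config file at
-- run time instead of being compiled in. None of these names come from
-- the original script.
import Data.Maybe (fromMaybe)
import System.Environment (getArgs)
import Text.Read (readMaybe)

-- | Parameters that the question wants to change without recompiling.
data Config = Config
  { quadCoeff :: Double -- coefficient of the quadratic term
  , linCoeff  :: Double -- coefficient of the linear term
  , constTerm :: Double -- constant padding, in minutes
  , cutChar   :: Char   -- last character kept when scrubbing a file name
  } deriving (Show, Eq)

-- | Defaults copied from the values hard-coded in size2time.hs.
defaultConfig :: Config
defaultConfig = Config 0.000025 0.03 10 's'

-- | Parse "key value" lines. Unknown keys and malformed values are
-- ignored, so a missing or partial file falls back to the defaults.
parseConfig :: String -> Config
parseConfig = foldl step defaultConfig . map words . lines
  where
    step c ["quad", v]   = c { quadCoeff = fromMaybe (quadCoeff c) (readMaybe v) }
    step c ["lin", v]    = c { linCoeff  = fromMaybe (linCoeff c)  (readMaybe v) }
    step c ["const", v]  = c { constTerm = fromMaybe (constTerm c) (readMaybe v) }
    step c ["cut", [ch]] = c { cutChar = ch }
    step c _             = c

-- | Estimated job time in minutes for an input of the given size in MB.
estimate :: Config -> Int -> Int
estimate c mb = floor (quadCoeff c * x ^ 2 + linCoeff c * x + constTerm c)
  where
    x = fromIntegral mb :: Double

-- | Drop everything after the last occurrence of the cut character.
scrub :: Config -> String -> String
scrub c = reverse . dropWhile (/= cutChar c) . reverse

main :: IO ()
main = do
  args <- getArgs
  case args of
    [cfgFile, sizeMB, name] -> do
      cfg <- parseConfig <$> readFile cfgFile
      case readMaybe sizeMB of
        Just mb -> putStrLn (unwords [show (estimate cfg mb), scrub cfg name])
        Nothing -> putStrLn "size must be an integer number of MB"
    _ -> putStrLn "usage: estimate CONFIGFILE SIZE_MB FILENAME"
```

In the real script, estimate and scrub would take the place of the hard-coded arithmetic in size2time and the literal 's' in sample, with the config file name passed as an extra command-line argument. This only covers changes to parameters; changing the shape of the formula itself would still require recompiling.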
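Alternatively, assuming your remote machine has gcc installed, you could have GHC compile the Haskell code to C on your local machine, transfer it to the remote machine, do your best to understand how it translated your code, then make your changes to the C code and recompile on the remote machine.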