Author : Ashley G Truter Updated: 22-Jun-2016 Purpose: This document describes the munge function.
The munge script patches the remove-each
function in R3 to return the modified series.
USAGE: MUNGE data spec /sheet name /update action /delete /part columns /where condition /compact /only /group having /order /save file /list DESCRIPTION: Load and/or manipulate a block of tabular (column and row) values. MUNGE is a function value. ARGUMENTS: data -- REBOL block, CSV or Excel file (block! file! binary! vector!) spec -- Size of each record or block of heading words (none! gets cols? file) (integer! block! none!) REFINEMENTS: /sheet -- Excel worksheet (default is 1) name (string! word! integer!) /update -- Offset/value pairs (returns original block) action (block!) /delete -- Delete matching rows (returns original block) /part -- Offset position(s) to retrieve columns (block! integer! word!) /where -- Expression that can reference columns as c1, c2, etc condition (block!) /compact -- Remove blank rows /only -- Remove duplicate rows /group -- One of count, max, min or sum having -- Word or expression that can reference the initial result set column as count, max, etc (word! block!) /order -- Sort result set /save -- Write result to a delimited file file -- csv or txt (file!) /list -- Return new-line records
Reading from Excel uses Powershell and is dependant upon the relevant software being installed. For further options when loading data see the load-excel
function.
Must be a block or delimited file.
>> munge [a 1 b 2] 2 == [a 1 b 2] >> munge %test.csv 2 == [a 1 b 2]
An integer specifying the number of columns or a block of column name words. Specifying none
for a file will use the cols?
function to determine the number of columns.
>> munge [a 1 b 2] 2 >> munge [a 1 b 2] [A B] >> munge %test.xlsx none
If an integer is used then column names will default to c1, c2, etc up to the number of columns specified. Column names can be referenced in the /where condition
, /update action
and /group having
blocks.
Refinements are evaluated in the order presented so, for example, a /where/group
will exclude rows prior to grouping.
Expects a sheet number or name. Default is 1.
>> munge/sheet %test.xlsx 2 2 == [a b 1 2] >> munge/sheet %test.xlsx 2 "Sheet 2" == [a b 1 2]
This refinement alters the contents of the original block and returns once completed.
>> munge/update [a 1 b 2] 2 [2 0] == [a 0 b 0] >> munge/update [a 1 b 2] 2 [2 [c2 * 2]] == [a 2 b 4] >> munge/update/where [a 1 b 2] 2 [2 0] [c1 = 'b] == [a 1 b 0]
Note that:
>> munge/update ["1"] 1 [1 integer!] == [1]
is shorthand for:
>> munge/update ["1"] 1 [1 [to integer! c1]] == [1]
This refinement alters the contents of the original block and returns once completed.
>> munge/delete [a 1 b 2] 2 == [] >> munge/delete/where [a 1 b 2] 2 [c1 = 'b] == [a 1]
Specifies the column(s) to extract via one or more integers and/or column name words.
>> munge/part [a 1 b 2] 2 1 == [a b] >> munge/part [a 1 b 2] 2 'c1 == [a b] >> munge/part [a 1 b 2] [A B] 'A == [a b] >> munge/part [a 1 b 2] 2 [2 1] == [1 a 2 b] >> munge/part [a 1 b 2] 2 [c2 c1] == [1 a 2 b] >> munge/part [a 1 b 2] [A B] [B A] == [1 a 2 b]
Expects a block that can include references to column names.
>> munge/where [a 1 b 2] 2 [odd? c2] == [a 1] >> munge/where [a 1 b 2] [A B] [B = 2] == [b 2] >> munge/where [a 1 b 2] 2 [c2 > 1 c2 < 3] == [b 2]
This last condition takes advantage of the fact that munge wraps the condition within an all []
block.
This refinement removes blank rows and alters the contents of the original block.
>> munge/compact ["" "" a 1 b 2] 2 == [a 1 b 2]
This refinement removes duplicate rows and alters the contents of the original block.
>> munge/only [a 1 b 2 a 1] 2 == [a 1 b 2]
Each of the following operations "groups" the data according to the specified having
expression.
>> munge/group [a b a b c d] 1 'count == [a 2 b 2 c 1 d 1] >> munge/group [a b a b c d] 2 'count == [a b 2 c d 1] >> munge/group [a b a b c d] 2 [count > 1] == [a b 2]
This refinement sorts the original block if neither /part or /where are specified.
>> munge/group [1 2 3 4 5 6] 1 'avg == 3.5 >> munge/group [a 1 a 2 c 3 c 4] 2 'avg == [a 1.5 c 3.5] >> munge/group [a 1 a 2 c 3 c 4] 2 [avg > 2] == [c 3.5]
This refinement sorts the original block if neither /part or /where are specified.
>> munge/group [1 2 3 4 5 6] 1 'max == [6] >> munge/group [a 1 a 2 c 3 c 4] 2 'max == [a 2 c 4] >> munge/group [a 1 a 2 c 3 c 4] 2 [max = 4] == [c 4]
This refinement sorts the original block if neither /part or /where are specified.
>> munge/group [1 2 3 4 5 6] 1 'min == [1] >> munge/group [a 1 a 2 c 3 c 4] 2 'min == [a 1 c 3] >> munge/group [a 1 a 2 c 3 c 4] 2 [min = 1] == [a 1]
This refinement sorts the original block if neither /part or /where are specified.
>> munge/group [1 2 3 4 5 6] 1 'sum == [21] >> munge/group [a 1 a 2 c 3 c 4] 2 'sum == [a 3 c 7] >> munge/group [a 1 a 2 c 3 c 4] 2 [sum > 5] == [c 7]
This refinement sorts the original block if neither /part or /where are specified.
>> munge/order [b 1 a 2] 2 == [a 2 b 1]
>> munge/save [a 1 b 2 c 3] 2 %test.csv ; CSV delimited file >> munge/save [a 1 b 2 c 3] 2 %test.txt ; tab delimited file
>> munge/list [a 1 b 2 c 3] 2 == [ a 1 b 2 c 3 ]