Author : Ashley G Truter Updated: 22-Jun-2016 Purpose: This document describes the munge function.

The munge script patches the `remove-each`

function in R3 to return the modified series.

USAGE: MUNGE data spec /sheet name /update action /delete /part columns /where condition /compact /only /group having /order /save file /list DESCRIPTION: Load and/or manipulate a block of tabular (column and row) values. MUNGE is a function value. ARGUMENTS: data -- REBOL block, CSV or Excel file (block! file! binary! vector!) spec -- Size of each record or block of heading words (none! gets cols? file) (integer! block! none!) REFINEMENTS: /sheet -- Excel worksheet (default is 1) name (string! word! integer!) /update -- Offset/value pairs (returns original block) action (block!) /delete -- Delete matching rows (returns original block) /part -- Offset position(s) to retrieve columns (block! integer! word!) /where -- Expression that can reference columns as c1, c2, etc condition (block!) /compact -- Remove blank rows /only -- Remove duplicate rows /group -- One of count, max, min or sum having -- Word or expression that can reference the initial result set column as count, max, etc (word! block!) /order -- Sort result set /save -- Write result to a delimited file file -- csv or txt (file!) /list -- Return new-line records

Reading from Excel uses Powershell and is dependant upon the relevant software being installed. For further options when loading data see the `load-excel`

function.

Must be a block or delimited file.

>> munge [a 1 b 2] 2 == [a 1 b 2] >> munge %test.csv 2 == [a 1 b 2]

An integer specifying the number of columns or a block of column name words. Specifying `none`

for a file will use the `cols?`

function to determine the number of columns.

>> munge [a 1 b 2] 2 >> munge [a 1 b 2] [A B] >> munge %test.xlsx none

If an integer is used then column names will default to c1, c2, etc up to the number of columns specified. Column names can be referenced in the `/where condition`

, `/update action`

and `/group having`

blocks.

Refinements are evaluated in the order presented so, for example, a `/where/group`

will exclude rows prior to grouping.

Expects a sheet number or name. Default is 1.

>> munge/sheet %test.xlsx 2 2 == [a b 1 2] >> munge/sheet %test.xlsx 2 "Sheet 2" == [a b 1 2]

This refinement alters the contents of the original block **and returns once completed**.

>> munge/update [a 1 b 2] 2 [2 0] == [a 0 b 0] >> munge/update [a 1 b 2] 2 [2 [c2 * 2]] == [a 2 b 4] >> munge/update/where [a 1 b 2] 2 [2 0] [c1 = 'b] == [a 1 b 0]

Note that:

>> munge/update ["1"] 1 [1 integer!] == [1]

is shorthand for:

>> munge/update ["1"] 1 [1 [to integer! c1]] == [1]

This refinement alters the contents of the original block **and returns once completed**.

>> munge/delete [a 1 b 2] 2 == [] >> munge/delete/where [a 1 b 2] 2 [c1 = 'b] == [a 1]

Specifies the column(s) to extract via one or more integers and/or column name words.

>> munge/part [a 1 b 2] 2 1 == [a b] >> munge/part [a 1 b 2] 2 'c1 == [a b] >> munge/part [a 1 b 2] [A B] 'A == [a b] >> munge/part [a 1 b 2] 2 [2 1] == [1 a 2 b] >> munge/part [a 1 b 2] 2 [c2 c1] == [1 a 2 b] >> munge/part [a 1 b 2] [A B] [B A] == [1 a 2 b]

Expects a block that can include references to column names.

>> munge/where [a 1 b 2] 2 [odd? c2] == [a 1] >> munge/where [a 1 b 2] [A B] [B = 2] == [b 2] >> munge/where [a 1 b 2] 2 [c2 > 1 c2 < 3] == [b 2]

This last condition takes advantage of the fact that munge wraps the condition within an `all []`

block.

This refinement removes blank rows and alters the contents of the original block.

>> munge/compact ["" "" a 1 b 2] 2 == [a 1 b 2]

This refinement removes duplicate rows and alters the contents of the original block.

>> munge/only [a 1 b 2 a 1] 2 == [a 1 b 2]

Each of the following operations "groups" the data according to the specified `having`

expression.

>> munge/group [a b a b c d] 1 'count == [a 2 b 2 c 1 d 1] >> munge/group [a b a b c d] 2 'count == [a b 2 c d 1] >> munge/group [a b a b c d] 2 [count > 1] == [a b 2]

This refinement sorts the original block **if neither /part or /where are specified**.

>> munge/group [1 2 3 4 5 6] 1 'avg == 3.5 >> munge/group [a 1 a 2 c 3 c 4] 2 'avg == [a 1.5 c 3.5] >> munge/group [a 1 a 2 c 3 c 4] 2 [avg > 2] == [c 3.5]

This refinement sorts the original block **if neither /part or /where are specified**.

>> munge/group [1 2 3 4 5 6] 1 'max == [6] >> munge/group [a 1 a 2 c 3 c 4] 2 'max == [a 2 c 4] >> munge/group [a 1 a 2 c 3 c 4] 2 [max = 4] == [c 4]

This refinement sorts the original block **if neither /part or /where are specified**.

>> munge/group [1 2 3 4 5 6] 1 'min == [1] >> munge/group [a 1 a 2 c 3 c 4] 2 'min == [a 1 c 3] >> munge/group [a 1 a 2 c 3 c 4] 2 [min = 1] == [a 1]

This refinement sorts the original block **if neither /part or /where are specified**.

>> munge/group [1 2 3 4 5 6] 1 'sum == [21] >> munge/group [a 1 a 2 c 3 c 4] 2 'sum == [a 3 c 7] >> munge/group [a 1 a 2 c 3 c 4] 2 [sum > 5] == [c 7]

This refinement sorts the original block **if neither /part or /where are specified**.

>> munge/order [b 1 a 2] 2 == [a 2 b 1]

>> munge/save [a 1 b 2 c 3] 2 %test.csv ; CSV delimited file >> munge/save [a 1 b 2 c 3] 2 %test.txt ; tab delimited file

>> munge/list [a 1 b 2 c 3] 2 == [ a 1 b 2 c 3 ]