# Munge A set of functions that enable you to get blocks of tabular (row and column) data from a variety of sources, manipulate the data in various ways, then store the result as a Rebol/Red block or even save back as a CSV, tab-delimited or Excel file! ## Getting started Download [munge3.r](http://dobeash.com/files/munge3.r), then from a Rebol or Red console: <pre> >> do %munge3.r >> ctx-munge/list [1 2 3] == [ 1 2 3 ] </pre> Alternatively, you can selectively export functions to the global context: <pre> >> ctx-munge/export [list munge] </pre> Or export all functions with: <pre> >> ctx-munge/export bind words-of ctx-munge 'self </pre> ## Data Structure Rows are represented as blocks of data within a "table" block. <pre> >> load-dsv "a,1^/b,2" == [["a" "1"] ["b" "2"]] </pre> ## Reading files ### Delimiter Seperated Values (DSV) <pre> >> load-dsv %file.csv >> load-dsv/part %file.csv 1 >> load-dsv/part %file.csv [1 2] >> load-dsv/where %file.csv [row/1 = 1] </pre> ### Excel <pre> >> load-excel %file.xlsx 'Sheet1 >> load-excel/part %file.xlsx 'Sheet1 1 >> load-excel/part %file.xlsx 'Sheet1 [1 2] >> load-excel/where %file.xlsx 'Sheet1 "F1 = 1" </pre> ### PDF <pre> >> read-pdf %file.pdf >> read-pdf/lines %file.pdf </pre> ## Writing files ### DSV <pre> >> write-dsv %file.csv block >> write-dsv %file.txt block </pre> ### Excel <pre> >> write-excel %file.xlsx ["Sheet1" [[a b][1 2]] [5 10]] >> write-excel/filter %file.xlsx ["Sheet1" [[a b][1 2]] [5 10]] </pre> ## Quering databases ### SQL Server <pre> >> sqlcmd sn db "SELECT * FROM TABLE" >> sqlcmd/headings sn db "SELECT * FROM TABLE" </pre> ### SQLite <pre> >> sqlite %file.db "SELECT * FROM TABLE" >> sqlite/headings %file.db "SELECT * FROM TABLE" </pre> ## Munging blocks The main function, `munge`, is typically used to manipulate blocks of data retrieved from files and databases. ### /where The `/where` refinement lets you specify a block of Rebol conditions (wrapped within an `all` block) that can reference row values as `row/1`, `row/2`, etc). <pre> >> munge/where data [row/1 = 1] >> munge/delete/where data [row/1 > 1 row/1 < 10] >> munge/update/where data [1 row/2] [any [row/1 < 5 row/1 > 10]] </pre> Alternatively, if the data is sorted a key value may be specified. The key value does not need to be unique, but it must be the first value of each row retrieved. <pre> >> sort data >> munge/where data 1 >> munge/delete/where data 1 >> munge/update/where data [1 row/2] 1 </pre> This initiates a [binary search](https://en.wikipedia.org/wiki/Binary_search_algorithm) that can be thousands of times faster than a block predicate. # ctx-munge ##append-column <pre> USAGE: append-column block value /header heading DESCRIPTION: Append a column of values to a block. append-column is of type: function! ARGUMENTS: block [block!] value REFINEMENTS: /header heading </pre> ##ascii-file? <pre> USAGE: ascii-file? file DESCRIPTION: Returns TRUE if file is ASCII. ascii-file? is of type: function! ARGUMENTS: file [file!] </pre> ##average-of <pre> USAGE: average-of block DESCRIPTION: Average of values in a block. average-of is of type: function! ARGUMENTS: block [block!] </pre> ##call-oledb <pre> USAGE: call-oledb file cmd /hdr DESCRIPTION: Call OLEDB via PowerShell returning STDOUT. call-oledb is of type: function! ARGUMENTS: file [file!] cmd [string!] REFINEMENTS: /hdr => First row contains columnnames not data. </pre> ##call-out <pre> USAGE: call-out cmd DESCRIPTION: Call OS command returning STDOUT. call-out is of type: function! ARGUMENTS: cmd [string!] </pre> ##check <pre> USAGE: check data DESCRIPTION: Verify data structure. check is of type: function! ARGUMENTS: data [block!] </pre> ##cols? <pre> USAGE: cols? data /with delimiter /sheet name DESCRIPTION: Number of columns in a delimited file or string. cols? is of type: function! ARGUMENTS: data [file! binary! string!] REFINEMENTS: /with delimiter [char!] /sheet => Excel worksheet name (default is "Sheet 1"). name [string! word!] </pre> ##delimiter? <pre> USAGE: delimiter? data DESCRIPTION: Probable delimiter, with priority given to tab, bar, tilde, semi-colon then comma. delimiter? is of type: function! ARGUMENTS: data [file! string!] </pre> ##digit <pre>digit is a bitset! of value: make bitset! #{000000000000FFC0} </pre> ##digits? <pre> USAGE: digits? data DESCRIPTION: Returns TRUE if data not empty and only contains digits. digits? is of type: function! ARGUMENTS: data [string! binary!] </pre> ##distinct <pre> USAGE: distinct data DESCRIPTION: Remove duplicate and empty rows. distinct is of type: function! ARGUMENTS: data [block!] </pre> ##enblock <pre> USAGE: enblock data cols DESCRIPTION: Convert a block of values to a block of row blocks. enblock is of type: function! ARGUMENTS: data [block!] cols [integer!] </pre> ##export <pre> USAGE: export words DESCRIPTION: Export words to global context. export is of type: function! ARGUMENTS: words [block!] => Words to export. </pre> ##fields? <pre> USAGE: fields? data /with delimiter /sheet name DESCRIPTION: Column names in a delimited file or string. fields? is of type: function! ARGUMENTS: data [file! string!] REFINEMENTS: /with delimiter [char!] /sheet => Excel worksheet name (default is "Sheet 1"). name [string! word!] </pre> ##first-line <pre> USAGE: first-line data DESCRIPTION: Returns the first non-empty line of a file. first-line is of type: function! ARGUMENTS: data [file! string!] </pre> ##flatten <pre> USAGE: flatten data DESCRIPTION: Flatten nested block(s). flatten is of type: function! ARGUMENTS: data [block!] </pre> ##latin1-to-utf8 <pre> USAGE: latin1-to-utf8 data DESCRIPTION: Latin1 binary to UTF-8 string conversion. latin1-to-utf8 is of type: function! ARGUMENTS: data [binary!] </pre> ##letter <pre>letter is a bitset! of value: make bitset! #{00000000000000007FFFFFE07FFFFFE0} </pre> ##letters? <pre> USAGE: letters? data DESCRIPTION: Returns TRUE if data only contains letters. letters? is of type: function! ARGUMENTS: data [string! binary!] </pre> ##like <pre> USAGE: like series value DESCRIPTION: Finds a value in a series, expanding * (any characters) and ? (any one character), and returns TRUE if found. like is of type: function! ARGUMENTS: series [any-string!] => Series to search. value [any-string! block!] => Value to find. </pre> ##list <pre> USAGE: list data DESCRIPTION: Sets the new-line marker to end of block. list is of type: function! ARGUMENTS: data [block!] </pre> ##load-dsv <pre> USAGE: load-dsv source /part columns /preserve /ignore /where condition /with delimiter DESCRIPTION: Parses delimiter-separated values into row blocks. load-dsv is of type: function! ARGUMENTS: source [file! binary! string!] REFINEMENTS: /part => Offset position(s) to retrieve. columns [block! integer!] /preserve => keep line breaks and extra spaces. /ignore => Ignore truncated row errors. /where => Expression that can reference columns as f1, f2, etc. condition [block!] /with => Alternate delimiter (default is tab, bar then comma). delimiter [char!] </pre> ##load-excel <pre> USAGE: load-excel file sheet /part columns /where condition /distinct /string DESCRIPTION: Loads an Excel file. load-excel is of type: function! ARGUMENTS: file [file!] sheet [integer! word! string!] REFINEMENTS: /part => Offset position(s) / columns(s) to retrieve. columns [block! integer!] /where => Expression that can reference columns as F1, F2, etc. condition [string!] /distinct /string </pre> ##max-of <pre> USAGE: max-of series DESCRIPTION: Returns the largest value in a series. max-of is of type: function! ARGUMENTS: series [series!] => Series to search. </pre> ##merge <pre> USAGE: merge outer key1 inner key2 columns /default DESCRIPTION: Join outer block to inner block on primary key. merge is of type: function! ARGUMENTS: outer [block!] => Outer block. key1 [integer!] inner [block!] => Inner block to index. key2 [integer!] columns [block!] => Offset position(s) to retrieve in merged block. REFINEMENTS: /default => Use none on inner block misses. </pre> ##min-of <pre> USAGE: min-of series DESCRIPTION: Returns the smallest value in a series. min-of is of type: function! ARGUMENTS: series [series!] => Series to search. </pre> ##mixedcase <pre> USAGE: mixedcase string DESCRIPTION: Converts string of characters to mixedcase. mixedcase is of type: function! ARGUMENTS: string [string!] </pre> ##munge <pre> USAGE: munge data /update action /delete /part columns /where condition /group having DESCRIPTION: Load and/or manipulate a block of tabular (column and row) values. munge is of type: function! ARGUMENTS: data [block!] REFINEMENTS: /update => Offset/value pairs (returns original block). action [block!] /delete => Delete matching rows (returns original block). /part => Offset position(s) to retrieve. columns [block! integer!] /where => Expression that can reference columns as f1, f2, etc. condition /group => One of count, max, min or sum. having [word! block!] => Word or expression that can reference the initial result set column as count, max, etc. </pre> ##oledb-file? <pre> USAGE: oledb-file? file DESCRIPTION: Returns true for an Excel or Access file. oledb-file? is of type: function! ARGUMENTS: file [file!] </pre> ##read-pdf <pre> USAGE: read-pdf file /lines DESCRIPTION: Reads from a PDF file. read-pdf is of type: function! ARGUMENTS: file [file!] REFINEMENTS: /lines => Handles data as lines. </pre> ##read-string <pre> USAGE: read-string data /lines DESCRIPTION: Read string(s) from a text file. read-string is of type: function! ARGUMENTS: data [file! binary!] REFINEMENTS: /lines => Convert to block of strings. </pre> ##remove-column <pre> USAGE: remove-column block index DESCRIPTION: Remove a column of values from a block. remove-column is of type: function! ARGUMENTS: block [block!] index [integer!] </pre> ##replace-deep <pre> USAGE: replace-deep data search new DESCRIPTION: Replaces all occurences of a search value with the new value in a block or nested block. replace-deep is of type: function! ARGUMENTS: data [block!] => Block to replace within (modified). search => Value to be replaced. new </pre> ##rows? <pre> USAGE: rows? data DESCRIPTION: Number of rows in a delimited file or string. rows? is of type: function! ARGUMENTS: data [file! binary! string!] </pre> ##sheets? <pre> USAGE: sheets? file DESCRIPTION: Excel sheet names. sheets? is of type: function! ARGUMENTS: file [file!] </pre> ##split-line <pre> USAGE: split-line line delimiter DESCRIPTION: Splits and returns line of text as a block. split-line is of type: function! ARGUMENTS: line [string!] delimiter [char!] </pre> ##sqlcmd <pre> USAGE: sqlcmd server database statement /key columns /headings /string DESCRIPTION: Execute a SQL Server statement. sqlcmd is of type: function! ARGUMENTS: server [string!] database [string!] statement [string!] REFINEMENTS: /key => Columns to convert to integer. columns [integer! block!] /headings => Keep column headings. /string </pre> ##sqlite <pre> USAGE: sqlite database statement /key columns /headings /string DESCRIPTION: Execute a SQLite statement. sqlite is of type: function! ARGUMENTS: database [file!] statement [string!] REFINEMENTS: /key => Columns to convert to integer. columns [integer! block!] /headings => Keep column headings. /string </pre> ##sum-of <pre> USAGE: sum-of block DESCRIPTION: Sum of values in a block. sum-of is of type: function! ARGUMENTS: block [block!] </pre> ##to-column-alpha <pre> USAGE: to-column-alpha number DESCRIPTION: Convert numeric column reference to an alpha column reference. to-column-alpha is of type: function! ARGUMENTS: number [integer!] => Column number between 1 and 702. </pre> ##to-column-number <pre> USAGE: to-column-number alpha DESCRIPTION: Convert alpha column reference to a numeric column reference. to-column-number is of type: function! ARGUMENTS: alpha [word! string! char!] </pre> ##to-hash <pre> USAGE: to-hash data DESCRIPTION: Convert block! to map!. to-hash is of type: function! ARGUMENTS: data [block!] </pre> ##to-rebol-date <pre> USAGE: to-rebol-date date /mdy /ydm /day DESCRIPTION: Convert a string date to a Rebol date. to-rebol-date is of type: function! ARGUMENTS: date [string!] REFINEMENTS: /mdy => Month/Day/Year format. /ydm => Year/Day/Month format. /day => Day precedes date. </pre> ##to-rebol-time <pre> USAGE: to-rebol-time time DESCRIPTION: Convert a string date/time to a Rebol time. to-rebol-time is of type: function! ARGUMENTS: time [string!] </pre> ##to-string-date <pre> USAGE: to-string-date date /mdy /ydm DESCRIPTION: Convert a string or Rebol date to a YYYY-MM-DD string. to-string-date is of type: function! ARGUMENTS: date [string! date!] REFINEMENTS: /mdy => Month/Day/Year format. /ydm => Year/Day/Month format. </pre> ##to-string-time <pre> USAGE: to-string-time time DESCRIPTION: Convert a string or Rebol time to a HH.MM.SS string. to-string-time is of type: function! ARGUMENTS: time [string! date! time!] </pre> ##write-dsv <pre> USAGE: write-dsv file data DESCRIPTION: Write block(s) of values to a delimited text file. write-dsv is of type: function! ARGUMENTS: file [file!] => csv or tab-delimited text file. data [block!] </pre> ##write-excel <pre> USAGE: write-excel file data /filter DESCRIPTION: Write block(s) of values to an Excel file. write-excel is of type: function! ARGUMENTS: file [file!] data [block!] => Name [string!] Data [block!] Widths [block!] records. REFINEMENTS: /filter => Add auto filter. </pre> # Tests |Test|Result| |-|-| |append-column [[a] [1]] 2|[[a 2] [1 2]]| |append-column/header [[a] [1]] 2 'b|[[a b] [1 2]]| |ascii-file? %test.r|true| |ascii-file? %test.xlsx|false| |average-of []|none| |average-of [1 2]|1| |call-oledb %test.xlsx "echo munge"|"munge"| |call-out {sqlite3.exe a.db ""}|""| |check []|true| |check [[1]]|true| |cols? ""|0| |cols? "a,b"|2| |cols? ","|2| |cols?/with ":" #":"|2| |delimiter? ""|#","| |delimiter? "a"|#","| |delimiter? " "|#" "| |delimiter? "~"|#"~"| |delimiter? ";"|#";"| |digit|make bitset! #{000000000000FFC0}| |digits? ""|none| |digits? "a"|none| |digits? "1"|true| |distinct [["" ""] [a 1]]|[[a 1]]| |distinct [[none none] [a 1]]|[[a 1]]| |distinct [[a] [a]]|[[a]]| |distinct [[a 1] [a 1]]|[[a 1]]| |enblock [1 2 3 4] 1|[[1] [2] [3] [4]]| |enblock [1 2 3 4] 2|[[1 2] [3 4]]| |export []|[]| |export [distinct]|[distinct]| |export [cols? rows?]|[cols? rows?]| |fields? ""|[]| |fields? ","|["" ""]| |fields? "a,b"|["a" "b"]| |fields? "a:b"|["a:b"]| |fields?/with "a:b" #":"|["a" "b"]| |first-line ""|""| |first-line "a,1"|"a,1"| |first-line "a,1"|"a,1"| |first-line "a,1"|"a,1"| |flatten []|[]| |flatten [[a] [b]]|[a b]| |flatten [[a 1] [b 2]]|[a 1 b 2]| |latin1-to-utf8 #{}|""| |latin1-to-utf8 #{00}|""| |latin1-to-utf8 #{C2A0}|" "| |letter|make bitset! #{00000000000000007FFFFFE07FFFFFE0}| |letters? ""|true| |letters? "a"|true| |letters? "1"|false| |like "" ""|none| |like %abc %a|true| |like %abc %*a|true| |like %abc %?a|false| |like %abc %b|false| |like %abc %*b|true| |like %abc %?b|true| |like %abc %a?c|true| |like %abc %a?b|false| |like %abc %a*c|true| |like %abc %a*b|true| |list []|[]| |list [a b]|[a b]| |list [[a] [b]]|[[a] [b]]| |load-dsv ""|[]| |load-dsv ","|[]| |load-dsv {" a b "}|[["a b"]]| |load-dsv/preserve {" a b " }|[[" a b "]]| |load-dsv "a,"|[["a" ""]]| |load-dsv/part "a,b,c" 1|[["a"]]| |load-dsv/part "a,b,c" [1]|[["a"]]| |load-dsv/part "a,b,c" [3 1]|[["c" "a"]]| |load-dsv "a,1^/b,2"|[["a" "1"] ["b" "2"]]| |load-dsv/where "a,1^/b,2" [row/1 = "a"]|[["a" "1"]]| |load-dsv "a:b"|[["a:b"]]| |load-dsv/with "a:b" #":"|[["a" "b"]]| |load-excel %test.xlsx 2|[["a" "1"] ["b" "2"]]| |load-excel %test.xlsx 'S1|[["a" "1"] ["b" "2"]]| |load-excel %test.xlsx "S1"|[["a" "1"] ["b" "2"]]| |load-excel %test.xlsx "S 2"|[["c" "3"] ["d" "4"]]| |load-excel/part %test.xlsx "S1" 1|[["a"] ["b"]]| |load-excel/part %test.xlsx "S1" [2 1]|[["1" "a"] ["2" "b"]]| |load-excel/where %test.xlsx "S1" "F1 = 'a'"|[["a" "1"]]| |max-of []|none| |max-of [1 2]|2| |merge [] 1 [] 1 [1]|[]| |merge [[a 1] [b 2]] 2 [[1 "A"]] 1 [1 4]|[[a "A"]]| |merge/default [[a 1] [b 2]] 2 [[1 "A"]] 1 [1 4]|[[a "A"] [b none]]| |min-of []|none| |min-of [1 2]|1| |mixedcase ""|""| |mixedcase "aa"|"Aa"| |mixedcase "aa bb"|"Aa Bb"| |munge []|[]| |munge/where [[a] [a] [b]] 'a|[[a] [a]]| |munge [[a 1] [b 2]]|[[a 1] [b 2]]| |munge/part [[a 1]] 1|[[a]]| |munge/part [[a 1]] [2 1]|[[1 a]]| |munge/where [[a 1]] [even? row/2]|[]| |munge/where [[a 1]] [odd? row/2]|[[a 1]]| |munge/update [[a 1]] [1 0]|[[0 1]]| |munge/update [[a 1]] [1 row/2]|[[1 1]]| |munge/update [[a 1]] [1 string!]|[["a" 1]]| |munge/update/where [[a 1]] [1 0] 'a|[[0 1]]| |munge/update/where [[a 1]] [1 row/2] 'a|[[1 1]]| |munge/update/where [[a 1]] [1 string!] 'a|[["a" 1]]| |munge/update/where [[a 1]] [1 0] [row/1 = 'a]|[[0 1]]| |munge/update/where [[a 1]] [1 row/2] [row/1 = 'a]|[[1 1]]| |munge/update/where [[a 1]] [1 string!] [row/1 = 'a]|[["a" 1]]| |munge/delete [[a 1] [b 2]]|[]| |munge/delete/where [[a 1] [b 2]] 'a|[[b 2]]| |munge/delete/where [[a 1] [b 2]] [row/1 = 'a]|[[b 2]]| |munge/group [[1] [2]] 'count|[[1 1] [2 1]]| |munge/group [[1] [2]] 'avg|1| |munge/group [[1] [2]] 'sum|3| |munge/group [[1] [2]] 'min|1| |munge/group [[1] [2]] 'max|2| |munge/group [[a 1] [a 2] [b 3]] 'avg|[[a 1] [b 3]]| |munge/group [[a 1] [a 2] [b 3]] 'sum|[[a 3] [b 3]]| |munge/group [[a 1] [a 2] [b 3]] 'min|[[a 1] [b 3]]| |munge/group [[a 1] [a 2] [b 3]] 'max|[[a 2] [b 3]]| |oledb-file? %tmp|none| |oledb-file? %test.r|none| |oledb-file? %test.xls|none| |oledb-file? %test.xlsx|true| |read-pdf %test.pdf|"a b^/1 2"| |read-pdf/lines %test.pdf|["a b" "" "1 2"]| |write-dsv %a.txt [[1] [2]] read-string %a.txt|"1^/2"| |read-string/lines %a.txt|["1" "2"]| |remove-column [[a 1] [b 2]] 1|[[1] [2]]| |remove-column [[a 1] [b 2]] 2|[[a] [b]]| |replace-deep [a [a]] 'a 'b|[b [b]]| |rows? ""|0| |rows? "a"|1| |rows? "a"|2| |sheets? %test.xlsx|["S 2" "S1"]| |sheets? %test.accdb|["Table 2" "Table1"]| |split-line "" #","|[]| |split-line "," #","|["" ""]| |sqlcmd sn db "select ''"|[]| |sqlcmd sn db "select NULL"|[]| |sqlcmd sn db "select 1 where 0 = 1"|[]| |sqlcmd sn db "select NULL,NULL"|[]| |sqlcmd sn db "select 1,NULL"|[["1" ""]]| |sqlcmd sn db "select NULL,1"|[["" "1"]]| |sqlcmd sn db "select 0"|[["0"]]| |sqlcmd sn db "select 0,1"|[["0" "1"]]| |sqlcmd sn db "select 0,''"|[["0" ""]]| |sqlcmd/key sn db "select 0" 1|[[0]]| |sqlcmd/headings sn db "select 1 A"|[["A"] ["1"]]| |sqlite %a.db "select ''"|[]| |sqlite %a.db "select NULL"|[]| |sqlite %a.db "select 1 where 0 = 1"|[]| |sqlite %a.db "select NULL,NULL"|[]| |sqlite %a.db "select 1,NULL"|[["1" ""]]| |sqlite %a.db "select NULL,1"|[["" "1"]]| |sqlite %a.db "select 0"|[["0"]]| |sqlite %a.db "select 0,1"|[["0" "1"]]| |sqlite %a.db "select 0,''"|[["0" ""]]| |sqlite/key %a.db "select 0" 1|[[0]]| |sqlite/headings %a.db "select 1 A"|[["A"] ["1"]]| |sum-of []|none| |sum-of [1 2]|3| |to-column-alpha 1|"A"| |to-column-alpha 27|"AA"| |to-column-number 'A|1| |to-column-number "aa"|27| |to-hash []|#()| |to-hash [a b]|#( a: 0 b: 0 )| |to-hash [[a] [b]]|#( a: 0 b: 0 )| |to-hash [[a 1] [b 2]]|#( a: 0 1 0 b: 0 2 0 )| |to-rebol-time "1:00AM"|1:00:00| |to-rebol-time "1:00PM"|13:00:00| |to-rebol-time "1:00 AM"|1:00:00| |to-rebol-time "1:00 PM"|13:00:00| |to-string-date "01-02-67"|"2067-02-01"| |to-string-date "01-02-68"|"1968-02-01"| |to-string-date "01/02/2015"|"2015-02-01"| |to-string-date/mdy "01/02/2015"|"2015-01-02"| |to-string-date/ydm "15/02/01"|"2015-01-02"| |to-string-date "01-02-2015"|"2015-02-01"| |to-string-date/mdy "01-02-2015"|"2015-01-02"| |to-string-date/ydm "15-02-01"|"2015-01-02"| |to-string-time 1:00:00|"01.00.00"| |to-string-time "1:00PM"|"13.00.00"| |write-dsv %a.txt [] read-string %a.txt|""| |write-dsv %a.txt [[]] read-string %a.txt|""| |write-dsv %a.csv [[a 1] [b 2]] read-string %a.csv|"a,1^/b,2"| |write-dsv %a.txt [[a 1] [b 2]] read-string %a.txt|"a 1^/b 2"| |write-excel %a.xlsx ["A" [[A B] [1 2]] [5 10]]|%a.xlsx| |write-excel/filter %a.xlsx ["A" [[A] [B]] [5]]|%a.xlsx|