FME Memorandum: February 2014

2014-02-25

Change of FME Objects Python API regarding Null

The change has been announced, and the related documentation has been updated or published.
> What's up with the change to the Python API for nulls in Build 14252?
> How does FME handle null attribute values?
> FME 2014 Null Porting Guide for Plug-ins

I have updated these articles on this blog accordingly.
> Default Attribute Names of XLSXR in FME 2014
> Null in FME 2014: Handling Null with Python / Tcl
> Efficiency of List Attribute Manipulation with Python

In short, when the specified attribute stores <null>, the FMEFeature.getAttribute method returns:
None in FME 2014 (without SP*),
an empty string in FME 2014 SP1+ (build 14252 or later).
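
A script that has to run on both builds can treat the two return values alike. The following is only a minimal sketch: get_attribute_or_default and FakeFeature are hypothetical names, and FakeFeature is a stand-in so the snippet runs outside FME. Note that this approach cannot tell <null> from a genuinely empty string.

```python
def get_attribute_or_default(feature, name, default=None):
    """Return the attribute value, or `default` when it looks like <null>.

    None covers FME 2014 (without SP*); '' covers FME 2014 SP1+.
    """
    value = feature.getAttribute(name)
    if value is None or value == '':
        return default
    return value


class FakeFeature(object):
    """Hypothetical stand-in for fmeobjects.FMEFeature (getAttribute only)."""

    def __init__(self, attributes):
        self._attributes = attributes

    def getAttribute(self, name):
        return self._attributes.get(name)


before_sp1 = FakeFeature({'name': None})  # how <null> came back before SP1
after_sp1 = FakeFeature({'name': ''})     # how <null> comes back in SP1+
print(get_attribute_or_default(before_sp1, 'name', 'n/a'))
print(get_attribute_or_default(after_sp1, 'name', 'n/a'))
```

In a PythonCaller, the real feature would be passed in place of FakeFeature.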

2014-02-23

Wanted ListStatisticsCalculator Transformer

> Community: Calculate statistics from a list
Although statistics for list elements can be calculated with the ListExploder and StatisticsCalculator, a dedicated transformer would be more convenient.
-----
# Prototype of the ListStatisticsCalculator with Python Script (PythonCaller)
# _count, _mode, _histogram{} are calculated for all elements,
# others are calculated for numeric elements only.
# Partially different from the regular StatisticsCalculator.
import fmeobjects, math

def calculateListStatistics(feature):
    elements = feature.getAttribute('_list{}')
    if not isinstance(elements, list):
        return

    # Collect numeric elements and calculate histogram.
    # Numeric elements are counted in the histogram by their float value.
    numeric, histogram = [], {}
    for e in elements:
        try:
            e = float(e)
            numeric.append(e)
        except (TypeError, ValueError):
            pass
        histogram[e] = histogram.setdefault(e, 0) + 1
    feature.setAttribute('_count', len(elements))
    feature.setAttribute('_numeric_count', len(numeric))

    # Create histogram and calculate mode.
    v, n = None, 0
    for i, k in enumerate(histogram.keys()):
        feature.setAttribute('_histogram{%d}.value' % i, k)
        feature.setAttribute('_histogram{%d}.count' % i, histogram[k])
        if n < histogram[k]:
            v, n = k, histogram[k]
    if v is not None:
        feature.setAttribute('_mode', v)

    # Calculate statistics of numeric elements.
    if 0 < len(numeric):
        numeric.sort()
        feature.setAttribute('_min', numeric[0])
        feature.setAttribute('_max', numeric[-1])
        feature.setAttribute('_range', numeric[-1] - numeric[0])

        # Calculate median (// keeps the index an integer).
        i = len(numeric) // 2
        median = numeric[i] if len(numeric) % 2 else 0.5 * (numeric[i - 1] + numeric[i])
        feature.setAttribute('_median', median)

        # Calculate sum and mean.
        s = sum(numeric)
        m = s / len(numeric)
        feature.setAttribute('_sum', s)
        feature.setAttribute('_mean', m)

        # Calculate standard deviation (sample, n - 1 in the denominator).
        if 1 < len(numeric):
            ss = sum([(x - m)**2 for x in numeric])
            feature.setAttribute('_stddev', math.sqrt(ss / (len(numeric) - 1)))
-----
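
The median and standard deviation steps of the prototype can be checked on a plain Python list, without FME. This is just a sketch of the same formulas; median_and_stddev is a hypothetical helper name, not part of any FME API.

```python
import math


def median_and_stddev(values):
    """Return (median, sample standard deviation) of a non-empty numeric list.

    The standard deviation is None when there are fewer than two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        median = ordered[mid]
    else:
        median = 0.5 * (ordered[mid - 1] + ordered[mid])
    if n < 2:
        return median, None
    mean = sum(ordered) / float(n)
    ss = sum((x - mean) ** 2 for x in ordered)
    return median, math.sqrt(ss / (n - 1))


print(median_and_stddev([3, 1, 2]))
print(median_and_stddev([1, 2, 3, 4]))
```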

=====
2014-09-20: See also the ListStatisticsCalculator in FME Store / Pragmatica.
=====

Create Farey Sequence with Python / Tcl Script

This is an interesting subject > Community: Ford circles not touching
Keywords: Farey sequence, Ford circle

I've shared workspace examples related to the subject (FME User Community members only).
FME 2012 SP4+ Edition
FME 2014 Edition

Here are Python and Tcl script examples that create a Farey sequence as a structured list attribute. They are probably a case of "more than enough is too much", but they contain some tips that are generally applicable.

Python

-----
# Python Script Example (PythonCaller)
# Create Farey sequence as a structured list attribute:
# _farey{}.p, _farey{}.q, _farey{}.v (v = p / q)
# Assume input feature has an integer attribute named "_farey_orders".
import fmeobjects
from operator import itemgetter

def createFareySequence(feature):
    s = [(0, 1, 0.0), (1, 1, 1.0)]
    for q in range(2, int(feature.getAttribute('_farey_orders')) + 1):
        for p in range(1, q):
            if not have_common_divisor(p, q):
                s.append((p, q, float(p)/q))
    # Sort by the value v = p / q, then write the structured list.
    for i, (p, q, v) in enumerate(sorted(s, key=itemgetter(2))):
        feature.setAttribute('_farey{%d}.p' % i, p)
        feature.setAttribute('_farey{%d}.q' % i, q)
        feature.setAttribute('_farey{%d}.v' % i, v)

# Helper function
# Return True if m and n have a common divisor other than 1.
# Otherwise return False.
def have_common_divisor(m, n):
    nmin, nmax = min(m, n), max(m, n)
    if nmin == 1:
        return False
    elif (m % 2 == 0 and n % 2 == 0) or nmax % nmin == 0:
        return True
    # Only odd candidates up to nmin // 2 remain to be tested.
    for d in range(3, nmin // 2 + 1, 2):
        if m % d == 0 and n % d == 0:
            return True
    return False
-----
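
For comparison, here is a more compact sketch that swaps the hand-written divisor test for the standard library's gcd and sorts with exact fractions. It assumes a Python 3 interpreter (math.gcd), and farey_sequence is a hypothetical name; it returns plain (p, q) tuples rather than setting FME attributes.

```python
from fractions import Fraction
from math import gcd


def farey_sequence(n):
    """Return the Farey sequence of order n as a sorted list of (p, q) tuples."""
    terms = [(p, q)
             for q in range(1, n + 1)
             for p in range(0, q + 1)
             if gcd(p, q) == 1]  # keep only fractions in lowest terms
    # Fraction gives an exact sort key, avoiding float rounding.
    return sorted(terms, key=lambda t: Fraction(t[0], t[1]))


print(farey_sequence(3))
```

The order 3 sequence is 0/1, 1/3, 1/2, 2/3, 1/1.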

Tcl

-----
# Tcl Script Example (TclCaller)
# Create Farey sequence as a structured list attribute:
# _farey{}.p, _farey{}.q, _farey{}.v (v = p / q)
# Assume input feature has an integer attribute named "_farey_orders".
proc createFareySequence {} {
    set s [list {0 1 0.0} {1 1 1.0}]
    for {set q 2} {$q <= [FME_GetAttribute "_farey_orders"]} {incr q} {
        for {set p 1} {$p < $q} {incr p} {
            if {![have_common_divisor $p $q]} {
                lappend s "$p $q [expr double($p) / $q]"
            }
        }
    }
    set s [lsort -real -index 2 $s]
    for {set i 0} {$i < [llength $s]} {incr i} {
        FME_SetAttribute "_farey{$i}.p" [lindex $s $i 0]
        FME_SetAttribute "_farey{$i}.q" [lindex $s $i 1]
        FME_SetAttribute "_farey{$i}.v" [lindex $s $i 2]
    }
}

# Helper procedure
# Return 1 if m and n have a common divisor other than 1.
# Otherwise return 0.
proc have_common_divisor {m n} {
    set nmin [expr min($m, $n)]
    set nmax [expr max($m, $n)]
    if {$nmin == 1} {
        return 0
    } elseif {(![expr $n % 2] && ![expr $m % 2]) || ![expr $nmax % $nmin]} {
        return 1
    }
    for {set d 3} {$d <= [expr $nmin / 2]} {incr d 2} {
        if {![expr $n % $d] && ![expr $m % $d]} {
            return 1
        }
    }
    return 0
}
-----

2014-02-15

How to count the amount of times a value is present

Here are some other approaches to this question. It's also a good example of "poor cat" :)
> Community: How to count the amount of times a value is present

1. ListHistogrammer Edition *see the Note below
1) Aggregator
Attributes to Concatenate: UPDATE
Separator Character: ; (semicolon)
2) AttributeSplitter
Attribute to Split: UPDATE
Delimiter or Format String: ; (semicolon)
Trim Whitespace: Both
List Name: _list
Drop Empty Parts: Yes
3) ListHistogrammer
Source List Attribute: _list{}
Histogram List Name: _histogram
4) ListExploder
List Attribute: _histogram{}
-----
Note: There is currently a bug in the ListHistogrammer in FME 2014. Don't use it until it's fixed.
FME 2013 is OK.
-----

2. PythonCaller Edition
-----
# Python Script Example
import fmeobjects

class UpdateCounter(object):
    def __init__(self):
        self.counter = {}

    def input(self, feature):
        update = feature.getAttribute('UPDATE')
        if update:
            for v in [s.strip() for s in str(update).split(';')]:
                self.counter[v] = self.counter.setdefault(v, 0) + 1

    def close(self):
        for v in self.counter.keys():
            feature = fmeobjects.FMEFeature()
            feature.setAttribute('value', v)
            feature.setAttribute('count', self.counter[v])
            self.pyoutput(feature)
-----
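
The counting logic itself can be tried outside FME. A minimal sketch using collections.Counter follows; count_values is a hypothetical name, and unlike the PythonCaller script above it drops empty parts, mirroring the "Drop Empty Parts: Yes" setting of the AttributeSplitter.

```python
from collections import Counter


def count_values(update_values, separator=';'):
    """Count trimmed, non-empty parts across all UPDATE values."""
    counter = Counter()
    for update in update_values:
        if not update:
            continue  # skip missing or empty UPDATE values
        parts = (s.strip() for s in str(update).split(separator))
        counter.update(part for part in parts if part)
    return counter


result = count_values(['a; b', 'b;c;', None, 'a'])
for value, count in sorted(result.items()):
    print(value, count)
```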

2014-02-11

Common Table for SchemaMapper and Dynamic Schema

Sometimes I have to read a CSV table that has no field names row, define appropriate field names, and write the records into a table in another format.

When there is no field names row, the CSV reader generates default attribute names, i.e. "col0, col1, col2 ...".
With only a few columns, it would be easy to rename them with the AttributeRenamer and to set the User Attributes of the writer feature type manually. With very many columns, that becomes troublesome. In fact, I have encountered a case with 200+ columns. Naturally, I would consider using the SchemaMapper and the Dynamic Schema option.

In such a case, I create a common table that can be used both as the Attributes / Feature Type Mapping table (for the SchemaMapper) and as the Schema Source table (for the Dynamic Schema of the writer feature type). I call it the "Common Schema Table" here.

1. Common Schema Table Definition

The common schema table looks like this. Only 3 attributes are shown here; imagine that there are many more in practice.

DestFeature	SrcAttr	DestAttr	DataType	Order	GeomType	SrcFeature
output	col0	ID	fme_decimal(4,0)	1	fme_no_geom	data
output	col1	Name	fme_varchar(16)	2
output	col2	Address	fme_varchar(16)	3
...	...	...	...	...

"DestFeature" (destination feature type name) and "DestAttr" (destination attribute names) are defined for both SchemaMapper and Dynamic Schema.
"SrcFeature" (source feature type name) and "SrcAttr" (source attribute names) are only for SchemaMapper.
Others are for Dynamic Schema. "Order" is optional.
The table can be in any format. I prefer CSV.
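
With 200+ columns, the table itself can be generated by a script. The following is only a sketch: build_schema_rows and write_schema_table are hypothetical helper names, the source attribute names are assumed to be the CSV reader defaults (col0, col1, ...), and GeomType / SrcFeature are filled on the first row only, mirroring the sample rows above.

```python
import csv
import io

COLUMNS = ['DestFeature', 'SrcAttr', 'DestAttr', 'DataType',
           'Order', 'GeomType', 'SrcFeature']


def build_schema_rows(dest_feature, src_feature, geom_type, fields):
    """Build one table row per (DestAttr, DataType) pair in `fields`."""
    rows = []
    for index, (dest_attr, data_type) in enumerate(fields):
        first = index == 0
        rows.append({
            'DestFeature': dest_feature,
            'SrcAttr': 'col%d' % index,  # default CSV reader attribute name
            'DestAttr': dest_attr,
            'DataType': data_type,
            'Order': index + 1,
            'GeomType': geom_type if first else '',
            'SrcFeature': src_feature if first else '',
        })
    return rows


def write_schema_table(rows, stream):
    """Write the rows as CSV, with a header row, to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)


fields = [('ID', 'fme_decimal(4,0)'),
          ('Name', 'fme_varchar(16)'),
          ('Address', 'fme_varchar(16)')]
rows = build_schema_rows('output', 'data', 'fme_no_geom', fields)
buffer = io.StringIO()
write_schema_table(rows, buffer)
print(buffer.getvalue())
```

The csv module quotes "fme_decimal(4,0)" automatically because the value contains a comma.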

2. SchemaMapper Settings

Insert a SchemaMapper between the reader and the writer in the workspace, specify the table as its Dataset, and set the Actions like this.
-----
Map Attributes: SrcAttr -> DestAttr
Map Feature Types: SrcFeature -> DestFeature
-----
The SchemaMapper will rename SrcAttr (col0, col1, col2 ...) to DestAttr (ID, Name, Address ...), and also change the feature type name "data" to "output".
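
The effect of the attribute mapping on a single record can be pictured with plain dictionaries. This is only an illustration, not how the SchemaMapper is implemented: build_attribute_map and rename_attributes are hypothetical names, the record values are made up, and passing unmapped names through unchanged is an assumption.

```python
def build_attribute_map(schema_rows):
    """Collect SrcAttr -> DestAttr pairs from the common schema table rows."""
    return dict((row['SrcAttr'], row['DestAttr']) for row in schema_rows)


def rename_attributes(record, attribute_map):
    """Rename mapped attributes; unmapped names are kept as they are (assumption)."""
    return dict((attribute_map.get(name, name), value)
                for name, value in record.items())


schema_rows = [
    {'SrcAttr': 'col0', 'DestAttr': 'ID'},
    {'SrcAttr': 'col1', 'DestAttr': 'Name'},
    {'SrcAttr': 'col2', 'DestAttr': 'Address'},
]
record = {'col0': '1', 'col1': 'sample name', 'col2': 'sample address'}
print(rename_attributes(record, build_attribute_map(schema_rows)))
```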

3. Dynamic Schema Settings

Add a "Schema (From Table)" reader to read the common schema table as a Workspace Resource (Menu: Readers > Add Reader as Resource). Parameter settings are:
-----
Feature type: DestFeature
Attribute name: DestAttr
Attribute data type: DataType
Geometry type: GeomType
Attribute sequence: Order
-----
Then specify the table as the "Schema Sources" of the writer feature type.

Creating the table takes time and effort, but maintenance is easy once it exists.
The method is an application of Example 3 and Example 4 in this article.
> FMEpedia: Dynamic Schema Examples
=====
2014-12-12: The link has been updated. See here instead.
> Dynamic Workflow Tutorial: Destination Schema is Derived from a Lookup Table
=====
2014-12-15: Related article:
> Read Schemas from Database with Schema (Any Format) Reader
=====

How about the Chatter?

The FME Community has been updated recently.
> Update to the FME Community – Please Share Your Feedback

"Chatter" is a new part of the Community. I think it will become a very useful tool for exchanging ideas, tips, solutions etc. among FME users around the world. Just chatting will also be fun :-)
I hope that many users will post there actively.

=====
2014-09-25: Unfortunately the Chatter was retired a few months ago. A short life...
=====

2014-02-05

Create Line from Comma Separated Coordinates

(FME 2014 build 14234)

I received CSV data containing the coordinates of line geometries.
This is a simplified example. All the coordinates of a line are stored in one CSV row, and the number of coordinates varies between 2 and 10. Besides the coordinates, each line has two attributes (LineID and NumPoints).
-----
LineID,NumPoints,X1,Y1,X2,Y2,X3,Y3,X4,Y4,X5,Y5,X6,Y6,X7,Y7,X8,Y8,X9,Y9,X10,Y10
1,4,0,0,1,1,2,1,3,0
2,2,2,1,3,2
3,6,3,0,4,0,4,1,5,1,5,2,6,2
-----

The task is to create line geometries based on the CSV data.
My idea is simple: if the comma-separated coordinates can be transformed into space-separated coordinates, they can be replaced with a line geometry using the GeometryReplacer (GML Encoding). I tried two ways to create the space-separated coordinates.

CSV Reader and List Manipulation
1) Read the data with a CSV reader (uncheck "File Has Field Names"), skip the header row.
2) Replace the unneeded attribute values (LineID, NumPoints) with empty strings (AttributeCreator).
3) Transform all attributes into a list attribute (ListPopulator).
4) Concatenate the list elements with white space as the delimiter (ListConcatenator).

TEXTLINE Reader and String Manipulation
1) Read the data with a TEXTLINE reader, skip the header row.
2) Split a row into the unneeded columns and the CSV coordinates part (StringSearcher).
3) Rename matched parts (AttributeRenamer).
4) Replace every comma with white space (StringReplacer).

Then surround the space-separated coordinates with GML tags so that they can be replaced with a line geometry using the GeometryReplacer. This step is common to the two ways above.
=====
2014-02-14: The GML fragment can be created directly on "Geometry Source" of the GeometryReplacer using the Text Editor. If you do so, the StringConcatenator can be removed.
=====
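
The string manipulation can be sketched in a few lines of Python. The exact GML tags used in the workspace are not shown in this post, so the gml:LineString / gml:posList fragment below is an assumption, as is the helper name coordinates_to_gml; whether the GeometryReplacer needs additional attributes or namespace declarations is not confirmed here.

```python
def coordinates_to_gml(csv_row, skip=2):
    """Turn one CSV row into a GML LineString fragment.

    The first `skip` columns (LineID, NumPoints) are dropped, and the
    remaining comma-separated values become space-separated coordinates.
    """
    values = [v.strip() for v in csv_row.split(',')][skip:]
    values = [v for v in values if v != '']  # ignore trailing empty columns
    if len(values) < 4 or len(values) % 2:
        raise ValueError('a line needs at least two complete x, y pairs')
    return ('<gml:LineString><gml:posList>%s</gml:posList></gml:LineString>'
            % ' '.join(values))


print(coordinates_to_gml('1,4,0,0,1,1,2,1,3,0'))
```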

Both of them worked fine. Since the coordinate values had to be written into the destination data as attributes, I finally adopted the first way (CSV Reader and List Manipulation).

If the "File Has Field Names" option is unchecked when adding the CSV reader, it generates default attribute names such as "col0, col1, col2 ...". Therefore, all attributes can be transformed into a list attribute with the ListPopulator. This is the key point of the first way.

The old Excel reader (FME 2013 or earlier) generated default attribute names formatted as "col_*" or "F*" (* is a 1-based sequential number), so the same trick was possible after just creating a temporary attribute named "col_0" or "F0". But the current Excel reader (FME 2014) generates "A, B, C..." (same as the Excel column names) by default, so the trick can no longer be applied easily to an Excel spreadsheet. > Default Attribute Names of XLSXR in FME 2014

Scripting is also possible.
-----
# Python Example
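import fmeobjects

def createLine(feature):
    # col0 and col1 hold LineID and NumPoints; coordinates start at col2.
    i = 2
    while True:
        x = feature.getAttribute('col%d' % i)
        y = feature.getAttribute('col%d' % (i + 1))
        # Stop at the first missing or empty coordinate value.
        if not x or not y:
            break
        feature.addCoordinate(float(x), float(y))
        i += 2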
-----
# Tcl Example
proc createLine {} {
    for {set i 2} {1} {set i [expr $i + 2]} {
        set x [FME_GetAttribute "col$i"]
        set y [FME_GetAttribute "col[expr $i + 1]"]
        if {![string is double -strict $x] || ![string is double -strict $y]} {
            break
        }
        FME_Coordinates addCoord $x $y
    }
}
-----

2014-02-02

Merge Two Lists so that Elements will be Alternate

(FME 2014 build 14234)

There is a dataset storing sea routes. Each feature has two attributes; "Ports" contains every port name on the route (departure port, 0 or more ports of call, arrival port), "Prefectures" contains prefecture names of the ports. Both of them are comma separated values.

RouteID	Ports	Prefectures
1	Hachinohe,Tomakomai,Kawasaki	Aomori,Hokkaido,Kanagawa
2	Osaka,Naha,Hakata,Naha	Osaka,Okinawa,Fukuoka,Okinawa

Based on the dataset, I need to create an attribute whose format should be:
<port>(<prefecture>)-<port>(<prefecture>) ... -<port>(<prefecture>)

For example, the first route feature (RouteID = 1) should end up with an attribute which stores
"Hachinohe(Aomori)-Tomakomai(Hokkaido)-Kawasaki(Kanagawa)".

First, divide the feature flow into two streams, and create a list attribute named "_list{}" in each stream.

1) Transform CSV port names into a list attribute
The following workflow creates a list attribute named "_list{}" which contains these elements.
_list{0} = Hachinohe
_list{1} = <missing>
_list{2} = -Tomakomai
_list{3} = <missing>
_list{4} = -Kawasaki
There are <missing> elements between every two ports; 2nd or later port names are headed by a hyphen. Be aware that _list{1} and _list{3} are missing.

2) Transform CSV prefecture names into a list attribute
The following workflow creates a list attribute named "_list{}" which contains these elements.
_list{0} = <empty>
_list{1} = (Aomori)
_list{2} = <empty>
_list{3} = (Hokkaido)
_list{4} = <empty>
_list{5} = (Kanagawa)
The list contains <empty> elements between every two prefectures; the first element is <empty>; every prefecture name is enclosed in parentheses.

Merge the two streams with a FeatureMerger, and the list attribute will contain these elements, i.e. the prefecture names are assigned to the <missing> elements.
_list{0} = Hachinohe
_list{1} = (Aomori)
_list{2} = -Tomakomai
_list{3} = (Hokkaido)
_list{4} = -Kawasaki
_list{5} = (Kanagawa)

Finally, concatenate the list elements with a ListConcatenator, and the required attribute is created.

The point is that the NullAttributeMapper in the first stream removes the list elements which contain "[to_missing]", so that the FeatureMerger can assign the prefecture names to the <missing> elements.
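
The idea of filling the <missing> gaps can be imitated with two plain Python lists. This is only an emulation of the result, not of how the FeatureMerger works internally; None stands for <missing>, merge_lists is a hypothetical name, and a Python 3 interpreter is assumed (itertools.zip_longest).

```python
from itertools import zip_longest

MISSING = None  # stands for a <missing> list element


def merge_lists(primary, secondary):
    """Keep each primary element; fill <missing> ones from the secondary list."""
    merged = []
    for a, b in zip_longest(primary, secondary, fillvalue=MISSING):
        merged.append(b if a is MISSING else a)
    return merged


ports = ['Hachinohe', MISSING, '-Tomakomai', MISSING, '-Kawasaki']
prefs = ['', '(Aomori)', '', '(Hokkaido)', '', '(Kanagawa)']
merged = merge_lists(ports, prefs)
print(''.join(e for e in merged if e is not MISSING))
```

Joining the merged elements without a separator gives "Hachinohe(Aomori)-Tomakomai(Hokkaido)-Kawasaki(Kanagawa)".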

Although Transformers can do that, scripting might be easier...
-----
# Python Script Example
import fmeobjects

def concatenatePorts(feature):
    ports = feature.getAttribute('Ports')
    prefs = feature.getAttribute('Prefectures')
    values = []
    for port, pref in zip(ports.split(','), prefs.split(',')):
        values.append('%s(%s)' % (port, pref))
    feature.setAttribute('_ports', '-'.join(values))
-----
# Tcl Script Example
proc concatenatePorts {} {
    set ports [FME_GetAttribute "Ports"]
    set prefs [FME_GetAttribute "Prefectures"]
    set values {}
    foreach port [split $ports {,}] pref [split $prefs {,}] {
        lappend values [format "%s(%s)" $port $pref]
    }
    return [join $values {-}]
}
-----

=====
2014-09-20: See also the ListMerger in FME Store / Pragmatica.
=====