FME Memorandum: November 2014

2014-11-29

Enumerate 2-Permutations of List Elements

This is a question about how to enumerate every 2-permutation of the elements in a list.
Community Answers > permute a list, output to csv

Source = [11, 22, 33]
2-Permutations = [[11, 22], [11, 33], [22, 11], [22, 33], [33, 11], [33, 22]]
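
For reference, the expected result above can be reproduced outside FME in plain Python with the standard library's itertools.permutations. This is only a sketch for checking the output; the function name two_permutations is made up for illustration and is not part of any of the solutions suggested here.

```python
from itertools import permutations

def two_permutations(values):
    """Return every ordered pair of elements taken from two different positions."""
    return [list(pair) for pair in permutations(values, 2)]

print(two_permutations([11, 22, 33]))
# [[11, 22], [11, 33], [22, 11], [22, 33], [33, 11], [33, 22]]
```

Note that pairs are built by position, so a list containing duplicate values still yields n * (n - 1) pairs.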

I suggested these three ways.
1. ListExploder+AttributeRenamer+FeatureMerger+ListExploder+Tester
2. ListExploder+InlineQuerier (SQL statement)
3. PythonCaller (Python script)

Here is one more way.
4. TclCaller (Tcl script)+ListExploder

-----
# TclCaller Script Example
proc extractPermutation {} {
    set values {}
    set range {}
    for {set i 0} {[FME_AttributeExists "List{$i}"]} {incr i} {
        lappend values [FME_GetAttribute "List{$i}"]
        lappend range $i
    }
    set index 0
    foreach i $range v1 $values {
        foreach j $range v2 $values {
            if {$i != $j} {
                FME_SetAttribute "_join{$index}.Field1" $v1
                FME_SetAttribute "_join{$index}.Field2" $v2
                incr index
            }
        }
    }
}
-----

My favorite is the SQL approach, and the Tcl approach is hard to give up too. But Python could be the most efficient solution in this case.
-----
# PythonCaller Script Example
import fmeobjects
class FeatureProcessor(object):
    def input(self, feature):
        L = feature.getAttribute('List{}')
        if L:
            for v1, v2 in [(v1, v2) for i, v1 in enumerate(L) for j, v2 in enumerate(L) if i != j]:
                newFeature = fmeobjects.FMEFeature()
                # If you need to retain the List{} and other attributes,
                # replace the line above with this.
                # newFeature = feature.cloneAttributes()
                newFeature.setAttribute('Field1', v1)
                newFeature.setAttribute('Field2', v2)
                self.pyoutput(newFeature)
-----

FME 2014 SP4 build 14433

=====
2014-12-01: This script probably works well for enumerating general K-permutations (K is an arbitrary number).
-----
# PythonCaller Script Example
# Enumerates K-Permutations of List Elements.
# Results will be stored in a nested list named "_perm{}.item{}".
# Assume that "K" (number of items in a permutation) will be specified
# by a feature attribute called "_k".
class PermutationEnumerator(object):
    def input(self, feature):
        src = feature.getAttribute('_list{}')
        self.k = int(feature.getAttribute('_k'))
        if self.k <= len(src):
            self.i = 0
            for i in range(len(src)):
                s = src[:]
                t = [s.pop(i)]
                self.enumeratePermutation(feature, s, t, 1)
        else:
            feature.setAttribute('_rejected', 1)
        self.pyoutput(feature)

    def enumeratePermutation(self, feature, s, t, depth):
        if self.k <= depth:
            for j in range(self.k):
                feature.setAttribute('_perm{%d}.item{%d}' % (self.i, j), t[j])
            self.i += 1
        else:
            for i in range(len(s)):
                c, d = s[:], t[:]
                d.append(c.pop(i))
                self.enumeratePermutation(feature, c, d, depth + 1)
-----
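
The recursion in the script above can also be tried without FME. The following is a minimal sketch in plain Python that follows the same idea (pick one element, recurse on the rest until K elements have been chosen) and compares the result with the standard library's itertools.permutations. The function name k_permutations and the sample list are made up for illustration.

```python
from itertools import permutations

def k_permutations(src, k):
    """Enumerate k-permutations recursively, in the manner of enumeratePermutation."""
    results = []

    def recurse(remaining, chosen):
        if len(chosen) >= k:
            # k elements have been chosen: one permutation is complete.
            results.append(chosen)
            return
        for i in range(len(remaining)):
            # Choose the i-th remaining element and recurse on the others.
            rest = remaining[:i] + remaining[i + 1:]
            recurse(rest, chosen + [remaining[i]])

    if 0 < k <= len(src):
        recurse(list(src), [])
    return results

src = ['a', 'b', 'c', 'd']
assert k_permutations(src, 3) == [list(p) for p in permutations(src, 3)]
print(len(k_permutations(src, 3)))  # 24
```

If the order of attributes and the nested-list output are not a concern, itertools.permutations alone would probably be enough inside a PythonCaller as well.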
=====
2014-12-05: See also the ListSubsetEnumerator transformer in the FME Store.

2014-11-23

Grouping Geometric Features Not Sharing Attribute

In this thread, I suggested a solution that includes a Python script to group consecutively adjoining triangular polygons. The key point of the script is its skillful use of "set".
FME Community Answers > SpatialRelator with 3D data

The logic could be applied to other similar scenarios.
For example, consider a case where you need to group discrete areas based on their closeness, like this image, i.e. put areas that lie within a certain distance of each other into one group.
Assume that the features don't have any attributes that can be used to group them.

Input: Individual area features

Required Result: Grouped based on their closeness

This is a possible way to group them (add a group ID attribute).
1) Use a Counter to add an area ID (sequential number, e.g. "_area_id") to the input features.
2) Add a NeighborFinder; send all the features to the Candidate port, and set the "Maximum Distance" and "Close Candidate List Name" (e.g. "_close") parameters.
3) Send the features output from the Matched port and the UnmatchedCandidate port to a PythonCaller with this script; expose the group ID attribute ("_group_id" in this script example).
-----
# Python Script Example
import fmeobjects

class GroupIdSetter(object):
    def __init__(self):
        self.features = {} # Key: Area ID => Value: Feature Object
        self.groups = {} # Key: Area ID => Value: Set of Area IDs in Group

    def input(self, feature):
        areaId = int(feature.getAttribute('_area_id'))
        self.features[areaId] = feature
        grp = self.groups.setdefault(areaId, set([areaId]))
        neighbors = feature.getAttribute('_close{}._area_id')
        if neighbors:
            for id in [int(s) for s in neighbors]:
                grp |= self.groups.setdefault(id, set([id]))
            for id in grp:
                self.groups[id] = grp

    def close(self):
        groupId, finished = 0, set([])
        for id, grp in self.groups.items():
            if id not in finished:
                for areaId in grp:
                    feature = self.features[areaId]
                    feature.setAttribute('_group_id', groupId)
                    self.pyoutput(feature)
                groupId += 1
                finished |= grp
-----
=====
2014-11-26: The "close" method might be a little more efficient if written like this.
-----
    def close(self):
        groupId = 0
        for grp in self.groups.values():
            if grp:
                while grp:
                    feature = self.features[grp.pop()]
                    feature.setAttribute('_group_id', groupId)
                    self.pyoutput(feature)
                groupId += 1
-----
=====
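
The set-merging idea in the script can be tried without FME. The following is a minimal sketch in plain Python, assuming the neighbor relation is given as a dictionary that maps each area ID to the IDs of its close areas (a stand-in for the "_close{}._area_id" list). The function name group_by_neighbors and the sample data are made up for illustration.

```python
def group_by_neighbors(neighbors):
    """Map each area ID to a group ID; areas linked by closeness share a group."""
    groups = {}  # Key: area ID => Value: set of area IDs in the same group
    for area_id, close_ids in neighbors.items():
        grp = groups.setdefault(area_id, {area_id})
        for other in close_ids:
            # Merge the neighbor's current group into this one (in place).
            grp |= groups.setdefault(other, {other})
        for member in grp:
            # Every member now refers to the same merged set object.
            groups[member] = grp

    group_ids, next_id = {}, 0
    for area_id in sorted(groups):
        if area_id not in group_ids:
            for member in groups[area_id]:
                group_ids[member] = next_id
            next_id += 1
    return group_ids

# Hypothetical input: areas 1-2-3 form a chain, 4 is isolated, 5 and 6 are close.
sample = {1: [2], 2: [1, 3], 3: [2], 4: [], 5: [6], 6: [5]}
print(group_by_neighbors(sample))
# {1: 0, 2: 0, 3: 0, 4: 1, 5: 2, 6: 2}
```

Areas 1 and 3 are not directly close to each other, but they end up in the same group because both are close to area 2.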

But there is also another approach that does not use any script in this example case.

... yup, I finally found the solution, in which I used the buffering trick.
FME Community Answers > Aggregating polygons near to one another

=====
2014-11-27: I published a custom transformer called "ListCombiner" in the FME Store, which can be applied to the example scenario above. This screenshot illustrates a usage example.

2014-11-15

Sorting Lines Divided by PointOnLineOverlayer

The PointOnLineOverlayer can be used to divide a line at one or more points located on the line. But the output order of the resulting lines is arbitrary; they are not necessarily sorted along the original line from the start node to the end node.
The output order may not matter in many cases, but there are also cases where sorting is required.

If the original line always points from west (min X) to east (max X) like the image, the divided lines can be sorted by the X coordinate of their start nodes. In general, they can similarly be sorted by the measure values of their start nodes.
1) Insert a MeasureGenerator before the PointOnLineOverlayer.
2) Add a MeasureExtractor to extract the measure value of the start node of each resulting line.
   Type: Individual Vertex by Index
   Destination Attribute of Point or Vertex: _point_measure
   Index: 0
3) Add a Sorter to sort the lines by "_point_measure" ascending.
Naturally, use the original line ID attribute as the primary sort key if there are multiple input lines.
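
The sort order described above (original line ID first, then the measure of the start node) can be sketched in plain Python as follows. Features are represented here by simple dictionaries; the attribute name "_line_id", the function name sort_divided_lines and the sample values are assumptions made for illustration, while "_point_measure" is the attribute extracted by the MeasureExtractor.

```python
def sort_divided_lines(lines):
    """Sort divided lines by original line ID, then by start-node measure."""
    return sorted(lines, key=lambda f: (f['_line_id'], f['_point_measure']))

# Hypothetical divided lines, in an arbitrary output order.
divided = [
    {'_line_id': 2, '_point_measure': 0.0},
    {'_line_id': 1, '_point_measure': 35.5},
    {'_line_id': 1, '_point_measure': 0.0},
    {'_line_id': 2, '_point_measure': 12.0},
    {'_line_id': 1, '_point_measure': 10.2},
]
for f in sort_divided_lines(divided):
    print(f['_line_id'], f['_point_measure'])
```

In a workspace the Sorter does the same thing with two sort keys, so a script is not needed for this step.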

Another approach.
First transform the divided lines into a path (a line consisting of one or more segment lines) using a LineJoiner, and then split the path again into individual segment lines with a PathSplitter. The LineJoiner parameter settings are:
Preserve Original Orientation: yes
Input Feature Topology: End noded
Preserve Lines as Path Segments: yes

Notes on the other approach:
1) Since the LineJoiner removes all attributes except those specified in the "Group By" parameter, you will have to restore the attributes afterward if necessary (FeatureMerger etc.).
-----
I had overlooked a way to retain attributes when using the LineJoiner.
- Specify a list name in the "List Name" parameter of the LineJoiner.
- Add a ListIndexer to demote the attributes of a list element.
- Remove the list with an AttributeRemover if you don't need it any more.
A little bit troublesome...
-----
2) In this case, the PathBuilder cannot be used to transform the divided lines into a path, since the order of the segments output from the PointOnLineOverlayer is not the desired one.
3) If an input line could be a path, it has to be transformed into a non-path line beforehand, because the PointOnLineOverlayer preserves path segment boundaries and the PathSplitter splits the path at every segment boundary, including the original ones.

Oops, it looks like there isn't a simple way to transform a path into a non-path line.
The PathSplitter+LineJoiner does that, but it requires an extra procedure to restore attributes.
Use Python?

-----
# Script Example for PythonCaller
# Converts Path to Line.
import fmeobjects

def convertToLine(feature):
    geom = feature.getGeometry()
    if isinstance(geom, fmeobjects.FMEPath):
        feature.setGeometry(fmeobjects.FMEGeometryTools().convertToLine(geom))
-----

FME 2014 SP4 build 14433

2014-11-08

Create Trapezoidal Buffered Areas from Line

Continued from this thread.
FME Community Answers > Is it possible to create a " trapezoids" shaped line buffer?

It's an interesting subject.
I tried replacing the whole procedure except the Dissolver with a concise Python script.
For what it's worth.
-----
# Script Example for PythonCaller
# Creates Trapezoid Buffered Areas for Each Line Segment.
# Assumes input feature geometry is a Line.
import fmeobjects, math

class TrapezoidBufferer(object):
    def input(self, feature):
        rb = float(feature.getAttribute('_beg_width')) * 0.5
        re = float(feature.getAttribute('_end_width')) * 0.5
        k = (re - rb) / float(feature.performFunction('@Length()'))
        coordSys = feature.getCoordSys()
        coords = feature.getAllCoordinates()
        if feature.getDimension() == fmeobjects.FME_THREE_D:
            coords = [(x, y) for (x, y, z) in coords]
        x0, y0, measure = coords[0][0], coords[0][1], 0.0
        g0 = fmeobjects.FMEEllipse(fmeobjects.FMEPoint(x0, y0), rb, rb, 0.0, 0)
        for x1, y1 in coords[1:]:
            measure += math.hypot(x1 - x0, y1 - y0)
            rm = rb + k * measure
            g1 = fmeobjects.FMEEllipse(fmeobjects.FMEPoint(x1, y1), rm, rm, 0.0, 0)
            geom = fmeobjects.FMEAggregate()
            geom.appendPart(g0)
            geom.appendPart(g1)
            hull = feature.cloneAttributes()
            hull.setCoordSys(coordSys)
            hull.setGeometry(geom)
            hull.performFunction('@ConvexHull()')
            self.pyoutput(hull)
            x0, y0, g0 = x1, y1, g1
-----

The script transforms a polyline into multiple polygons like this image. To get the final geometry, dissolve them with a Dissolver.
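
The part of the script that decides the circle radius at each vertex is a linear interpolation along the line: the radius grows from half the begin width to half the end width in proportion to the measured distance from the start node. A minimal sketch of only that calculation in plain Python, without fmeobjects, might look like this. The function name vertex_radii and the sample coordinates are made up for illustration.

```python
import math

def vertex_radii(coords, beg_width, end_width):
    """Return the buffer radius at each vertex of a 2D polyline."""
    r_beg = beg_width * 0.5
    r_end = end_width * 0.5
    pairs = list(zip(coords, coords[1:]))
    length = sum(math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in pairs)
    k = (r_end - r_beg) / length  # radius change per unit length
    radii, measure = [r_beg], 0.0
    for (x0, y0), (x1, y1) in pairs:
        measure += math.hypot(x1 - x0, y1 - y0)
        radii.append(r_beg + k * measure)
    return radii

# Hypothetical line of length 11 (5 + 6), widths 2 at the start and 6 at the end.
print(vertex_radii([(0, 0), (3, 4), (3, 10)], 2.0, 6.0))
```

Each output polygon of the script is then the convex hull of the two circles at the ends of one segment, drawn with these radii.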

FME 2014 SP4 build 14433

=====
Really interesting :)

=====
2014-11-10: I didn't think that "dissolving" could be performed with the Python FME Objects API. That is probably still true, but I noticed that in this case the required result can be generated by buffering an aggregate geometry with a buffer amount of 0.
-----
# Script Example for PythonCaller
# Creates a Tapered Buffer Area for a Line.
# Assumes input feature geometry is a Line.
import fmeobjects, math

class TaperLineBufferer(object):
    def input(self, feature):
        rb = float(feature.getAttribute('_beg_width')) * 0.5
        re = float(feature.getAttribute('_end_width')) * 0.5
        k = (re - rb) / float(feature.performFunction('@Length()'))
        coords = feature.getAllCoordinates()
        hullCollection = fmeobjects.FMEAggregate()
        x0, y0, measure = coords[0][0], coords[0][1], 0.0
        g0 = fmeobjects.FMEEllipse(fmeobjects.FMEPoint(x0, y0), rb, rb, 0.0, 0)
        for coord in coords[1:]:
            x1, y1 = coord[0], coord[1]
            measure += math.hypot(x1 - x0, y1 - y0)
            rm = rb + k * measure
            g1 = fmeobjects.FMEEllipse(fmeobjects.FMEPoint(x1, y1), rm, rm, 0.0, 0)
            geom = fmeobjects.FMEAggregate()
            geom.appendPart(g0)
            geom.appendPart(g1)
            hull = fmeobjects.FMEFeature()
            hull.setGeometry(geom)
            hull.performFunction('@ConvexHull()')
            hullCollection.appendPart(hull.getGeometry())
            x0, y0, g0 = x1, y1, g1
        feature.setGeometry(hullCollection)
        feature.buffer(0.0, 0.0) # dissolve parts
        self.pyoutput(feature)
-----
Like a strange living thing...

I published a custom transformer named "TaperLineBufferer" in the FME Store, which is implemented based on this Python script.

2014-11-02

Split String based on Byte Numbers in Specific Encoding

FME treats the number of characters as the length of a character string and does not consider the byte count of the multi-byte characters used in the Japanese locale. I believe that is reasonable and convenient in almost all cases, since you don't need to think about differences in byte count among encodings.

However, I sometimes need to split a string based on byte counts in a specific encoding, for example when reading datasets in legacy formats that define the column widths of a record in bytes. Not only Japanese but also Korean and Chinese users may encounter a similar situation.

The Column Aligned Text (CAT) reader can be used to read a dataset in a fixed-length column format. But it seems to always treat one character as one byte, so unfortunately it's useless for a dataset that could contain multi-byte characters.
How can you split a string based on the byte count of each column?

=====
2014-11-03: The above description of the CAT reader is not exact. If the source text file contains a line consisting only of ASCII characters, the proper field boundaries can be set for the CAT reader in the parameters dialog, and the reader can read the data as expected. Otherwise, the field boundaries cannot be set properly. That's a limitation of the current CAT reader when reading data that includes multi-byte characters.
=====

Tcl scripting is one quick way. For example, a TclCaller with this script splits a string into a 4-byte token and a 6-byte token, counted in cp932 encoding. If "src" holds "あいうえお", the new attributes "col0" and "col1" will store "あい" and "うえお" respectively, since each of these Japanese characters is represented by 2 bytes in cp932. cp932 is the default encoding of Japanese Windows.
-----
proc byteSplit {} {
    set src [encoding convertto cp932 [FME_GetAttribute "src"]]
    FME_SetAttribute "col0" [encoding convertfrom cp932 [string range $src 0 3]]
    FME_SetAttribute "col1" [encoding convertfrom cp932 [string range $src 4 9]]
}
-----
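
The same idea can be written in Python: encode the string into bytes, slice the bytes by column width, and decode each slice. This is a minimal sketch, not the implementation of any transformer; the function name split_by_bytes is made up for illustration, and it assumes that every column boundary falls between characters (a boundary in the middle of a multi-byte character would raise a decoding error).

```python
def split_by_bytes(text, widths, encoding='cp932'):
    """Split text into tokens whose lengths are byte counts in the given encoding."""
    data = text.encode(encoding)
    tokens, start = [], 0
    for width in widths:
        # Slice the encoded bytes, then decode the slice back to a string.
        tokens.append(data[start:start + width].decode(encoding))
        start += width
    return tokens

tokens = split_by_bytes('あいうえお', [4, 6])
# tokens == ['あい', 'うえお']
```

In a PythonCaller, the returned tokens could be stored with feature.setAttribute in the same way as the Tcl script stores "col0" and "col1".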

Based on this idea, I created a custom transformer named MbStringByteSplitter and published it in the FME Store. Find it in this folder.
Transformer Gallery / FME Store / Pragmatica

I think the transformer works fine in any locale in theory, but I cannot test it in locales other than Japanese. If you find any issues in your locale, please let me know.

FME 2014 SP4 build 14433

=====
2014-11-07: I thought that splitting strings by byte count was a special requirement for a few specific cultures such as Japanese, Korean, and Chinese. So I've never asked Safe for any improvement on this issue (there are many things that have to be improved before this!).
However, surprisingly, this article got many accesses from around the world, including Europe and America, within a few days. Perhaps there are similar requirements in alphabetic cultures - Latin, Cyrillic, Greek, Arabic, Hebrew etc.?
If so, the CAT reader (Field Boundaries setting) should be improved as soon as possible, and adding some new transformers that process a character string based on byte count in a specified encoding might also be helpful to many users.
What do you think?