dc.contributor.author
Hediger, Simon
dc.contributor.author
Michel, Loris
dc.contributor.author
Näf, Jeffrey
dc.date.accessioned
2022-03-09T12:09:37Z
dc.date.available
2022-02-05T14:16:57Z
dc.date.available
2022-03-08T14:57:45Z
dc.date.available
2022-03-09T12:09:37Z
dc.date.issued
2022-06
dc.identifier.issn
0167-9473
dc.identifier.other
10.1016/j.csda.2022.107435
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/530959
dc.identifier.doi
10.3929/ethz-b-000530959
dc.description.abstract
Following the line of classification-based two-sample testing, tests based on the Random Forest classifier are proposed. The developed tests are easy to use, require almost no tuning, and are applicable to any distribution on R^d. Furthermore, the built-in variable importance measure of the Random Forest gives potential insights into which variables drive the difference in distribution. An asymptotic power analysis for the proposed tests is conducted. Finally, two real-world applications illustrate the usefulness of the introduced methodology. To simplify the use of the method, the R-package “hypoRF” is provided.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Elsevier
en_US
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
dc.subject
Random forest
en_US
dc.subject
Distribution testing
en_US
dc.subject
Classification
en_US
dc.subject
Kernel two-sample test
en_US
dc.subject
MMD
en_US
dc.subject
Total variation distance
en_US
dc.subject
U-statistics
en_US
dc.title
On the use of random forest for two-sample testing
en_US
dc.type
Journal Article
dc.rights.license
Creative Commons Attribution 4.0 International
dc.date.published
2022-01-24
ethz.journal.title
Computational Statistics & Data Analysis
ethz.journal.volume
170
en_US
ethz.journal.abbreviated
Comput. stat. data anal.
ethz.pages.start
107435
en_US
ethz.size
34 p.
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.identifier.wos
ethz.identifier.scopus
ethz.publication.place
New York, NY
en_US
ethz.publication.status
published
en_US
ethz.date.deposited
2022-02-05T14:17:03Z
ethz.source
SCOPUS
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2022-03-09T12:09:43Z
ethz.rosetta.lastUpdated
2023-02-07T00:20:56Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=On%20the%20use%20of%20random%20forest%20for%20two-sample%20testing&rft.jtitle=Computational%20Statistics%20&%20Data%20Analysis&rft.date=2022-06&rft.volume=170&rft.spage=107435&rft.issn=0167-9473&rft.au=Hediger,%20Simon&Michel,%20Loris&N%C3%A4f,%20Jeffrey&rft.genre=article&rft_id=info:doi/10.1016/j.csda.2022.107435&